“Will that break a query folding?” “Does your query fold?”… Maybe someone asked you those questions, but you were like: “Query…Whaaaat?!”
Or, maybe you’ve heard about query folding in Power BI, but didn’t know how to take advantage of it in real-life scenarios.
If you recognized yourself in (at least) one of the two situations specified above, then please continue reading this short series of blog posts!
- Part 1 – What is a Query folding in Power BI and why should I care?
- Part 2 – Query folding – Devil is in the detail!
I assume you are now familiar with the concept of query folding in Power BI, and especially with its importance for data refresh and incremental refresh processes. We’ve also started to scratch some interesting behaviors of Power Query transformations, and in this final part of the series, I will show you a few more interesting findings.
Finally, we will wrap it up with the ultimate performance test – I will show you the exact numbers behind two identical queries – one folds, and the other not!
Changing Data types
One of the most common transformations in Power Query is changing data type. It’s a well-known best practice to use proper data types in your data model – for example, if you don’t need hours, minutes, and seconds level of granularity in your reports, you should be better of getting rid of them and change the data type of that column from Date/Time to Date only.
However, the road to hell is paved with good intentions:)…So, let me show you one subtle difference that can cause your query to become damn slow, even though you’ve stick with the recommendation to use proper data type!
As you can spot in the illustration above, my OrderDate column is of Date/Time data type. And, I want to switch it to Date only. There are (at least) two possible options to do this – the first one is to right-click on the column, expand drop-down for Change Type option (like I did in the illustration) and select Date type (just below the Date/Time):
A few important things happened here, so let me explain each of those:
- In the Applied Steps pane, you can notice that our transformation step had been recorded
- In the column itself, you can see that the time portion disappeared
- When I’ve opened the View Native Query dialog box, you can see that the Mashup engine nicely translated our transformation to a T-SQL CONVERT() function
- The M formula applied to this transformation step is: Table.TransformColumnTypes()
Let’s now examine the other option to change data type of our column:
Just below our previous Change Type option, there is a Transform option. Once you expand the drop-down, you can see the Date Only transformation. Let’s click on it and check what happens:
Looks quite similar, does it? But, let’s again walk through all the things that happened now:
- Instead of the Changed Type step, we now have a step called Extracted Date
- The column itself looks exactly the same as in the previous example – no time part in there
- Ooops, the query doesn’t fold anymore! As you can see, the View Native Query option is greyed out!
- This time, M formula applied is: Table.TransformColumns()
So, one single different word in the M formula (Table.TransformColumnTypes vs Table.TransformColumns) affected our query so hard that it couldn’t be translated to SQL!
Takeover from this story: be careful, and watch out when you’re choosing options for changing data types!
Writing Native SQL query to get data
One of the earliest decisions you need to make when importing the data into Power BI is: do you want to import the data from the SQL database “as-it-is”, and then apply necessary transformations within the Power Query Editor…Or, do you want to write SQL query by hand to retrieve the data…
If you choose the latter option, you should be aware that the hand-written SQL query can’t fold anymore! So, in this scenario, it’s ALL OR NOTHING! Once you decide to use a custom SQL statement to import the data, all subsequent transformation steps applied in the Power Query Editor WILL NOT FOLD, even if you’re applying some basic transformations, such as filtering, or renaming columns, which would fold in “normal” circumstances.
Let me show you on an example how this looks like:
So, I want to import all rows and columns from my DimDate table. Now, let’s say that I want to keep only rows after the January 1st 2007:
In normal situations, when you import your DimDate table “as-it-is”, this transformation will be easily translated using WHERE clause in SQL (WHERE FullDateAlternateKey > ‘2007-01-01’) and pushed to a SQL database source.
Now, if I run Query Diagnostics tool on this step, I can see that no query folding takes place here!
As you can see, only our initial query was executed on the SQL database side. So, what happened here, is that the Mashup engine pulled all rows from the DimDate table, and then applied the transformation step we requested! No need to emphasize how this affects performance, both in terms of CPU and memory consumption…
There are two possible solutions if you decide to go this way and write your SQL code by hand:
- Create a database view – this is the most desirable scenario! If you are able to create views in the underlying database, you can encapsulate your whole transformation logic within the view, and then import the view “as-it-is”, with all applied transformations already in place! As database views are being treated exactly the same as database tables in the Power BI, you can still achieve query folding on the view!
- Apply all necessary transformations within a single SQL statement – in case you don’t have sufficient permissions to create database objects, the other option is to include all necessary transformation steps within the single SQL statement you’re writing. Something like this:
There are some improvements recently to query folding when writing native SQL queries. Chris Webb wrote an excellent blog post about using Value.NativeQuery() function and providing explicit value TRUE for the EnableFolding argument.
However, at the moment of writing this article, this option works only for the PostgreSQL connector, which is documented here. It CAN work in SQL Server too, but not in the most intuitive way, as you would need to manually replace the M code within Advanced Editor.
So, this is still more of a workaround solution (helpful indeed), but let’s hope that in the near future we get an “out-of-the-box” possibility to explicitly instruct Power BI that we want to enable query folding when writing custom SQL…
I’ve promised in the previous part of the series that I will show you one more example when the View Native Query option can fool you to think that query folding was broken, even if in reality it’s not true…
Let’s say that we want to keep only top X rows from our table. In my case, I want to preserve top 2000 rows from my fact table:
Once I’ve applied this step and checked the View Native Query, I can realize that my query folds, as my transformation was translated to a TOP clause in SQL:
Now, let’s say that I want to apply Absolute value transformation on my Sales Amount column. Normally, this transformation easily folds, as there is an ABS function in T-SQL:
However, if I right-click on this step, I will see that the View Native Query option is greyed-out, so I would assume that this step broke my query folding!
Let’s check this in our Query Diagnostics tool:
Oh, my God! This step folded indeed! So, we were tricked by View Native Query option again!
The key takeover here is: whenever you’re assuming that specific transformation step can be folded (like in this example, when we knew that SQL has ABS function to support our transformation), double-check what really happens under the hood!
The ultimate performance test
Ok, if I didn’t manage to convince you so far why you should strive to achieve query folding, let me now pull my last ace up the sleeve!
I want to show you the difference in data refresh performance between the queries that return exactly the same results – one of them folds, and the other not!
Test #1 Query folding ON
For this testing, I’ll use FactOnlineSales table from the Contoso sample database. This table has around 12.6 million rows, and it’s good to demonstrate the magnitude of importance of the query folding concept.
In the first example, I’ve applied 9 different transformation steps, and all of them are foldable, as you can see in the following illustration:
Don’t pay attention to a SQL code that the Mashup engine generated: if you are a SQL professional, of course, you could write much more optimal SQL code – however, keep in mind that with auto-generated scripts by Mashup engine, you are not getting most optimal SQL – you are just getting correct SQL!
I will hit Close & Apply and turn on my stopwatch to measure how much time my data refresh lasts.
This query took 32 seconds to load 2.8 million records in my Power BI report. Data was loaded in batches of 100.000 – 150.000 records, which is a good indicator that the query folding is in place.
Test #2 Query folding OFF
Now, I will go back to Power Query Editor, and intentionally break query folding at the 3rd step (remember the example above with changing Date/Time type to Date), using the transformation for which I know that is not foldable:
Truth to be said, I will achieve a partial folding here, as first two steps will fold, but all subsequent steps after the Extracted Date transformation will not fold!
Let’s turn on stopwatch again and check what happens:
The first thing to notice: this query took 4 minutes and 41 seconds to load into our Power BI report, which is approximately 10 times more than in our previous case when the query folded. This time, batches of loaded data were between 10.000 and 20.000 records.
But, what’s even more concerning – you can see that the total number of records loaded was almost 11 million!!! Instead of 2.8 million in the previous example! Why does it happening? Well, in the previous articles, I explained that when the Mashup engine can’t translate M language to SQL, it needs to pull ALL the data (from the moment when the query folding was broken), and THEN apply transformations on the whole chunk of imported data!
The final result is exactly the same – we have 2.830.017 records in our Power BI report – but, with query folding in place, all necessary transformations were performed on the SQL database side, and the Mashup engine got already prepared data set. While, in the second scenario, after we broke the query folding, Mashup engine pulled the whole remaining rows (approx. 11 million) and only after that it was able to apply other transformation steps.
And, this was just a basic example, with one single table, and not so big in terms of data volume! Simply imagine the magnitude of implications on a larger data set, with multiple tables in it.
Well, we covered a lot in this blog series. We learned about the Data Shaping concept, we introduced Power Query fundamentals, and we also learned what is a query folding and why should we do our best to achieve it.
I’ve also shared with you some basic examples and neat tricks on how to achieve query folding in some common use cases.
In the end, please be aware that the query folding is a work in progress, and folks from the Power BI team are constantly improving this feature. So, it can happen that some of the issues with query folding I’ve shown you here are resolved in the meantime. Therefore, be sure to stay up to date with the latest improvements.
Thanks to Alex Powers from Microsoft who gathered different resources about query folding, and Christopher Majka, who shared the link, there is a repo on Github, where you can follow all the latest changes to the query folding feature.
Thanks for reading!
Last Updated on March 14, 2021 by Nikola