The only constant when it comes to technology (and this applies especially to Power BI) is – change! And, when I say a change in Power BI, I don’t mean exclusively on regular updates with tons of new features, coming every single month. Existing features are also being constantly improved and upgraded, so it can easily happen that something that was not supported a few months ago, or you had to perform different workarounds to find a solution, now works by default.
One of the cool new things that now work is the possibility to achieve query folding for the hand-written SQL queries! As I previously wrote in this 3-part series article, the rule of thumb is: once you decide to write a custom SQL to import the data into Power BI, you are saying “Good-bye” to query folding for all subsequent steps.
What is a Query folding at first place?
Before we jump straight into the action, let me first explain what is a query folding.
In most simple words: if the Power Query engine is able to gather all your transformations and generate a single SQL statement that’s going to be executed on the source side (SQL database in most cases), we say that the query folds!
This way, by pushing transformations and calculations to a data source side, in most cases you’ll get a performance benefit for your data refresh process. I’m not going to reiterate through the different scenarios when the query folding will work or not work – I warmly suggest that you read the articles I’ve already mentioned – and also refer to an official Microsoft documentation.
In this article, I’ll focus on a specific scenario when you decide to use a native SQL query as a source for your Power BI dataset.
What is a native SQL query in Power BI?
One of the first decisions you need to make when you import the data into Power BI is: do you want to grab the data from the SQL database “as-it-is”, and then apply necessary transformations using Power Query editor…Or, do you want to write your own SQL code to get the data.
If you choose to write custom SQL, all subsequent transformation steps applied in the Power Query Editor will not fold, even if you’re applying some basic transformations, such as filtering, or renaming columns, which would fold in “normal” circumstances.
This statement should be correct, right? Well, not anymore:)
In the illustration above, I’m importing all the rows and columns from the FactOnlineSales table in the Contoso sample database. This table contains ~12.6 million rows.
“Traditional” way of breaking query folding
One of the columns is Sales Amount. Now, let’s say that I want to keep only those rows where the Sales Amount value is greater than 400. I will open Power Query editor and add this transformation step:
After I’ve applied this step, if I right-click on it, I can see that the View Native Query option is greyed out, which means that my query probably doesn’t fold. In this case, this is true, so let’s check how long this query now takes to load in the Power BI.
It took around 160 seconds to refresh the data, but the following illustration shows what happened in the background:
Basically, the Power Query engine had to pull all the data from the FactOnlineSales table from the underlying SQL Server database, and THEN apply the filtering condition we specified! In the end, there were ~2.1 million records that satisfy our criteria for the sales amount value. So, whatever transformation step we apply after the first statement (writing custom SQL), the query will not fold!
Magic M function to the rescue
Let’s now try another approach. Inspiration came from this great blog post from Chris Webb, and I’ve just slightly adjusted it in the following example:
As you may notice, I will not write my SQL statement in the starting window. I’ll leave that blank and import all database objects from the Contoso database (all tables, views, and table-valued functions):
Besides using one of the 300+ built-in Power Query transformations, you can also write your own M code to apply on the data. Therefore, if I right-click on the last step, I can choose to Insert Step After:
This Custom step lets me manually enter the M formula within the formula bar, so I will go and enter the following M code:
= Value.NativeQuery(Contoso,"SELECT * FROM FactOnlineSales",null,[EnableFolding=true])
The “magic” happens within the Value.NativeQuery() function. You can read more about this function and its arguments here. We could’ve left the third and fourth argument out, and the query would still work, but by providing the fourth argument EnableFolding, and flagging it as TRUE, we explicitly instructed the Power Query engine to enable query folding!
Let’s now apply exactly the same filter as in the previous case – keeping only those rows where the sales amount value is greater than 400. If you right-click on the last step, the View Native Query option should be enabled. And, once I clicked on it, I see that my query now folds, as the filtering condition was nicely translated into SQL where clause!
Finally, let’s check if this workaround had an impact on the data refreshing process:
This time, by pushing the filter upstream to a SQL database, Power Query retrieved only those rows that we needed. That obviously affected the data refresh processing time, as it took around 20 seconds!
Joining still doesn’t work!
I’ve tried to expand on this, and combine multiple tables in the native query:
= Value.NativeQuery(Contoso,"SELECT * FROM FactOnlineSales AS fco INNER JOIN DimCustomer AS c ON c.customerKey = fco.customerKey",null,[EnableFolding=true])
However, I’ve received a following error:
Writing custom SQL code to get the data into Power BI, by default will still break a query folding. However, as you witnessed, there is a handy workaround to “force” the Power Query engine to take advantage of query folding even in the situations when you decide to use the native SQL query option.
In full honesty, this trick has a limited number of use cases, as you can’t combine multiple database objects within one single SQL statement like you can do in the “traditional” way of writing SQL code in the starting data import window. Still, the preferred way for using custom SQL for data transformation would be to create a view within the source database. The view will contain the whole transformation logic, and you can then import it “as-it-is” in Power BI. Database views are foldable objects, which means that you can still benefit from query folding if you apply foldable transformation steps on that view!
However, it’s nice to see that things are moving forward, and let’s hope that in the near future query folding for native SQL queries will be enabled by default.
Thanks for reading!
Last Updated on May 25, 2021 by Nikola