“Will that break a query folding?” “Does your query fold?”… Maybe someone asked you those questions, but you were like: “Query…Whaaaat?!”
Or, maybe you’ve heard about query folding in Power BI, but didn’t know how to take advantage of it in real-life scenarios.
If you recognized yourself in (at least) one of the two situations specified above, then please continue reading this short series of blog posts!
Fine, you are curious to find out what is a Query folding. But, first things first…Before you get into it, we’ll need to put some theoretical foundations which will place the Query folding feature in the proper context.
I’ve already written about data shaping and why is it one of the key concepts in the data preparation phase. Now, I would like to expand on that in (maybe) unusual way:
I guess you all know about the book written by Thomas More, called “Utopia”.
In that story, everything is perfect and everyone is satisfied. In an ideal world, let’s call it “Data Utopia”, we have clean, high-quality data, that just flies into our reports “as-it-is”, without needing to perform any kind of face-lifting or transformations along the way. Unfortunately, “Data Utopia” can exist only in books – the reality is crueler – as we have to deal with a whole bunch of challenges while nurturing our data.
That being said, one of the key concepts that we have to absorb is Data Shaping. Data shaping is the process you should perform once you get familiar with your data, and become aware of possible pitfalls within the data you are planning to use in your business intelligence solution.
I’ve intentionally used the term “Business Intelligence” instead of “Power BI”, as this is a general concept that should be used outside of Power BI solutions too.
In most simple words, data shaping is the process of data consolidation, BEFORE it becomes part of your data model. The key thing to keep in mind is the word: BEFORE! So, one would perform data shaping before the data goes into the report itself. Data shaping can be done at different places, and, depending on where you apply data shaping techniques, at different points in time during the data preparation process.
WHERE should you perform data shaping?
Source Database – This is the most obvious choice and in most cases the most desirable scenario. It is based on traditional data warehousing principles of Extracting-Transforming-Loading (ETL) data. In this scenario, you define what data you want to extract (not all data from the database is needed, and it’s usually not a good idea to import all the data). Then, you decide if your data needs to be transformed along the way, to better suit your reporting needs – for example, do you want to perform currency conversion, or do you need to conform country and city names.
Do you recognize the city on the following image?
Yes, it’s New York. Or, is it NYC? Or, is it New York City? Which one of these three names is correct? Yes, all of them are correct – but if you import the data in your data model like this, you will get incorrect results – as each New York, NYC, and New York City will be treated as a separate entity. This, and many more potential caveats, need to be solved during the Data Shaping phase, and that’s why important to spend some time massaging your data.
If you don’t perform data transformations on the source side, the next station is Power Query – it’s the built-in tool within Power BI, that enables you to perform all kinds of transformations to your data. According to Microsoft’s official documentation, you can apply more than 300 different transformations, and that number is constantly increasing!
The key advantage of Power Query is that you can perform complex data transformations with little or no coding skills! Additionally, all steps you’ve applied during the data transformation process are being saved, so every time you refresh your dataset, those steps will be automatically applied to shape your data and prepare it for consuming via reports.
Under the hood of Power Query is a Mashup engine, that enables your data shaping to run smoothly. Power Query uses very powerful M language for data manipulation. And, now you are probably asking yourselves, what all this story about data shaping, Power Query, Mashup engine, M language, etc. has to do with Query folding. I don’t blame you, it’s a fair question, but we will come back soon to answer it.
What is a Query folding?
For some data sources, such as relational databases, but also non-relational data sources, for example, OData, AD, or Exchange, Mashup engine is able to “translate” M language to a language that the underlying data source will “understand” – in most cases it’s SQL.
By pushing complex calculations and transformations directly to a source, Power Query leverage capabilities of the robust relational database engines, that are built to cope with large volumes of data in the most efficient way.
That ability of Power Query’s Mashup engine to create a single SQL statement combining all M statements behind your transformations is what we call Query folding.
Or, let`s make it simple: if the Mashup engine is able to generate a single SQL query that is going to be executed on the data source side, we say that the query folds.
Data sources that support Query folding
As already mentioned, the most obvious beneficiary of query folding is relational database sources, such as SQL Server, Oracle, or MySQL. However, it`s not just that SQL databases take advantage of the query folding concept. Essentially, any data source that supports some kind of querying language, can possibly take advantage of query folding. Those other data sources are OData, SSAS, Sharepoint lists, Exchange and AD.
On the other hand, when you use data sources such as Excel files, BLOB storage files, flat files, etc. in your Power BI datasets, the query can’t fold.
Data Transformations that support Query folding
However, when it comes to data sources that support query folding in general, it’s important to keep in mind that not all transformations can be folded and pushed to a data source. So, just to be clear, a fact that SQL database supports query folding, doesn’t necessarily mean that your query will fold! There are some Power Query transformations that simply can`t be pushed to a SQL database engine.
Very often, some subtle differences in the Power Query transformations can be decisive in the final outcome, and if your query will fold or not. I’ll show you a few of those subtle differences in the next parts of this series.
Generally speaking, the following transformations, when applied in Power Query, can be “translated” to a single SQL statement:
- Removing columns
- Renaming columns
- Filtering rows, with static values or Power Query parameters, as they are treated as WHERE clause predicates in SQL
- Grouping and summarizing, which have equivalent in SQL’S Group by clause
- Merging of foldable queries based on the same source – as this operation can be translated to JOIN in SQL. When I said, merging of foldable queries – that means it will work if you are joining two SQL server tables, but it will not work if you are trying to join SQL table and Excel file
- Appending foldable queries based on the same source – this transformation relates to the UNION ALL operator in SQL
- Adding custom columns with simple logic. What does simple logic mean? Using M functions that have equivalents in SQL language, for example, mathematical functions, or text manipulation functions
- Pivot and Unpivot transformations
On the other hand, some transformations that will prevent the query from folding are:
- Merging queries based on different sources, as explained previously
- Appending (union-ing) queries based on different sources – similar logic as in the previous case
- Adding custom columns with complex logic or using some M functions that don’t have a counterpart in SQL
- Adding index columns
- Changing a column data type. This one is a typical “it-depends” case. I will show you in the next part of the series what it depends on, but just keep in mind that changing a column data type can be both foldable and non-foldable transformation
By learning about Data Shaping and Power Query, we laid a solid theoretical background to understand the Query folding feature in Power BI, or maybe it’s better to say in Power Query Editor.
In the next part of the series, we will examine why Query folding is so important, and why should we care about the fact if the query folds or not. We will also learn how to check if the query folds or not, and why people say that the devil is in the detail…
Last Updated on July 28, 2021 by Nikola