DISCLAIMER 1: This topic in the official course curriculum includes Aggregations too… However, both Composite models and Aggregations are extremely important data modeling solutions, and they deserve separate articles in my opinion
DISCLAIMER 2: If you’re talking with Power BI professionals, you may hear them talking about two different Composite models features…Don’t be afraid and don’t be surprised:) Let me quickly draw the line between the two:
- Composite models – this feature lets you combine data coming from two different DirectQuery sources, data coming from DirectQuery and Import mode, or any combination of these. This feature will be the main focus of this article
- Composite models “Gen2” – “Gen2” is not an official name for this feature. You may also hear terms like Composite models v2, New Composite models, etc. However, the official name (at least at the moment of writing) is DirectQuery for Power BI datasets and Azure Analysis Services. This “version” of Composite models lets you connect to a published Power BI Dataset and use it in combination with other DirectQuery and/or Import mode data! As powerful as it looks, this feature is still in public preview, so we will not cover it in this article. If you’re interested to learn more about this type of Composite models, please refer to this article from SQL BI guys
What are the composite models?
In the “dark ages” of Power BI (I know that today it’s hard to believe that Power BI had a dark ages period:)), if you wanted to use DirectQuery as a source for your report, once you’ve chosen DirectQuery for a certain data source (SQL Server, for example), the possibility to extend on this was – gone! To put it simple, you couldn’t combine a DirectQuery query over one data source (i.e. SQL Server) with a DirectQuery query over another data source (think of Oracle, for example). Let alone combine the DirectQuery with Import mode…
Luckily, these times are long behind us, as now you can create a data model in Power BI, that consists of, let’s say:
- Data coming from the Excel file stored on your local machine
- Imported data from the SQL Server table
- DirectQuery query that retrieves the data from another SQL Server table
- DirectQuery query that retrieves the data from Oracle database table
All imported data sources (Excel file and imported SQL Server table data in our case) are being treated as one single artifact because the data is stored in the local instance of the Analysis Services that Power BI uses to store imported data. So, from a Composite model perspective, that’s one “source”.
Now, what happens when we include DirectQuery data source(s)?
Don’t be scared, as I’ll explain what we have in the illustration above…There are two DirectQuery data sources (SQL Server and Oracle), each of them containing data from multiple tables. The power of Composite models is that now you can establish the relationship between the data stored in VertiPaq (imported data) and data residing outside Power BI!
If you ask yourself – why some arrows are dashed, and the others not? That’s a great question! Those arrows represent relationships between the tables in different storage environments. The tabular model supports two types of relationships: Regular and Limited (also known as “strong” and “weak” previously). Explaining the behavior, considerations, and limitations of these two types (limitations when it comes to limited relationships) is out of the scope of this post, but I’d wholeheartedly recommend reading this fantastic article written by Alberto Ferrari to understand all the nuances and potential caveats.
To cut the story short, from the perspective of a (successful) Power BI developer, you should keep in mind two things:
- You CAN create relationships between the data coming from different storage mode sources
- You MUST be aware of the behavior of these relationships
Getting the best from both worlds!
When you’re creating Composite models, you should specify the storage mode for each of the tables in your data model. But, it’s not just a simple choice between Import vs DirectQuery, as there are two additional options to choose from! So, let’s examine all available storage mode options for the table:
- Import mode – this should be your default choice, as it provides the best experience in terms of performance. Also, in some circumstances, that will be your only option, as not all data sources support DirectQuery (think of Excel, for example)
- DirectQuery – as explained in one of the previous articles, when you use DirectQuery, data is being retrieved from the data source at the query time
- Dual – data is stored in cache memory, but can also be retrieved from the original data source. It’s a common scenario to configure dimension tables in Composite models to be in Dual mode, so they can be queried with fact tables from the same source
- Hybrid – one of the latest features in Power BI. You can think of hybrid tables as having both DirectQuery and Import mode within a single table! You configure one table partition to be DirectQuery, while the remaining data stays in Import mode
Based on the setup of your composite model, performance may vary significantly! The most desirable scenario is when your queries target only Import or Dual mode tables, as the data is being served from the cache, thus the performance will be the best possible!
If your queries target Dual or DirectQuery tables from the same source (for example, tables are on the same SQL Server instance), the performance can still be good (assuming that you’ve applied some of the recommended practices for optimizing DirectQuery scenarios), as the relationships between these tables will be treated as regular relationships.
The third scenario is querying Dual or Hybrid tables from the same source. Here, the performance depends if the necessary data comes from the Import partition of the Hybrid table, or not. Again, as tables are in the same source, relationships will be evaluated as regular relationships.
In all remaining scenarios, queries will target tables from different source groups, and in theory, these queries will produce the worst performance. All relationships are treated as limited, with all the limitations that apply to them.
Composite models recommended practices
- Use composite models ONLY when pure Import mode is not an option – as attractive as it looks at first glance, the Composite models feature should not be your default choice when designing enterprise-level models. The possibility to combine the data from multiple source groups comes with a price – and that price is additional model complexity and potential performance issues
- If you’re using fact tables from the same source group in both DirectQuery and Import (or Hybrid) mode, set the storage mode of dimension tables to Dual
- Evaluate scenarios for using Hybrid storage mode to get the best of both worlds
- Identify the appropriate refresh rate for the Dual and Import (and Hybrid) tables, to keep the data as much as possible synchronized with the data coming from the DirectQuery sources
- Apply general recommended practices for optimizing DirectQuery scenarios
Composite models are one of the most powerful features in Power BI! It opens a whole new world of possibilities when it comes to data modeling and helps you satisfy various business needs – what was a dream a few years back, it’s a reality now!
However, as with every powerful feature, this one also requires a careful and experienced data modeler, that will thoroughly evaluate use cases for composite models, and understand the possible downsides of implementing this feature in enterprise-scale scenarios.
Thanks for reading!
Last Updated on July 31, 2022 by Nikola