A few weeks ago, I was approached by my friend James Reeves (LI), who needed brief assistance in the data preparation process for his Power BI report. The challenge was to split multi-lined cells from the Excel file into multiple rows so that the dataset becomes usable for Power BI reporting purposes.
Something like this in the following illustration:
Of course, I’m by no means a Power Query expert, although I know a thing or two! I remember having to deal with a similar request in the past, but I forgot how I managed to solve it. Therefore, this blog post is a reference for my “future self” (and obviously all the others who may find it useful)…
As a first step, we need to create a list of values, that will be split based on a certain character. In this case, the character is a line feed (#lf), which breaks the text into the new line within the cell:
And, once I click on the cell which contains the list for my first row, I’ll be able to see all the values that this list contains:
The next step is to create a set of values, based on the position in the list. So, the 1st element from List 1 will be together with the 1st element from List 2, the 1st element from List 3, and so on…The 2nd element from List 1 with the 2nd element from List 2…You get the point.
For that task, we’ll use the List.Zip Power Query function. In a nutshell, this function combines items from the same position in different lists, and creates a new list of values. Something like this:
List.Zip({{1, 2}, {3, 4}, {5, 6}}) //Output { {1, 3, 5}, {2, 4, 6} }
Let’s expand our previous Add Custom Column step, to include all the columns that contain cells with multi-lined values. Finally, we will wrap everything with List.Zip:
= Table.AddColumn(Source, "Demo", each List.Zip( {Text.Split([Subjects],"#(lf)"), Text.Split([Teachers], "#(lf)"), Text.Split([Grade],"#(lf)"), Text.Split([Extra curicular], "#(lf)")} ))
Now, we have a list containing multiple lists:
Let’s quickly check what’s inside these lists. I’ll right-click on the first list, add a list as a new query and let’s take a look:
That’s exactly matching what we want to see in the first row of our original table:
Since I’ve already pulled all the values from the multi-lined cells into lists, I don’t need the original columns anymore. So, I’ll simply add the columns that I need to keep, in addition to the one we’ve created:
= Table.AddColumn(Source, "Demo", each List.Zip( {Text.Split([Subjects],"#(lf)"), Text.Split([Teachers], "#(lf)"), Text.Split([Grade],"#(lf)"), Text.Split([Extra curicular], "#(lf)")} ))[[Firstname],[Surname],[Address],[Demo]]
I can now click on the arrows icon and select Expand to New Rows:
Finally, let’s click again on the arrows and then Extract Values:
Then, I’ll need to specify a delimiter that’s going to be used when extracting these list values together. In my case, I’ve chosen a semicolon and I get the following results:
One last step is to split our new column. We want to split it by delimiter that we’ve previously used (semicolon):
BAM! There you go! As you may see, we now have a separate row for each set of values that were previously part of multi-lined cells.
A final step is to rename the splitted columns accordingly, so they match the original naming:
Mission accomplished – here is our nice table, ready to be consumed via Power BI reports!
Special thanks to Melissa De Korte for the inspiration to use List.Zip…
Thanks for reading!
Last Updated on March 24, 2023 by Nikola