Zip, Zip, Hurrah! Using Power Query to split multi-lined cells!

A few weeks ago, I was approached by my friend James Reeves (LI), who needed brief assistance in the data preparation process for his Power BI report. The challenge was to split multi-lined cells from the Excel file into multiple rows so that the dataset becomes usable for Power BI reporting purposes.

Something like this in the following illustration:

Of course, I’m by no means a Power Query expert, although I know a thing or two! I remember having to deal with a similar request in the past, but I forgot how I managed to solve it. Therefore, this blog post is a reference for my “future self” (and obviously all the others who may find it useful)…

As a first step, we need to create a list of values, that will be split based on a certain character. In this case, the character is a line feed (#lf), which breaks the text into the new line within the cell:

And, once I click on the cell which contains the list for my first row, I’ll be able to see all the values that this list contains:

The next step is to create a set of values, based on the position in the list. So, the 1st element from List 1 will be together with the 1st element from List 2, the 1st element from List 3, and so on…The 2nd element from List 1 with the 2nd element from List 2…You get the point.

For that task, we’ll use the List.Zip Power Query function. In a nutshell, this function combines items from the same position in different lists, and creates a new list of values. Something like this:

List.Zip({{1, 2}, {3, 4}, {5, 6}})

//Output
{
    {1, 3, 5},
    {2, 4, 6}
}

Let’s expand our previous Add Custom Column step, to include all the columns that contain cells with multi-lined values. Finally, we will wrap everything with List.Zip:

= Table.AddColumn(Source, "Demo", each List.Zip( {Text.Split([Subjects],"#(lf)"), Text.Split([Teachers], "#(lf)"), Text.Split([Grade],"#(lf)"), Text.Split([Extra curicular], "#(lf)")} ))

Now, we have a list containing multiple lists:

Let’s quickly check what’s inside these lists. I’ll right-click on the first list, add a list as a new query and let’s take a look:

That’s exactly matching what we want to see in the first row of our original table:

Since I’ve already pulled all the values from the multi-lined cells into lists, I don’t need the original columns anymore. So, I’ll simply add the columns that I need to keep, in addition to the one we’ve created:

= Table.AddColumn(Source, "Demo", each List.Zip( {Text.Split([Subjects],"#(lf)"), Text.Split([Teachers], "#(lf)"), Text.Split([Grade],"#(lf)"), Text.Split([Extra curicular], "#(lf)")} ))[[Firstname],[Surname],[Address],[Demo]]

I can now click on the arrows icon and select Expand to New Rows:

Finally, let’s click again on the arrows and then Extract Values:

Then, I’ll need to specify a delimiter that’s going to be used when extracting these list values together. In my case, I’ve chosen a semicolon and I get the following results: