Mittwoch, 16. Dezember 2015

Dealing with tables with changing headers in Power Query

When you are dealing with Excel for a while you will get a lot of different data from many different sources with a lot of different structures from many different people. And it will be your job to bring this data into a format/ structure you can deal with. When this is you daily job Power Query will already be your best friend.

Sometimes it's not only the data that changes, but also the headers of the table you are working with. Take the following example.

The table has one column for the Department, and another one for the number of employee. The header for the second column isn't 'Employee", but the date of the current day.

Let's pull this table into Power Query and let's change the header (as a very simple example). What we see is this:

Now we want to change the header into something more meaningful: 'Employee'.
We right click the column header and click on rename. The result is not surprising and looks like this in Power Query…

… and like this in a Excel table:

Let's say this report comes once a week. So you get the report again on the 20th of December and your table has a different header for column 2. You press the refresh button for your specific Query 'Employee" …

… and you get the following error message:

Let's go into Power Query again, to find out what happened here.

The first step within the complete Power Query query 'Source' does still work. It pulls the Excel table inside Power Query properly.

The error occurs in the second step. Why?

The M script, that renames the column into 'Employee" is the following:

Table.RenameColumns(Source, {{"13.12.2015", "Employee"}})

This function uses the table 'Source' (our step 1 within the Power Query query), searches for column '13.12.2015' and renames it 'Employee'. And this is the problem: '13.12.2015' is hard coded and does not change, when the header of column 2 is different from '13.12.2015'.

How can we handle this?

When Power Query needs to know the correct name of the second column, then we need to evaluate the column name, before we rename it.

Under the term, that the structure of the table does not change, we are looking for the name of column 2. Let's delete step 'Rename column' and press the insert function button:

You will see the function '= Source', what references to your first step in your Power Query query. Surround 'Source' by the function Table.ColumnNames(), so that you get 
= Table.ColumnNames(Source).

What you got is a list of all headers of your table. To get the one of the second column just adjust your formula like this:

= Table.ColumnNames(Source){1}

Because lists in Power Query are 0 based, you have to type in {1} to get the name of the second column.

Know that you know how to get the name of the required column, delete the second step, so that only 'Source' exists. Because you drilled into the value '20.12.2015' there is no cross at the second step to delete it. Therefore just remove {1} in the formlua bar.

The second step (which meanwhile was automatically renamed 'Navigation') gets its old name again (in my case 'ListOfHeaders') and now can be deleted by clicking on the tiny cross in the left corner of the steps name.

Now  right click the column 2 again and rename it 'Employee'. This time go into the formula bar and replace the hard coded '20.12.2015' by the formula we developed together: Table.ColumnNames(Source){1}

The result is the following:

For this table nothing changed. But now save the query and go back to the source table in Excel. Change the header from 20.12.2015 to 27.12.2015, or any other name you prefer. When you press the refresh button of the Power Query query you won't see errors anymore, because the specific column name is evaluated at any time before it gets renamed.   

You can download an example file from here

See you,