Freitag, 12. Februar 2016

Case sensitivity: Power Query vs. Power Pivot

Power Query and Power Pivot are great companions. What ever you want to import and shape… Power Query can make it and Power Pivot builds the models you need to create amazing data insights.


Following that way all the tables of my models come into Power Pivot using Power Query. Take a look at the following simple structured table:




This table shows revenue per customer and date. What I usually do, when I get this kind of table, I import it twice with Power Query.




The first import is the table itself - almost unchanged. Only some data type conversions. Nothing special.

The second time I import that table I create a Dimension from the customer column:


I remove all columns except the customer column. Then I remove the duplicates and safe the query as DimCustomer (you can repeat these steps for each dimension you have within that table).




Now, that you have  a fact table and a dimension table you can load these tables into Power Pivot to create your (very simple) data model. 


WTF ?!

What happened here? I created a dimension table (with no duplicates) and Power Pivot can't create a relationship between fact table and dimension table, because of duplicate values in both columns?!

Let's have a closer look at our Power Query created dimension table….


After all duplicates were removed, we see that Michael (no matter how it is written) appears two times in that dimension table. So for Power Query each version of the name Michael is a unique value.

Let's see how Power Pivot handles this:


Power Pivot takes the first appearance of Michael and copies it for all the other times it appears later on. Upper or lower case doesn't make any difference. Due to that fact it is clear why Power Pivot didn't accept the relationship between fact table and dimension table: The dimension table has duplicate values in a key column.

But how can we overcome that?

That's pretty easy. Go to the query of your dimension table and put the customer names in lower case (or upper case as you like)


If you remove the duplicates now, Power Query treats all names the same way, because there is no different writing. So you overcome the case sensitivity of Power Query by removing upper respectively lower case.

The dimension table now has only one Michael (as we need it for Power Pivot).


One further step of transformation and the names look beautiful again.


Now you can go back to Power Pivot and create your relationship. You won't get any error message anymore :)



Happy data modelling :)

Regards,
Lars