Summing Across Overlapping Dimensions “1:1″
** If you’re a database guru you’ll find this pretty boring – if you’re not it may actually be insightful **
How many people came into the store today? Let’s see, the store has 3 entrances. If we count-up the # of people who came in Door A, Door B and Door C…this should give us a reasonable estimate of total unique visitors to the store, right? Kind of like a website where you can only enter once per visit. Sure, I can come back later through another door or on the web my session came timeout, but it should be reasonably close.
What it we tried counting by the # times people pass through various departments in a department store? Let’s take 5 people and the departments they visit:
- John – tools, toys, food, pharma
- Jeff – sports, clothing, food
- Joe – pharmacy, food, tools
- Jesse – register for gift card
- James - tools, food, sports
We know there are 5 people, but let’s try counting by way of department.
We see that if we sum people across departments we get a count of 14 people, where only 5 unique people actually exist. The highly sophisticated Venn diagrams below the table where the circles represent some dimension highlights both good and bad dimensions for summing unique items. The department example falls under bad (both for summing and artistic sensibility).
Who cares? Well, in a world where data is increasingly defined by having multiple dimensions associated with each metric there’s quite some room for this type of error or similar errors in aggregating data. If a # ever seems really high or skewed and you’re operating across many dimensions…ask yourself, can the item generating the metric live in many dimensions?
Other situations where summing unique folks across dimensions could get complex:
Geography – person could have many homes. Zip codes crossing multiple counties?
Retail – multiple departments. Multiple purchases (hopefully) or multiple unique customers in same household.
Finance – multiple cards or accounts.
Gaming – played multiple games. Unique person played the same game during multiple sessions on the same date.
Travel – multiple bookings same vacation.
The most infamous example of all at this point for marketers is SAME PERSON MULTIPLE DEVICES (ha ha ha laughs the distracted consumer, catch me if you can!).