Data data everywhere, now what?

on 26 June

So now that we have data to work with, what next?  I wish I could remember who told me this; it was a representative for some application (I’ll have to dig back through my old emails).  He told me about the 5 Cs of Data.

5 Cs of Data

  • Current
  • Complete
  • Context
  • Consistent
  • Correct

For this report or dashboard to be of use, it needs to be Current.  Understand what refresh schedule is needed, or is enough, for this type of report.  Is a daily refresh enough?  In this case, a live connection would be ideal if we were using the report to buy, since it deals with real-time transactions and availability.  If we were just analyzing trends, an hourly refresh might do.  We’d also have to factor in the speed of the refresh.  Given my personal resources, and because it’s working with web data, it’s a bit slow: approximately 30 minutes for 3,000 rows.  With organizational datasets, it could be 30 million rows within 30 minutes.

    • What is the end use of the report?
    • What is the refresh speed?

Complete: we are usually missing pieces that would give us a better report.  In this case, what about misspelled entries (add another query for common misspellings?)?  In organizations, data that is not structured can produce multiple versions of a name that do not aggregate; see whether the structure can be addressed and, if not, data clean-up is needed.  Think about the different ways the data you are working with can come across, and account for as many of them as possible.
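To make the misspelling problem concrete, here is a minimal Python sketch that maps near-miss spellings onto one canonical name before aggregating.  The artist names, the `canonical_name` and `count_by_name` helpers, and the 0.8 similarity cutoff are all assumptions for illustration; a real list of known names would come from a reference table, and the cutoff would need tuning.

```python
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical list of known-good names (an assumption for this sketch).
CANONICAL = ["Pink Floyd", "Led Zeppelin", "Fleetwood Mac"]

def canonical_name(raw, known=CANONICAL, cutoff=0.8):
    """Return the closest known name, or the tidied input if nothing is close."""
    # Collapse stray whitespace and normalize capitalization first.
    cleaned = " ".join(raw.split()).title()
    matches = get_close_matches(cleaned, known, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned

def count_by_name(rows):
    """Sum quantities per canonical name so variants aggregate together."""
    totals = defaultdict(int)
    for name, qty in rows:
        totals[canonical_name(name)] += qty
    return dict(totals)

rows = [("Pink Floyd", 2), ("pink floid", 1), ("  Led Zepelin", 3), ("Queen", 1)]
totals = count_by_name(rows)
print(totals)
```

Names with no close match (here "Queen") pass through unchanged, so unknown entries stay visible for review instead of being forced into the wrong bucket.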

Context is what really gives data its meaning.  I attended a presentation by Cole Nussbaumer Knaflic (Storytelling with Data), and what I remember most is that when looking at a report or dashboard, you should be blunt and ask yourself, “SO WHAT?”  If I see a record for sale for $20, how do I know how to act?  Is it a good deal or not?  Having other data points, such as market value, would be a good start.  What does it cost to ship?  How many have sold recently, and at what prices?  What is the condition of the item?  Does the seller have a good reputation?  And then take it a step further and make it easy for the viewer to see the overall score (good buy, poor buy).
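One way to turn those context questions into an overall score is sketched below in Python.  It only covers price, shipping, and recent sale prices; the `rate_listing` name and the 0.85 and 1.10 thresholds are assumptions for illustration, and condition and seller reputation would need their own inputs.

```python
from statistics import median

def rate_listing(price, shipping, recent_sale_prices,
                 good_ratio=0.85, poor_ratio=1.10):
    """Label a listing by comparing its landed cost to the median recent sale."""
    if not recent_sale_prices:
        return "unknown"  # no market context, so no verdict
    landed_cost = price + shipping
    market_value = median(recent_sale_prices)
    ratio = landed_cost / market_value
    if ratio <= good_ratio:
        return "good buy"
    if ratio >= poor_ratio:
        return "poor buy"
    return "fair"

# A $20 record with $5 shipping, against two hypothetical sales histories.
cheap_vs_market = rate_listing(20.0, 5.0, [32.0, 35.0, 30.0, 40.0])
dear_vs_market = rate_listing(20.0, 5.0, [18.0, 22.0, 20.0])
print(cheap_vs_market, dear_vs_market)
```

The same $20 price tag comes out as a good buy in one market and a poor buy in the other, which is exactly the "so what?" the raw number cannot answer on its own.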

Being Consistent is also key.  Make sure that comparisons are indeed apples to apples.  Apply the same parameters to historical data, or account for changes in the new data.  Pay attention to what units are being reported, for example currencies and their exchange rates.
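As a small illustration of the currency point, the Python sketch below converts every price to one currency before any comparison.  The exchange rates are made-up placeholders and `to_usd` is a hypothetical helper; real rates would come from a dated source and should match the date of each transaction.

```python
# Placeholder exchange rates to USD (assumed values, not real quotes).
RATES_TO_USD = {"USD": 1.0, "EUR": 1.10, "GBP": 1.25}

def to_usd(amount, currency, rates=RATES_TO_USD):
    """Convert an amount to USD, failing loudly on an unknown currency."""
    if currency not in rates:
        raise ValueError(f"no exchange rate for {currency!r}")
    return round(amount * rates[currency], 2)

listings = [(20.0, "USD"), (18.0, "EUR"), (15.0, "GBP")]
prices_usd = [to_usd(amount, currency) for amount, currency in listings]
print(prices_usd)
```

Raising an error for a missing rate, rather than passing the number through, keeps an unconverted price from quietly sitting next to converted ones.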

And finally, the data must be Correct.  It seems like a given, but things happen during the data’s journey from the source to your end report.  An incorrect formula, a relationship that changed: make sure to check, and put systems in place where you can easily identify areas that could go awry.  See how much of this can be avoided by going to the root cause of the error, which leads to Roche’s Maxim:

Data should be transformed as far upstream as possible, and as far downstream as necessary.
