When I first started doing business in Japan in the mid-1990s, I fell in love with sushi. Sitting at the counter eating sushi and drinking sake with my friends felt like the most glamorous thing in the world.
Now, when I'm talking to clients and hear them asking for access to the "raw" data, I wonder whether they have the same infatuation I had with raw fish. It took me a good ten years to re-acclimatise myself to the pleasures of cooked fish, and I think the same thing may be going on here.
When people talk about raw data, the request usually comes from a desire not to be constrained in what they can do with the data. With raw data, it's assumed that anything is possible and that the data can be processed exactly as required. That is certainly true, but if you're asking for the "raw" data, you also need to be very clear about what you are really asking for.
It's really easy to make sushi, right?
In video, raw data is frequently little more than click-stream information from the remote control (set-top-box data) or a list of frames that match an ingested stream (Smart TV data).
This data is messy and complex, and like a visitor to a restaurant who wants more than a raw fish dumped on his plate, most agencies or networks don’t want to be ingesting that kind of raw data.
Most people, when they request “raw” data, mean that they want a granular, sanitised dataset that lets them drill down to the level of an individual device. They want any inconsistencies in the data smoothed out, they want the population to have been made representative of the country, and they want data they can trust and that won’t make them look silly in front of clients.
In other words, they want the opposite of raw data. They want finely prepared fish. There are three key steps to making your data sushi appetising and palatable.
Step 1: Smoothing out the inconsistencies
Raw data is full of common inconsistencies, notably:
- Set-top boxes that occasionally report viewing from 1 January 1970
- Missing fast-forward or rewind session data
- Smart TV data where, if the same content is on two networks at once, the ACR algorithm keeps flip-flopping between the two
- OTT data with inconsistencies between iOS and Android
Any dataset needs to go through a rigorous cleansing process to remove these erroneous records.
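To make the cleansing step more concrete, here is a minimal Python sketch covering two of the four problems above. The record shape, the `EARLIEST_VALID` cutoff and the three-interval `min_run` threshold are hypothetical choices for illustration, not any provider's actual rules. The first function drops events stamped before a plausible date (1 January 1970 is the Unix epoch, i.e. a zero timestamp); the second absorbs very short runs of ACR matches into the network that preceded them, on the assumption that they are flip-flops between simulcasts rather than genuine channel changes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import groupby


@dataclass(frozen=True)
class ViewingEvent:
    """Hypothetical record shape: one tuning or ACR-match event."""
    device_id: str
    start: datetime  # timezone-aware UTC timestamp
    network: str


# Hypothetical cutoff: no genuine viewing can predate the service's launch.
EARLIEST_VALID = datetime(2000, 1, 1, tzinfo=timezone.utc)


def drop_impossible_timestamps(events, earliest_valid=EARLIEST_VALID):
    """Remove events stamped before earliest_valid, e.g. boxes reporting 1 January 1970."""
    return [e for e in events if e.start >= earliest_valid]


def smooth_flip_flops(networks, min_run=3):
    """Absorb short runs of ACR matches into the network that preceded them.

    networks: one matched network per fixed interval for a single TV.
    Runs shorter than min_run intervals are treated as the ACR flip-flopping
    between two simulcasts. A short run at the very start is kept as-is,
    because there is no earlier network to fall back to.
    """
    smoothed = []
    for _, group in groupby(networks):
        run = list(group)
        if len(run) < min_run and smoothed:
            smoothed.extend([smoothed[-1]] * len(run))
        else:
            smoothed.extend(run)
    return smoothed
```

The trade-off sits in `min_run`: set it too high and genuine short channel changes are erased along with the flip-flops.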
Step 2: Make it representative
Set-top-box data is typically sourced only from an MVPD’s footprint. It needs to be modelled before it can be applied to the rest of the country, and any skews in that footprint need to be removed. With more and more people using antennas to receive terrestrial broadcasts, STB data is already missing large chunks of the population.
Smart TV data is more representative, as it covers cable, satellite and broadcast viewing, but it only covers homes where people have bought a new TV in the last few years and connected it to the internet over wi-fi. That still skews the data towards younger, more prosperous households.
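One common way to remove skews like these is post-stratification weighting: each household is weighted by its stratum's share of the national population divided by that stratum's share of the panel. The sketch below assumes hypothetical strata (age bands) and illustrative population shares; it is not any provider's actual methodology. Weighting can only stretch households that are already in the sample, so if a group such as antenna-only homes is missing entirely, it has to be modelled from another source. That is why the sketch raises an error rather than inventing a weight.

```python
from collections import Counter


def post_stratification_weights(panel_strata, population_shares):
    """Return a weight per stratum so the weighted panel matches population shares.

    panel_strata: one stratum label per panel household (hypothetical labels).
    population_shares: stratum -> share of national households (should sum to 1).
    weight = population share / panel share
    """
    counts = Counter(panel_strata)
    n = len(panel_strata)
    weights = {}
    for stratum, pop_share in population_shares.items():
        if counts[stratum] == 0:
            # Weighting cannot create viewers who were never measured.
            raise ValueError(f"no panel households in stratum {stratum!r}")
        weights[stratum] = pop_share / (counts[stratum] / n)
    return weights


# Illustrative example: a panel that over-represents younger households.
panel = ["18-34"] * 6 + ["35+"] * 4
weights = post_stratification_weights(panel, {"18-34": 0.3, "35+": 0.7})
```

In this example each 18-34 household gets a weight of 0.5 and each 35+ household a weight of 1.75, so the weighted panel still adds up to ten households while matching the 30/70 population split.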
Even OTT data, which by its nature represents every device using that service, needs to be ‘de-skewed’ by eliminating test data and bot views.
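Eliminating test data and bot views can be sketched in the same way. In this minimal Python example, the set of internal test device IDs is assumed to be available from the OTT service itself, and the 30-views-per-hour bot threshold is a hypothetical heuristic; a production system would typically draw on more signals than a single rate limit.

```python
from collections import Counter


def remove_test_and_bot_views(views, test_device_ids, max_views_per_hour=30):
    """Drop views from known test devices and from devices that look automated.

    views: list of (device_id, hour) tuples, one per view.
    test_device_ids: IDs the service flags as internal or QA devices (assumed available).
    max_views_per_hour: hypothetical bot threshold. A device that exceeds it in any
    single hour has all of its views removed, not just those from that hour.
    """
    real = [v for v in views if v[0] not in test_device_ids]
    views_per_device_hour = Counter(real)
    bots = {device for (device, _hour), n in views_per_device_hour.items()
            if n > max_views_per_hour}
    return [v for v in real if v[0] not in bots]
```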
Step 3: No food poisoning, please
Finally, you need to be sure that the data is reliable. Nielsen is often rapped for being late in delivering its data, but late is not the worst thing data can be: data providers frequently have outages, and you need resilience in your data strategy to cope with them.
No one wants food poisoning. When Nielsen's Florida data was late in March due to a power outage, people complained, but it was eventually published. If you're taking data to your client claiming that nobody watched their competitor’s commercial, you had better be sure there wasn’t an outage at your data provider during that period. It’s better to wait for your sushi and get it in perfect condition than to get it on time but far from fresh.
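Guarding against that outage scenario can start with a simple check: before reporting zero viewers, confirm that the provider was actually delivering data while the commercial aired. The sketch below assumes a hypothetical feed of "heartbeat" timestamps (for example, the arrival time of each batch of records) and treats any silence longer than 15 minutes as a possible outage; both the feed and the threshold are illustrative assumptions.

```python
from datetime import datetime, timedelta


def coverage_gaps(heartbeats, period_start, period_end, max_gap=timedelta(minutes=15)):
    """Return (gap_start, gap_end) pairs where the provider was silent for longer than max_gap.

    heartbeats: hypothetical timestamps at which the provider delivered data.
    The period boundaries are included so silences at either end are also caught.
    """
    inside = sorted(t for t in heartbeats if period_start <= t <= period_end)
    points = [period_start] + inside + [period_end]
    return [(a, b) for a, b in zip(points, points[1:]) if b - a > max_gap]


def safe_to_report_zero(heartbeats, period_start, period_end, ad_start, ad_end,
                        max_gap=timedelta(minutes=15)):
    """True only if no coverage gap overlaps the window in which the ad aired."""
    gaps = coverage_gaps(heartbeats, period_start, period_end, max_gap)
    # Two intervals overlap when each one starts before the other ends.
    return not any(g_start < ad_end and ad_start < g_end for g_start, g_end in gaps)


# Illustrative day: a batch every 5 minutes, except for a silence from 19:55 to 21:00.
day = datetime(2024, 1, 15)
beats = [day + timedelta(minutes=5 * i) for i in range(289) if not (240 <= i < 252)]
```

With this illustrative feed, a spot airing at 20:30 fails the check, so a "nobody watched" claim would be held back, while a spot at 22:00 passes.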
Leave raw data to the sushi masters
That sashimi on your plate may look delightfully simple and elegant, but a lot has gone into getting it there. It takes five years of hard work to train to be a sushi chef, and creating data sushi is no less complicated.
Unless you have an army of fully trained sushi chefs on staff, I'd recommend asking your data providers for processed data and leaving the raw data to the data masters.