Scoring third party data quality

Tom Weiss, Wed 25 October 2017

It's no secret that not all data is created equal. When you buy or sell segments, you often have to wait until the end of an advertising campaign to measure performance. If the results disappoint, it is by then too late to tell whether the segments were accurate or whether the campaign simply had the wrong creative or call to action.

We've started systematically assessing segment quality in several ways: looking for statistical correlations between datasets, comparing segments against a random sample of the data, and even running primary research on new segments to establish how closely they reflect the truth.
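As a rough illustration of the first mechanism, the sketch below measures how strongly two segments are associated using the phi coefficient (the Pearson correlation for two yes/no attributes) over a shared universe of user IDs. It assumes both segments are keyed on the same IDs and that the size of the universe is known; the function name and the toy segments are hypothetical, and this is not necessarily the statistic used in our own scoring.

```python
import math


def phi_coefficient(segment_a: set, segment_b: set, universe_size: int) -> float:
    """Phi coefficient between membership in two segments.

    Assumes both segments draw their IDs from the same universe of
    `universe_size` users (a hypothetical setup for illustration).
    """
    n11 = len(segment_a & segment_b)        # users in both segments
    n10 = len(segment_a) - n11              # users only in segment A
    n01 = len(segment_b) - n11              # users only in segment B
    n00 = universe_size - n11 - n10 - n01   # users in neither segment
    if n00 < 0:
        raise ValueError("universe_size is smaller than the union of the segments")
    denominator = math.sqrt(
        (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    )
    if denominator == 0:
        # An empty segment, or one covering everybody, has no defined correlation.
        raise ValueError("phi is undefined for an empty or universal segment")
    return (n11 * n00 - n10 * n01) / denominator


# Toy example: 1,000 users and two partially overlapping segments.
segment_x = set(range(0, 100))
segment_y = set(range(50, 200))
print(round(phi_coefficient(segment_x, segment_y, 1000), 3))
```

Values near +1 mean the segments largely contain the same people, values near 0 mean no association, and negative values mean membership in one makes membership in the other less likely.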

By directly validating the truth behind a segment, we start to establish a degree of currency in the segment business. The example below shows the result of a test we ran on a segment that is readily available to buy online: buying against it is clearly more efficient than buying generic traffic, but its quality still has a long way to go.

[Figure: TV Data Collection]
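A result like this can be summarised with two numbers: precision (the share of surveyed segment members for whom the attribute is actually true) and lift (how much more often it is true in the segment than in a random sample of generic traffic). The sketch below assumes the same survey question is put to a sample of segment members and to a random sample; the class, function names and counts are hypothetical and are not the figures from the test above.

```python
from dataclasses import dataclass


@dataclass
class SurveyResult:
    """Counts from a (hypothetical) primary-research survey."""
    respondents: int  # people surveyed
    confirmed: int    # respondents who confirmed the segment attribute


def precision(result: SurveyResult) -> float:
    """Share of surveyed people for whom the attribute is actually true."""
    return result.confirmed / result.respondents


def lift(segment: SurveyResult, random_sample: SurveyResult) -> float:
    """How many times more often the attribute holds in the segment than in
    a random sample of generic traffic (1.0 means no better than random)."""
    if random_sample.confirmed == 0:
        raise ValueError("attribute never observed in the random sample")
    # Cross-multiply the integer counts so the ratio is computed in one division.
    return (segment.confirmed * random_sample.respondents) / (
        segment.respondents * random_sample.confirmed
    )


# Illustrative counts only.
segment = SurveyResult(respondents=500, confirmed=210)
generic = SurveyResult(respondents=500, confirmed=70)
print(f"precision={precision(segment):.2f}, lift={lift(segment, generic):.1f}x")
```

A lift well above 1 can coexist with a precision well below 1: the segment beats generic traffic, yet most of the people in it still do not have the attribute being sold.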

Validating segments by correlation

Because many segments overlap, we don't need to run primary research on every one of them to model a scorecard for a broad set.

Take, for example, a segment of people who own homes worth over $1m. We would expect this segment to overlap heavily with people earning more than $200k a year and with those who have a high credit rating. If instead we saw a correlation with low credit scores, or no association with other high-net-worth segments, we would start to lower the segment's score.
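One way to turn that reasoning into a number, sketched below, is to list the reference segments expected to correlate positively (+1) or negatively (-1) with the segment under test, then score the share of those expectations that the observed correlations actually meet. The reference segment names, the 0.1 minimum-strength threshold and the pass-rate scoring rule are all hypothetical choices for illustration.

```python
def consistency_score(
    observed: dict[str, float],
    expected_signs: dict[str, int],
    min_strength: float = 0.1,
) -> float:
    """Fraction of expected correlations that the observed data supports.

    observed: correlation of the tested segment with each reference segment.
    expected_signs: +1 if a positive correlation is expected, -1 if negative.
    min_strength: hypothetical threshold below which an association is
    treated as absent.
    """
    checks = 0
    passed = 0
    for reference, sign in expected_signs.items():
        if reference not in observed:
            continue  # no data for this reference segment; skip it
        checks += 1
        # Passes only if the correlation points the expected way and is strong enough.
        if sign * observed[reference] >= min_strength:
            passed += 1
    if checks == 0:
        raise ValueError("no observed correlations for the expected references")
    return passed / checks


expected = {
    "income_over_200k": +1,
    "high_credit_rating": +1,
    "low_credit_rating": -1,
    "other_high_net_worth": +1,
}
observed = {
    "income_over_200k": 0.35,
    "high_credit_rating": 0.28,
    "low_credit_rating": 0.12,      # unexpected positive correlation
    "other_high_net_worth": 0.02,   # no real association
}
print(consistency_score(observed, expected))  # 0.5: two of four expectations met
```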

This approach allows us to score most segments effectively, but we still struggle in areas like Pet Owners or Auto Intenders.
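Because segments overlap, scores established for researched segments can be carried over to segments that have not been surveyed. The sketch below shows one simple way to do that, assuming the overlap between an unresearched segment and each researched one is known (for example, the share of its members who also appear in each): take an overlap-weighted average of the researched scores. The weighting scheme and segment names are hypothetical rather than a description of our scorecard model, and the estimate can only be as good as the segments it borrows from.

```python
def estimate_score(overlaps: dict[str, float], researched_scores: dict[str, float]) -> float:
    """Overlap-weighted average of scores from researched segments.

    overlaps: share of the target segment's members found in each other
    segment (hypothetical input, values in [0, 1]).
    researched_scores: quality scores for segments validated by primary research.
    """
    pairs = [
        (weight, researched_scores[name])
        for name, weight in overlaps.items()
        if name in researched_scores and weight > 0
    ]
    total_weight = sum(weight for weight, _ in pairs)
    if total_weight == 0:
        # Nothing researched to borrow from: this segment needs its own research.
        raise ValueError("no overlap with any researched segment")
    return sum(weight * score for weight, score in pairs) / total_weight


researched = {"homes_over_1m": 0.8, "income_over_200k": 0.6}
overlaps = {"homes_over_1m": 0.5, "income_over_200k": 0.25, "luxury_travel": 0.4}
print(round(estimate_score(overlaps, researched), 3))
```

Overlaps with unresearched segments (here the hypothetical `luxury_travel`) are ignored, and a segment with no researched neighbours raises an error, which is exactly the situation where primary research is needed.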

For segments like Pet Owners, we have to rely on primary research. Countless surveys have told us that women are more likely than men to own cats, but merely seeing a correlation between the Cat Owner segment and the Female segment is not conclusive: over 50% of the population is female while far fewer than 50% own cats, so a female skew alone barely distinguishes a genuine cat-owner segment from any other female-leaning one.
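The base-rate mismatch can be made concrete. For two yes/no attributes with prevalences p ≤ q, the phi coefficient can never exceed sqrt(p(1 - q) / (q(1 - p))), a ceiling reached only when everyone with the rarer attribute also has the common one. The sketch below evaluates that ceiling. The 51% female share stands in for "over 50%", while the 30% cat-ownership and 25% cat-food-purchaser rates are hypothetical figures chosen only to reflect "far fewer than 50%".

```python
import math


def max_phi(p: float, q: float) -> float:
    """Largest phi coefficient attainable between two binary attributes
    with prevalences p and q (reached when the rarer attribute is a
    subset of the more common one)."""
    if not (0 < p < 1 and 0 < q < 1):
        raise ValueError("prevalences must be strictly between 0 and 1")
    low, high = min(p, q), max(p, q)
    return math.sqrt(low * (1 - high) / (high * (1 - low)))


# Hypothetical prevalences: 30% cat owners, 51% female, 25% cat-food purchasers.
print(round(max_phi(0.30, 0.51), 2))  # 0.64
print(round(max_phi(0.30, 0.25), 2))  # 0.88
```

Even if every cat owner were female, the correlation with the Female segment could not pass about 0.64, and in practice it would be far lower. A reference segment whose prevalence is closer to the one being tested leaves much more room for a strong, informative correlation.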

If we can correlate purchasers of cat food with cat owners, this could be a route. However, we have to be careful about how the segments themselves are derived. If the Cat Owner segment is itself derived from purchasing patterns, we would simply be testing the hypothesis against its own premise.
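A simple safeguard against that circularity, sketched below, is to record what data each segment was built from and only accept a validator that shares none of the tested segment's sources. The `Segment` class, the provenance labels and the disjoint-sources rule are hypothetical; real segment provenance is often less tidy than a set of labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    # Hypothetical provenance labels describing the data the segment was built from.
    sources: frozenset


def is_independent_check(target: Segment, validator: Segment) -> bool:
    """A validator only provides independent evidence if it was not built
    from any of the same sources as the segment being tested."""
    return target.sources.isdisjoint(validator.sources)


cat_owners = Segment("Cat Owner", frozenset({"retail_purchases"}))
cat_food_buyers = Segment("Cat Food Purchasers", frozenset({"retail_purchases"}))
survey_panel = Segment("Survey: owns a cat", frozenset({"primary_research"}))

print(is_independent_check(cat_owners, cat_food_buyers))  # False: circular
print(is_independent_check(cat_owners, survey_panel))     # True
```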

To get the best view of segment quality, we need more than just correlations. A combination of primary research, correlations, and random sampling points the way forward, and the results we have seen to date are promising.
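Combining the three kinds of evidence into one score could look like the sketch below: a weighted average of whichever signals exist for a segment, with the weights renormalised when one is missing (for instance, a segment that has not been through primary research). The signal names, the weights and the assumption that each signal is already on a 0 to 1 scale are hypothetical.

```python
def combined_score(signals: dict, weights: dict) -> float:
    """Weighted average of the quality signals available for a segment.

    signals: signal name -> score in [0, 1], or None if not measured.
    weights: signal name -> hypothetical relative importance.
    """
    available = {name: value for name, value in signals.items() if value is not None}
    total_weight = sum(weights[name] for name in available)
    if total_weight == 0:
        raise ValueError("no weighted signals available for this segment")
    # Renormalise so missing signals do not drag the score towards zero.
    return sum(weights[name] * value for name, value in available.items()) / total_weight


weights = {"primary_research": 0.5, "correlation": 0.3, "random_sample": 0.2}
no_research = {"primary_research": None, "correlation": 0.7, "random_sample": 0.5}
print(round(combined_score(no_research, weights), 3))
```

Renormalising keeps unresearched segments comparable with researched ones, although their scores rest on weaker evidence and could be flagged accordingly.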

Dativa is a global consulting firm providing data consulting and engineering services to companies that want to build and implement strategies to put data to work. We work with primary data generators, businesses harvesting their own internal data, data-centric service providers, data brokers, agencies, media buyers and media sellers.