What is a data discrepancy?
Generally speaking, a data discrepancy is when two or more sets of comparable data don’t match up. And, despite sounding kind of technical, it isn’t unique to big data or adtech.
Ever done your shopping while keeping a running total in the calculator on your phone, and ended up handing over a different amount to the cashier?
Then, my friend, you’ve experienced a data discrepancy.
In this case, the discrepancy may have arisen because you didn’t notice a discount, your kid smuggled something into the basket, or you accidentally added the same item twice. Other than some potential embarrassment, this discrepancy is unlikely to be particularly harmful.
But in adtech and big data, a data discrepancy can be genuinely damaging. It’s important to take data reliability and consistency seriously, particularly in an industry like adtech, where data is the foundation upon which the whole thing sits.
Most common causes of a data discrepancy
So, we’ve established that you can end up with a discrepancy between data sets for a variety of reasons, but let’s keep the focus on adtech. Here’s a rundown of the most common causes if you do experience a data discrepancy.
1) Third-party tracker approval
In our post-GDPR world, you need to double and triple check that any third-party tracking you want to add to a creative is approved by the ad serving platform you want to use to traffic those ads. If your third-party tracker isn’t approved, you’ll likely end up with no impressions or interactions being detected by your tracking platform.
If your third-party trackers are added correctly, i.e. directly to the NEXD tag, you’ll still be able to track at least partial data.
2) Tag setup
Double-check your ad tag for any errors. It’s quite common to end up with issues arising from incorrect syntax in the tracker, missing URLs or even characters “breaking” when copy-pasting. There are tools to automatically check your ad tag for syntax errors and it’s always a good idea to double-check all the URLs before your campaign goes live.
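As a rough illustration of what such a check can look like, here is a minimal Python sketch. It is not a NEXD tool: the function name `find_tag_problems` and the rules it applies are assumptions made for this example. It only looks for copy-paste damage (curly quotes, non-breaking spaces) and for `src`/`href` URLs that are empty, malformed or contain whitespace.

```python
import re
from urllib.parse import urlsplit

# Characters that often replace plain quotes or spaces when a tag is
# copy-pasted through a word processor or chat tool.
SUSPECT_CHARS = "\u201c\u201d\u2018\u2019\u00a0"


def find_tag_problems(tag: str) -> list[str]:
    """Return human-readable problems found in an ad tag (hypothetical helper)."""
    problems = []

    for ch in SUSPECT_CHARS:
        if ch in tag:
            problems.append(f"suspect character U+{ord(ch):04X} in tag")

    # Collect the values of src/href attributes wrapped in plain quotes.
    urls = re.findall(r'(?:src|href)\s*=\s*["\']([^"\']*)["\']', tag)
    if not urls:
        problems.append("no src/href URL found")

    for url in urls:
        if any(c.isspace() for c in url):
            problems.append(f"whitespace inside URL: {url!r}")
            continue
        try:
            parts = urlsplit(url)
        except ValueError:
            problems.append(f"unparseable URL: {url!r}")
            continue
        # Allow protocol-relative URLs ("//host/path"), which have no scheme.
        if parts.scheme not in ("", "http", "https") or not parts.netloc:
            problems.append(f"malformed or missing URL: {url!r}")

    return problems
```

An empty list means the sketch found nothing suspicious. It does not mean the tag is guaranteed to work, so a preview on the target platform is still worth doing.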
3) Date ranges in reports
It’s always important to make sure the reports or data sets you are comparing reflect the exact same date range. On the surface, this seems pretty obvious, but timezones matter too: a “day” in one platform’s timezone starts and ends at different moments than the same calendar day in another’s, so two reports for identical dates can cover different hours.
Take some time to make sure the date ranges in both analytics platforms match up and that the timezones are also the same. If the timezones differ, adjust the date range in one of the reports so both cover the same period.
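To see how far apart two “identical” date ranges really are, you can convert both to UTC. The sketch below is only an illustration: the helper names are made up for this example, and it assumes each platform reports in a fixed UTC offset. For zones with daylight saving time you would swap in `zoneinfo.ZoneInfo`.

```python
from datetime import date, datetime, time, timedelta, timezone


def report_window_utc(start: date, end: date, utc_offset_hours: float):
    """Return the UTC window [start 00:00, day after end 00:00) of a report.

    `start` and `end` are the inclusive calendar dates chosen in the
    reporting UI; `utc_offset_hours` is the platform's assumed fixed offset.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    begin = datetime.combine(start, time.min, tzinfo=tz)
    finish = datetime.combine(end + timedelta(days=1), time.min, tzinfo=tz)
    return begin.astimezone(timezone.utc), finish.astimezone(timezone.utc)


def misalignment_hours(window_a, window_b) -> float:
    """Hours by which window_b starts later than window_a (negative if earlier)."""
    return (window_b[0] - window_a[0]).total_seconds() / 3600


# Same calendar dates, one report in UTC and one in UTC-5.
utc_report = report_window_utc(date(2024, 1, 1), date(2024, 1, 7), 0)
other_report = report_window_utc(date(2024, 1, 1), date(2024, 1, 7), -5)
print(misalignment_hours(utc_report, other_report))  # 5.0
```

Here both reports span seven days, but the second one starts and ends five hours later, so each report contains five hours of traffic that the other does not.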
4) Filters
Virtually all reporting systems in the adtech ecosystem allow you to apply some level of filtering. So, if filters are enabled for your report, it’s important to ensure that both reports have the same filters applied.
If you have a data discrepancy and you have filters applied to both datasets, it’s sometimes worth removing all filters from both sets and just looking at the raw data. The most common issue we see is around viewable and served impressions. Comparing one impression type against the other can lead to wildly different values.
5) Metric definitions and terminology
Many metrics are clearly defined and universally understood, like CTR. But not every metric is so black and white. Even some pretty fundamental ones differ: there have been attempts at standardization, but no single body outlines how things should be measured and defined, so many companies define them slightly differently.
At the other end of the spectrum, engagement metrics can be tricky to compare between platforms because “engagement” means so many different things to different people. It’s always a good idea to dig a little deeper with engagement metrics to make sure the data you have represents the same type of engagement.
6) Network connection and server reliability
If a third-party server goes down or experiences some kind of network interruption, even very briefly, it’s likely some data will not be tracked correctly, if at all. In this event, check status pages and your email to see if your provider has experienced some kind of issue that may have impacted your campaign.
7) Latency
If there’s a time delay between the systems responsible for tracking when a creative is requested, loaded, viewed and interacted with, then you can end up with a discrepancy. Plus, if the user navigates away after clicking but before the tracker on the landing page has loaded, you’ll see a click but no landing page visit.
Another case is when you introduce extra “layers” that can potentially slow things down. For example, NEXD is an ad server - we’ve spent a lot of time making sure it’s super-snappy and rich media optimized. If you then add a NEXD tag to another ad server you’ll end up with a big spike in latency - we’re talking seconds rather than milliseconds.
When and how NEXD investigates a data discrepancy
We take the reliability and veracity of our data very seriously and we have policies and procedures in place to help our customers investigate a data discrepancy if they experience one.
As a general rule, a data discrepancy of 10% or less falls within acceptable tolerances and, in most cases, wouldn’t warrant deeper investigation. A discrepancy in the 11-20% range would be investigated by our support team. Once we’ve received all the information we need to carry out our investigation, we will perform a comparative test for around 48-72 hours to better understand the cause of the issue.
If you see a discrepancy rate higher than 20% then it is likely there is an issue with the ad inventory and you should reach out to your provider and ensure there are no issues with compatibility. We also encourage you to send us a screenshot of your tag setup and a preview link.
Please note that at the start of a campaign, when your creative has fewer than 10k impressions, the discrepancy may be a bit higher than 10%, and that is acceptable. This is because we may count and report more audit clicks than your DSP. There’s nothing to worry about - the difference will even out once there’s a more considerable amount of data.
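The thresholds above can be sketched as a small triage helper. This is an illustration only, not NEXD’s internal logic. It rests on three assumptions: the discrepancy is measured as the difference relative to the larger of the two counts (the article does not give a formula), anything above 10% and up to 20% is treated as the “investigate” band, and the early-campaign allowance applies only within that band.

```python
WITHIN_TOLERANCE = "within tolerance"
EARLY_CAMPAIGN = "early campaign, re-check once there is more data"
CONTACT_SUPPORT = "contact support for an investigation"
CHECK_INVENTORY = "check inventory compatibility with your provider"


def discrepancy_pct(count_a: int, count_b: int) -> float:
    """Difference between two counts as a percentage of the larger one.

    Assumed formula for illustration; other definitions are possible.
    """
    larger = max(count_a, count_b)
    if larger == 0:
        return 0.0
    return abs(count_a - count_b) * 100 / larger


def triage(count_a: int, count_b: int, impressions: int) -> str:
    """Map a discrepancy onto the bands described in the text."""
    pct = discrepancy_pct(count_a, count_b)
    if pct <= 10:
        return WITHIN_TOLERANCE
    if pct > 20:
        return CHECK_INVENTORY
    # Above 10% and up to 20%: allow for early-campaign noise (assumption).
    if impressions < 10_000:
        return EARLY_CAMPAIGN
    return CONTACT_SUPPORT


print(triage(100_000, 85_000, impressions=100_000))  # contact support for an investigation
```

In this example the two platforms differ by 15% on a campaign with plenty of data, so it lands in the band that the support team would look into.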