Why data integrity matters in digital analytics: a Google Analytics example

You might have missed it: since June 29th, Google Analytics has been splitting organic traffic from the 360 search engine (so.com) between the Organic Search and Referral channels, causing some fluctuations in your reports.



On the one hand, this change may appear trivial given the limited market share of 360 Search (below 10%). On the other hand, it highlights the need for marketers to keep their tracking up to date in order to make relevant decisions: had such a change happened on Baidu, the consequences could have been far more significant.

However, while most organizations today have deployed analytics solutions to measure their digital activities, only a few carry out regular data integrity maintenance, often for lack of awareness.

In this article, we therefore use the 360 search engine issue to introduce the different facets of the data integrity challenge, along with concrete solutions for Google Analytics.

What elements affect data integrity?

Based on our experience at altima° China, we drew up a list of the 5 most common causes of data integrity issues:

  • Release of marketing campaigns without relevant tagging – any PPC campaign on Chinese search engines will, for instance, appear under the Organic Search channel in the absence of UTM parameters
  • Analytics hacks – sadly, several companies promote their own services by generating fake traffic in Google Analytics, especially within the Organic Search and Referral channels
  • Bugs in the analytics tracking – because we're all human after all
  • Policy or technical changes on third-party websites (mostly search engines) – we'll go into more detail on this point later in the article
  • Google Analytics adjustments – infrequent, but not well announced either
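
The first cause is usually the easiest to prevent. As a minimal sketch, the standard utm_source, utm_medium and utm_campaign parameters can be appended to a campaign landing page URL so that paid clicks are not reported as organic traffic. The helper name `add_utm`, the landing page and the campaign name below are hypothetical and only serve as an illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def add_utm(url, source, medium, campaign):
    """Return `url` with utm_source, utm_medium and utm_campaign appended.

    Existing query parameters are kept (repeated keys collapse to their
    last value); utm_* parameters already present are overwritten.
    """
    parts = urlparse(url)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update({
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
    })
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical landing page and campaign name, for illustration only.
tagged = add_utm("https://www.example.com/landing?lang=zh",
                 "baidu", "cpc", "summer_sale")
print(tagged)
```

Using such a URL as the destination of a PPC ad gives Google Analytics an explicit medium (here cpc) instead of leaving it to guess from the referrer.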

Which KPIs should you use to monitor data integrity?

At first sight, tracking abnormal data may prove very time-consuming given the large number of metrics and dimensions available in the Google Analytics reports.

To keep the monitoring accessible, we defined a set of generic KPIs to detect any sudden loss of data accuracy easily and efficiently:

# Metrics

  • Number of sessions
  • Bounce rate
  • Number of macro conversions (purchases, leads, contact inquiries…)

# Dimensions

  • Common marketing channels (Organic Search, Paid Search, Referral…)
  • ‘(Other)’ marketing channel (i.e. the channel listing all traffic that couldn’t be recognized and classified among the existing marketing channels)

In the case of the 360 Search problem, we isolated the issue by analyzing the sessions of the Referral channel, where we noticed that the search engine so.com had newly appeared.
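
This kind of check can be partly automated. The sketch below compares the number of sessions per channel between two periods and flags any channel whose volume moved by more than a chosen threshold. The function name, the 15% threshold and the session counts are assumptions made up for illustration; they are not figures from our reports.

```python
def flag_anomalies(previous, current, threshold=0.15):
    """Compare sessions per channel between two periods.

    Returns {channel: relative_change} for each channel whose session
    count changed by more than `threshold` (0.15 = 15%). A channel with
    no sessions in the previous period is reported with float("inf").
    """
    flagged = {}
    for channel in sorted(set(previous) | set(current)):
        before = previous.get(channel, 0)
        after = current.get(channel, 0)
        if before == 0:
            if after > 0:
                flagged[channel] = float("inf")
            continue
        change = (after - before) / before
        if abs(change) > threshold:
            flagged[channel] = round(change, 2)
    return flagged


# Made-up weekly session counts, for illustration only.
last_week = {"Organic Search": 1000, "Referral": 200,
             "Paid Search": 500, "(Other)": 10}
this_week = {"Organic Search": 820, "Referral": 390,
             "Paid Search": 510, "(Other)": 11}
print(flag_anomalies(last_week, this_week))
```

With these sample numbers, Organic Search and Referral are both flagged, which is the pattern to expect when a search engine starts being reported as a referrer.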

How often should you monitor data integrity?

While this is a very legitimate question, the answer depends heavily on the richness of your marketing campaigns (in number and type).

As a baseline, brands investing simultaneously and continuously in several promotional activities (e.g. PPC, display, RTB, affiliation, emailing, social) may require weekly monitoring. Conversely, brands investing only occasionally and in few marketing campaigns may only require a monthly follow-up.

How to fix data integrity issues?

Google Analytics, one of the most flexible analytics tools available on the market, offers simple solutions to adjust tracking from its interface. We summarize the different features below.

# Filters

The Filters feature lets you modify the tracked data before it is processed by Google Analytics. When used for data integrity purposes, filters usually serve to override mediums (e.g. turning the referral traffic of so.com into organic traffic) or simply to exclude irrelevant traffic.

# Channel Settings

Less radical than filters, the Channel Settings feature lets you customize the Default Channel Grouping of Google Analytics by modifying existing channels (e.g. adding traffic from a bespoke medium to an existing channel) or creating new ones (according to your needs).
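
To make the idea concrete, here is a small Python sketch of what such a customization amounts to. The rules are a deliberately simplified stand-in and are not the actual default channel definitions of Google Analytics; the medium "partner" and the function names are hypothetical.

```python
# Simplified stand-in for a channel grouping: an ordered list of
# (channel name, set of mediums). Illustrative only; these are NOT the
# real default channel definitions of Google Analytics.
DEFAULT_RULES = [
    ("Organic Search", {"organic"}),
    ("Paid Search", {"cpc", "ppc", "paidsearch"}),
    ("Referral", {"referral"}),
]


def with_extra_medium(rules, channel, medium):
    """Return a copy of `rules` in which `medium` also maps to `channel`.

    If `channel` does not exist yet, it is appended as a new channel.
    """
    updated = [(name, set(mediums)) for name, mediums in rules]
    for name, mediums in updated:
        if name == channel:
            mediums.add(medium)
            return updated
    updated.append((channel, {medium}))
    return updated


def classify(rules, medium):
    """Return the first channel matching `medium`, else '(Other)'."""
    for name, mediums in rules:
        if medium.lower() in mediums:
            return name
    return "(Other)"


# "partner" is a hypothetical bespoke medium used in campaign tagging.
custom_rules = with_extra_medium(DEFAULT_RULES, "Referral", "partner")
print(classify(DEFAULT_RULES, "partner"))  # unrecognized: "(Other)"
print(classify(custom_rules, "partner"))   # now grouped under "Referral"
```

Unlike a filter, this only changes how sessions are grouped in reports: the underlying source and medium values stay untouched.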

# Organic Custom Sources

Largely forgotten by marketers, the Organic Custom Sources feature lets you add extra search engines to the Organic Search report and thereby retrieve valuable keyword information (even though more and more search engines have recently started to block it).

Note that none of these features is retroactive – past data will therefore not be reprocessed according to the new settings.

Now, what about the 360 search engine issue…?

As shown in the initial graph, part of the 360 search engine traffic is still attributed to the Organic Search channel, implying that users' queries may originate from another entry point, most likely other products of the brand such as the 360 antivirus software or the 360 web browser. However, as this audience is still correctly tracked, no further action is necessary.

In addition, deeper analysis of the new referral traffic showed that most of it was generated via desktop devices, so most probably through the desktop version of the 360 search engine (www.so.com).

Our first action was naturally to declare the domain so.com as a custom organic search source. However, additional tests continued to report our organic searches as referral traffic, suggesting that the origin of the issue may be either a technical change in the search system or an update of the search engine's privacy policy aimed at hiding users' searches (just as Google does).

Our second action therefore consisted in forcing Google Analytics to override the medium from referral to organic through the creation of a bespoke filter.

As a result, all traffic from the domain so.com is once again tracked in the Organic Search channel.
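
The filter itself is configured in the Google Analytics interface, but its logic can be sketched in a few lines of Python. The hit structure (a dictionary with "source" and "medium" keys), the function name and the regular expression are assumptions chosen for illustration, not an export of our actual filter settings.

```python
import re

# Matches so.com and its subdomains (e.g. www.so.com),
# but not unrelated domains such as "also.com".
SO_COM = re.compile(r"^(?:[\w-]+\.)*so\.com$", re.IGNORECASE)


def override_medium(hit):
    """Return a copy of `hit` whose medium is set to 'organic' when the
    hit is referral traffic from so.com; other hits are copied as-is."""
    is_referral = hit.get("medium") == "referral"
    if is_referral and SO_COM.match(hit.get("source", "")):
        return {**hit, "medium": "organic"}
    return dict(hit)


# Sample hits, for illustration only.
hits = [
    {"source": "www.so.com", "medium": "referral"},
    {"source": "also.com", "medium": "referral"},
    {"source": "baidu", "medium": "organic"},
]
for hit in hits:
    print(override_medium(hit))
```

Only the first sample hit is rewritten. Anchoring the pattern on the full domain matters, since a loose match on "so.com" would also catch unrelated referrers whose names end with the same letters.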

Written on 09 Nov 2016

Sylvain Sipp

SEO / Digital Analytics Consultant