Tracking for Truth – Level Up Your Test Data
You’re Only Testing What You’re Tracking
Now that we’ve built a structure to guide our testing program towards maximum impact, it is worth taking a deeper look at how to design individual tests so that they clearly show the full impact of your hard work and brilliance. I hate to cramp the artistry of marketing with stuffy scientific formality (that’s a lie: we need more science in our marketing lives), but without a moment of structured planning it is surprisingly easy to miss a critical performance metric and end up forced to evaluate impact from proxies and correlations.
Accurately Measuring Your Impact
First, let’s define the mission. We build tests to generate the data that we’ll interpret to objectively describe the optimal value for each variable, according to a predetermined definition of success.
The performance of each test element is measured by comparing the quantity of business value it returns against the scale of its opportunity to have returned value during the test. How you define value in each case is a broad question that benefits from careful thought. Whenever you have defined a target value, ask why it is desirable. If the answer is that it serves some greater objective, then work towards that higher-level value if you can. Push as far towards ultimate business value as the capabilities of your tracking allow.
In the case of creative testing, success can be evaluated relative to either of the following:
- the total number of opportunities to generate value
- the opportunities beyond the point of investment against the creative (e.g. a search ad is seen but not clicked, thereby generating no value but also incurring no cost)
For example, a focus on increasing the total number of search conversions means that performance should be measured relative to the ad impression, while an evaluation starting at the click will be the approach for improving cost efficiency. Usually, your priority will be a balance between scale and efficiency, and both perspectives should be measured and evaluated.
Optimizing towards maximum scale is straightforward – track performance relative to the first opportunity that users have to engage. For efficiency-based test planning, identify the pay event which represents your greatest investment in the delivery of that creative. For display and social this will most commonly be the impression; for search, charges are overwhelmingly triggered on the click. This can become more complex in some cases, such as impression tracking costs associated with a CPC auction or CPA-billed affiliate partner. Here, the relative weight of investment against each stage of user interaction should be considered in your final analysis.
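As a rough illustration of the two perspectives above, the sketch below measures the same creative relative to its first opportunity (scale) and relative to its pay event (efficiency). The function name, its parameters and the figures are hypothetical and are not taken from any ad platform.

```python
def performance_views(impressions, clicks, conversions, pay_event="click"):
    """Return conversion rates from the scale and efficiency perspectives.

    pay_event is the stage at which cost is incurred: "impression"
    (typical for display and social) or "click" (typical for search).
    """
    if pay_event not in ("impression", "click"):
        raise ValueError("pay_event must be 'impression' or 'click'")
    paid_opportunities = impressions if pay_event == "impression" else clicks
    return {
        # Scale view: value relative to every chance the user had to engage.
        "scale_rate": conversions / impressions if impressions else 0.0,
        # Efficiency view: value relative to the opportunities we paid for.
        "efficiency_rate": (
            conversions / paid_opportunities if paid_opportunities else 0.0
        ),
    }


# Hypothetical search ad, billed on the click.
search_ad = performance_views(impressions=20_000, clicks=400, conversions=20)
print(search_ad)
```

For a creative billed on the impression, both views collapse into the same number, which is why the distinction matters most for click-billed channels such as search.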
An Example – Optimizing Creative Performance
Walking through a scenario will help to clarify this theory for practical application. Probably the most common type of test is an A/B/n test to optimize creative rotation. This is a particularly instructive example because there are some pretty firm best practices, but also real variability in how tests will be configured.
1. Best Practices
This type of test is used to determine which creative design is the most effective, so we will want to remove as many distracting variables from the data as we can. When trying to improve performance it’s natural to look at the CPA or other cost-based metrics (after all, that’s what we ultimately care about), but by avoiding any metrics built on cost we can remove the effect of all delivery decisions and let the data show the pure impact of the creative design. Here we want to measure only the likelihood that a user will perform the goal action, from the point that they see the creative and have an opportunity to take action on it. For this, I recommend the Yield metric:
Yield = Goal Actions / Opportunities * 1,000
We multiply by a thousand to follow the impression-grouping convention of CPM. Yield values are best formatted as a percent, since Yield effectively combines CTR and Conversion Rate into a single metric that covers the full user path. We look straight to the goal action so that our data reflects how effectively the creative guides users towards that action, rather than how well it provokes a soft-intent click. CTR remains a useful metric to consider alongside Yield, because it isolates the influence of the creative design before site design and deeper product features come into play. Note that working with Yield centers your analysis on both how well the creative engages users and how well it pairs with the landing page and funnel experience. Where possible, you can cut further noise from your analysis by counting only Viewable Impressions as the user’s ‘Opportunities to perform’.
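To make the Yield formula concrete, here is a minimal sketch that computes Yield and CTR side by side for two creatives. The helper names and the sample figures are invented for illustration, and 'opportunities' is assumed to be plain impressions (swap in viewable impressions if you track them).

```python
def yield_per_thousand(goal_actions, opportunities):
    """Yield = Goal Actions / Opportunities * 1,000."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return goal_actions / opportunities * 1000


def click_through_rate(clicks, impressions):
    """CTR as a fraction of impressions."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical test data for two creatives in the same rotation.
creatives = {
    "A": {"impressions": 50_000, "clicks": 500, "goal_actions": 25},
    "B": {"impressions": 50_000, "clicks": 400, "goal_actions": 30},
}

results = {
    name: {
        "yield": yield_per_thousand(c["goal_actions"], c["impressions"]),
        "ctr": click_through_rate(c["clicks"], c["impressions"]),
    }
    for name, c in creatives.items()
}

for name, r in results.items():
    print(f"Creative {name}: Yield {r['yield']:.2f}, CTR {r['ctr']:.2%}")
```

In this made-up data, creative A draws more clicks while creative B produces more goal actions per thousand impressions, which is the kind of gap between soft-intent clicks and real outcomes that Yield is meant to expose.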
Above, you will have noticed that I said the test is being performed to see which creative is the most ‘effective’. The definition of ‘effective’ is where the variation in test designs comes in. Again, you will want to align the Goal Action with ultimate business value as closely as tracking will allow. Where there is too little data to make a statistically reliable decision on that goal action, set the evaluation point to the farthest action down the value funnel that still offers enough volume for analysis. How to determine statistical significance, and how to make sure you can act on your test results as soon as that point is reached, is a larger topic (which I aim to cover in a future post).
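The fallback rule described above can be sketched as a simple search from the bottom of the funnel upwards. This is only an illustration: the stage names, the counts and the minimum-volume threshold are all placeholders, and a fixed threshold is a stand-in for a proper statistical check rather than a replacement for one.

```python
def pick_evaluation_stage(funnel_counts, min_events):
    """Return the deepest funnel stage with at least min_events recorded.

    funnel_counts is a list of (stage_name, count) pairs ordered from the
    shallowest stage to the deepest. The count is assumed to be the volume
    for the weakest variant in the test, so every variant clears the bar.
    Returns None if no stage has enough volume.
    """
    for stage, count in reversed(funnel_counts):
        if count >= min_events:
            return stage
    return None


# Hypothetical funnel for one test, shallowest to deepest.
funnel = [("click", 1_200), ("signup", 240), ("purchase", 18)]

# min_events=100 is an arbitrary placeholder, not a recommended value.
stage = pick_evaluation_stage(funnel, min_events=100)
print(stage)
```

Here the purchase stage is closest to business value but has too few events, so the sketch falls back to signups as the Goal Action.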
In your analysis, it is also a good idea to explicitly forecast the benefit of your winning tests against expected future activity. Consider both the increase in value generated under a consistent investment and how much additional scale has become profitable thanks to the improved value-generating efficiency that you’ve achieved.
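A simple way to frame such a forecast is to apply the baseline and winning Yield to the same expected future volume and spend. The sketch below assumes Yield is expressed per thousand opportunities, that future volume and spend are known and stay constant, and that the test result carries over unchanged. All of these are simplifying assumptions, and the names and numbers are hypothetical.

```python
def forecast_uplift(baseline_yield, winning_yield, future_opportunities, spend):
    """Project goal actions and CPA for a fixed future volume and spend.

    Yields are goal actions per 1,000 opportunities.
    """
    if baseline_yield <= 0 or winning_yield <= 0:
        raise ValueError("yields must be positive")
    baseline_actions = baseline_yield * future_opportunities / 1000
    winning_actions = winning_yield * future_opportunities / 1000
    return {
        "extra_goal_actions": winning_actions - baseline_actions,
        "baseline_cpa": spend / baseline_actions,
        "winning_cpa": spend / winning_actions,
    }


# Hypothetical figures: Yield improves from 0.5 to 0.6 on 1,000,000
# expected opportunities at a constant spend of 10,000.
forecast = forecast_uplift(
    baseline_yield=0.5,
    winning_yield=0.6,
    future_opportunities=1_000_000,
    spend=10_000,
)
print(forecast)
```

The extra goal actions describe the added value under consistent investment. The drop in CPA hints at the second effect: placements that previously cost more per action than you could afford may now fall within your target.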