A Kitten-Filled Introduction to Continuous Optimization

by Angus Lynch on February 17, 2015

A/B testing is awesome! *

(*When you have an experienced and dedicated optimization team working off meaningful hypotheses*).

Screw the fine print. Just grab an Optimizely account, set up two variations, and let ‘er rip! **

(**8 out of 10 A/B tests fail to generate a lift**).

And hey, with so many tools doing the work for you, it won’t take much time at all. ***

(***Contingent on pigs taking flight.***)

So what are you waiting for? A/B testing is for YOU!

Hold on just a second…

Does the following ring true?

You are:

• Tasked with marketing a SaaS company or ecommerce business

• Intrigued by the awesome potential of conversion optimization based on A/B testing

• Saddled with a million other tasks, and A/B is just another arrow in your quiver

• Wondering if changing your CTA button to orange will really generate that 20% lift

If that sounds accurate, I’m here to tell you that A/B testing probably isn’t the best approach for you.

But don’t worry! True to the principles of Breaking Bad News with Baby Animals, I’ve used kittens to soften this inconvenient truth.

What’s wrong with A/B testing?

When you have the time, skills, tools and inclination to do it right, nothing beats A/B testing.

When done right, it’s the most bottom-line driven marketing practice you can perform—to our knowledge.

Unfortunately, marketers are stretched to the limit, and most of us simply don’t have enough hours in the day to do it right.

We don’t have dedicated optimization teams, which leads to a number of problems…

Problem #1: Type I errors


Given: Optimization is the process of trying to get the most value from your traffic.

Type I Error: For any given testing method (T) there is an implicit or explicit likelihood (α) that an experiment found to have generated a positive effect on conversions only appeared to do so because of random chance.

Translation: Your ‘winning’ variation may not actually be an improvement; it may even be worse.

Likelihood of Error: typically 5%, depending on your test setup (most platforms use a 90% or 95% confidence level), and often disregarded by marketers.
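If you want to see what that 5% looks like in practice, here’s a minimal Python simulation (the 2% conversion rate, 5,000 visitors per variation, and the standard z-test are illustrative assumptions, not anyone’s real data): run a pile of ‘A/A tests’ where both variations are identical, and a significance test at α = 0.05 still crowns a ‘winner’ about 5% of the time.

```python
# A minimal sketch of Type I error: both variations share the SAME true
# conversion rate, yet a standard two-proportion z-test at alpha = 0.05
# still declares a "winner" in roughly 5% of experiments.
# The 2% rate and 5,000 visitors per variation are illustrative numbers.
import random
from statistics import NormalDist

def fake_ab_test(true_rate=0.02, visitors=5000, alpha=0.05):
    """Return True if the test (wrongly) finds a significant difference."""
    conv_a = sum(random.random() < true_rate for _ in range(visitors))
    conv_b = sum(random.random() < true_rate for _ in range(visitors))
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = (2 * pooled * (1 - pooled) / visitors) ** 0.5  # pooled standard error
    if se == 0:
        return False
    z = (abs(conv_a - conv_b) / visitors) / se
    p_value = 2 * (1 - NormalDist().cdf(z))  # two-sided p-value
    return p_value < alpha

trials = 1000
false_positives = sum(fake_ab_test() for _ in range(trials))
print(f"'Winners' found by pure chance: {false_positives / trials:.1%}")  # ~5%
```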

Problem #2: Type II errors


Given: Optimization is the process of trying to get the most value from your traffic.

Type II Error: For the same testing method (T) there is an implicit or explicit likelihood (β) that an experiment found to have no meaningful effect on conversions actually contained a winner whose lift was masked by random chance.

Translation: A variation that really was better can get written off as a dud, and you miss out on something better.

Likelihood of Error: typically 20% for online A/B testing platforms, and thoroughly disregarded by marketers.
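And here’s the flip side, in the same spirit of simulation: this time variation B genuinely is better (an invented 2.0% vs 2.3% conversion rate), the sample size is computed for the usual 80% power target, and the test still misses the real winner roughly one time in five.

```python
# A minimal sketch of Type II error: variation B really IS better here,
# and the sample size hits the standard 80% power target, yet roughly
# 1 test in 5 still fails to detect the real winner.
# The 2.0% / 2.3% rates are illustrative, not real data.
import random
from statistics import NormalDist

def detects_lift(rate_a, rate_b, visitors, alpha=0.05):
    """Return True if a two-proportion z-test flags the (real) difference."""
    conv_a = sum(random.random() < rate_a for _ in range(visitors))
    conv_b = sum(random.random() < rate_b for _ in range(visitors))
    pooled = (conv_a + conv_b) / (2 * visitors)
    se = (2 * pooled * (1 - pooled) / visitors) ** 0.5
    z = (abs(conv_b - conv_a) / visitors) / se
    return 2 * (1 - NormalDist().cdf(z)) < alpha

rate_a, rate_b = 0.020, 0.023  # a 15% relative lift on a 2% baseline

# Standard sample-size formula for 80% power at alpha = 0.05 (two-sided).
z_alpha, z_beta = NormalDist().inv_cdf(0.975), NormalDist().inv_cdf(0.80)
p_bar = (rate_a + rate_b) / 2
n = int(((z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
          + z_beta * (rate_a * (1 - rate_a) + rate_b * (1 - rate_b)) ** 0.5) ** 2
         / (rate_b - rate_a) ** 2)) + 1
print(f"Visitors needed per variation: {n}")

trials = 200  # kept modest so the simulation finishes in a few seconds
missed = sum(not detects_lift(rate_a, rate_b, n) for _ in range(trials))
print(f"Real winners missed: {missed / trials:.1%}")  # ~20%
```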

Problem #3: The world changes, man


When you conduct an A/B test and find a winner, the next step is to run an additional test that pits your winner against a new variation.

Naturally, this chain of tests stretches out over time, and your traffic doesn’t remain constant throughout the year: seasonal changes mean the audience that crowned your first winner may not be the audience seeing your next test.

And since many people reset or clear their cookies, you may not even know how many repeat visitors are in your sample.

Problem #4: Inflexibility


If you fit the description earlier in this post, you’re probably not a seasoned pro at creating CRO hypotheses (few of us are).

So naturally, you may flip-flop on what should be included in your test variation.

But with A/B testing, introducing something new will screw up your test population, rendering your results invalid.

So once you start an A/B test, there’s no making adjustments on the fly.

Problem #5: Dirty traffic

This one is pretty simple.

A/B testing limits the number of tests you can run concurrently, as population samples for each test must be independent; there cannot be overlap.

This independence requirement means that, unless you have huge levels of traffic, it’s hard to run multiple tests simultaneously.

So what does all this mean for marketers?

These problems lead us to the conclusion that—for most marketers—there are only two options when it comes to A/B testing.

Option 1: Build up a skill set that helps you identify the 2 – 6 high-value experiments you will perform for the year. This requires a deep understanding of analysis techniques, statistics, usability, conversion-centered design, and proper test structure. If you have the time for this, great, but most of us don’t.

Option 2: Get comfortable with the idea that if you run enough experiments, Type I errors are mitigated. We call this accelerating through mistakes.

If neither option sounds appealing, I don’t blame you.

What’s the solution? Increase your testing velocity by moving away from A/B testing methodologies, and towards continuous optimization.

The lowdown on continuous optimization

Continuous optimization is a testing framework that allows you to quickly identify whether your testing ideas are thumbs down, thumbs up, or just ‘meh’—all without invalidating your experiment data.

The engine behind continuous optimization is what’s called a multi-armed bandit algorithm.

A multi-armed bandit algorithm is a meta-framework for an experiment, and there are two major concepts behind it.

Concept 1: Exploration

During the exploration phase, multiple variations can be tested—not just two.


So yes, all those weird and wacky variations you came up with are fair game! Instead of just content, you’re free to test different integrations, sizes, creative types, and targeting rules.

The more variations you add, the longer it will take to validate your winner. But it’s still much more time-efficient than testing variations one after the other.

Ultimately, the algorithm will figure out which idea works best for you in phase 2…

Concept 2: Exploitation

The exploitation phase is where the multi-armed bandit algorithm maximizes the expected reward from your winning variation.

It does this by automatically increasing the number of impressions the winning (or promising) variation is given, while decreasing impressions of losing variations.

Should your winning variation begin to falter once it takes centre stage, the algorithm will begin to decrease its impressions.
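To make the blend of exploration and exploitation concrete, here’s a minimal sketch. The post doesn’t spell out which algorithm powers this, so the example below uses Thompson sampling, one common multi-armed bandit strategy; the variation names and their ‘true’ conversion rates are invented for illustration.

```python
# A minimal multi-armed bandit sketch using Thompson sampling.
# Each variation keeps a Beta distribution over its conversion rate; every
# visitor sees whichever variation wins a random draw from those
# distributions. Strong performers get shown more and more (exploitation),
# while weaker ones still get an occasional look (exploration).
import random

# Invented "true" conversion rates for three hypothetical variations.
TRUE_RATES = {"control": 0.020, "wacky_headline": 0.025, "orange_button": 0.021}

# Beta(1, 1) priors: one pseudo-conversion and one pseudo-miss each,
# i.e. no opinion yet about any variation.
successes = {name: 1 for name in TRUE_RATES}
failures = {name: 1 for name in TRUE_RATES}
shown = {name: 0 for name in TRUE_RATES}

for visitor in range(50_000):
    # Sample a plausible conversion rate for each variation from its
    # current posterior, then show the variation with the highest draw.
    sampled = {name: random.betavariate(successes[name], failures[name])
               for name in TRUE_RATES}
    choice = max(sampled, key=sampled.get)
    shown[choice] += 1

    # Simulate whether this visitor converts, and update the posterior.
    if random.random() < TRUE_RATES[choice]:
        successes[choice] += 1
    else:
        failures[choice] += 1

for name in TRUE_RATES:
    rate = (successes[name] - 1) / max(shown[name], 1)
    print(f"{name:>15}: shown {shown[name]:>6} times, observed rate {rate:.3%}")
```

Run it a few times and you’ll see the traffic share drift towards the best performer while the weaker variations still get a trickle of impressions, with no human deciding when to ‘stop the test’.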

But wait, isn’t this the same as A/B testing?

Not quite. With A/B testing, these two concepts are performed one after the other.

That means that if you don’t have winning variations, your testing is likely having a negative impact on your business. The exposure to regret (the conversions you lose by showing visitors an inferior variation) is high.

But with continuous optimization, concepts 1 and 2 are performed concurrently. Your potential winners are given increased exposure almost from the beginning, thus minimizing regret.

Wrapping it up

For time-stretched marketers who may not be well-versed in conversion optimization, continuous optimization (using a multi-armed bandit algorithm) offers the following advantages over A/B testing:

1. It minimizes regret by automatically increasing impressions of the best-performing variation.

2. It adapts to cyclical changes to the campaign and campaign environment.

3. It allows marketers to add new ideas/variations to the campaign without spoiling test data.

A multi-armed bandit algorithm should naturally gravitate towards your optimal creative, and make adjustments as your testing environment changes.

Coming up in our series…

In our next post on continuous optimization, we’ll pop the hood on multi-armed bandit algorithms and show the formulas that make this stuff work.

Yes, this will include math lessons from our CTO, Yosem Sweet…

[Image: Yosem’s hand-drawn graph]

And yes, he labelled points on that x-axis as “Seems Good” and “Dunno.” Confused yet?

Stay tuned for updates!
