A/B Testing Glossary
A/B testing helps product managers rapidly grow and optimize digital experiences. Our A/B testing glossary breaks down terms and concepts commonly used when A/B testing across different channels – mobile, web, server, and OTT.
Multivariate (MVT) testing is a technique that modifies multiple variables at once to determine the highest-performing combination of elements on a single page. Compare this with traditional A/B testing, where you change only one variable at a time.
Let’s say you have a promotional banner on your website that says “Save 10% now!” in 36pt font. Perhaps you want to see if changing the font size or the copy on the banner would get more users to click on the promotion. With a straightforward A/B test, you might just change the font size or just change the copy. But what if there is some interaction effect such that only a very large “Save $20 now!” message drives sales, while a small “Save $20 now!” or a large “Save 10% now!” doesn’t work as well? This is where multivariate testing comes in.
Multivariate testing helps you understand interaction effects between changes to two completely different elements.
In traditional A/B testing, you would simply create one variant for each version of the element you want to change. If you wanted to test the color of a button, you would create one variant for each button color you want to test. This means the number of variants increases linearly with the number of versions: want to test 5 colors? You’ll have 5 variants.
In multivariate testing, this gets more complicated because you need to account for the combination of changes. Let’s say we want to test the color and the copy of a button. If we have 3 colors and 2 different lines of copy, we’ll end up with 6 variants:
Color 1 + Copy 1
Color 1 + Copy 2
Color 2 + Copy 1
Color 2 + Copy 2
Color 3 + Copy 1
Color 3 + Copy 2
If you want to test 3 different elements (let’s say color, copy, and font size), it gets even more complicated:
Color 1 + Copy 1 + big font
Color 1 + Copy 1 + small font
Color 1 + Copy 2 + big font
Color 1 + Copy 2 + small font
Color 2 + Copy 1 + big font
Color 2 + Copy 1 + small font
Color 2 + Copy 2 + big font
Color 2 + Copy 2 + small font
Color 3 + Copy 1 + big font
Color 3 + Copy 1 + small font
Color 3 + Copy 2 + big font
Color 3 + Copy 2 + small font
The number of variants you need to have can be found with this formula:
[# of versions of element 1] x [# of versions of element 2] x [# of versions of element 3] x … = total # of variants
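As a quick sketch of this formula in code, the snippet below enumerates the 3 × 2 × 2 example above and counts the total variants; the element names and versions are just placeholders, not tied to any particular testing tool.

```python
# A minimal sketch of the variant math above: enumerate every combination
# of element versions and count them. The element names and version lists
# here are illustrative placeholders.
from itertools import product
from math import prod

elements = {
    "color": ["Color 1", "Color 2", "Color 3"],
    "copy": ["Copy 1", "Copy 2"],
    "font size": ["big font", "small font"],
}

# Every multivariate variant is one combination of versions.
variants = list(product(*elements.values()))
for variant in variants:
    print(" + ".join(variant))

# Total variants = product of the number of versions of each element.
total = prod(len(versions) for versions in elements.values())
print(total, "variants")  # 3 x 2 x 2 = 12
```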
As you can see, adding another element to the mix dramatically increases the number of variants and splits your user traffic even further. This makes it harder and harder to reach statistical significance.
The most common reason product teams use multivariate tests rather than A/B tests is when they want to test the interaction effects between elements. If you simply want to test more than one element, you should run multiple A/B tests instead.
For example, if you want to test the color of a buy button and you want to test the payment options available to the customer, those two things probably won’t affect each other. In this case you’re better off running two separate A/B tests, because each test splits your traffic into fewer groups and can reach statistical significance much faster.
As a simple example of this process in action, let’s take our testing case study of German app Clever Lotto, a popular lottery game in Europe. The Clever Lotto team wanted to increase the number of rounds their users played. In short, they wanted their users to stay online longer. With just the tap of a button, players could advance to the next drawing—but could a design change of the button lead to more people continuing to play?
The team ran a multivariate test on two font colors (red and green) and two calls to action (the gentle “Play again now” and the more intense “PLAY AGAIN!”). The hypothesis: red letters would be the game changer… The results? Users showed no clear preference between green and red; instead, the intensity of the wording won out: PLAY AGAIN! We found that green had a 72.40% chance of causing a 5.95% lift over red, while “PLAY AGAIN!” had a 99.85% chance of causing a 25.38% lift over “Play again now.” This meant that the color did not have a statistically significant impact, while the copy change absolutely did.
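Figures like a “72.40% chance of causing a lift” are typically produced by a Bayesian comparison of conversion rates. The sketch below shows one common way such a probability could be computed, using hypothetical conversion counts; it is not Clever Lotto’s actual data, nor necessarily the exact method their testing tool used.

```python
# A rough sketch of a "chance of causing a lift" calculation: model each
# variant's conversion rate with a Beta posterior and estimate the
# probability that one beats the other by sampling. The counts below are
# hypothetical, and real tools may use different methods.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (conversions, users) for each button color.
red_conversions, red_users = 480, 5000
green_conversions, green_users = 505, 5000

# Beta(1 + successes, 1 + failures) posteriors under a uniform prior.
red_samples = rng.beta(1 + red_conversions, 1 + red_users - red_conversions, 100_000)
green_samples = rng.beta(1 + green_conversions, 1 + green_users - green_conversions, 100_000)

chance_green_beats_red = (green_samples > red_samples).mean()
expected_relative_lift = (green_samples / red_samples - 1).mean()

print(f"Chance green beats red: {chance_green_beats_red:.2%}")
print(f"Expected relative lift: {expected_relative_lift:.2%}")
```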
Since multivariate tests split users into many more groups than standard A/B testing, you’ll need a large volume of users to even justify the process. Sample sizes and run times have a significant impact on the results of any kind of testing. The larger the sample size taken from your user pool, the more accurate the results will be. If the differences between your variants are small and do not produce a big change in user behavior, you will need even more traffic to reach statistically significant results.
Multivariate testing requires detailed preparation to get the most meaningful results possible. As stated above, do not use multivariate testing if there is no logical reason why two or more elements should affect each other.
It can also be a good idea to run a classic A/A test to confirm that the software and your product are testing accurately. If all goes well, an A/A test will come back inconclusive—because both variations are the same—and you can move on to more interesting experiments.
Success is often unique to each team, product, and test. However, it’s imperative that you have a hypothesis of how the multivariate test will affect the metric you’re measuring before you begin. Multivariate testing should not be used just to see what happens.
Mostly, we see teams test user acquisition, conversion rates, retention/engagement, and monetization.
Another factor that influences the results of multivariate testing is how long the test runs and the size of the sample being analyzed. There are dependable formulas that give relatively precise estimates of the sample size you need, which in turn determines how long the test must run. Using them is the best way to get results that are as precise as possible from a multivariate test.
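One widely used formula of this kind sizes a two-proportion comparison from a baseline conversion rate and the minimum relative lift you want to detect. The sketch below implements that calculation; the 5% baseline and 10% minimum detectable effect are assumptions for illustration, and real calculators may use slightly different formula variants.

```python
# A sketch of a standard sample-size formula for comparing two conversion
# rates (two-proportion z-test). The baseline rate and minimum detectable
# effect below are illustrative assumptions.
from statistics import NormalDist
from math import sqrt, ceil

def sample_size_per_variant(baseline, mde, alpha=0.05, power=0.8):
    """Users needed per variant to detect a relative lift of `mde`
    over `baseline` at the given significance level and power."""
    p1 = baseline
    p2 = baseline * (1 + mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

# e.g. a 5% baseline conversion rate, aiming to detect a 10% relative lift:
per_variant = sample_size_per_variant(0.05, 0.10)
print(per_variant, "users per variant")
# A 12-variant multivariate test would need roughly 12x this total traffic,
# which is why run times stretch quickly as elements are added.
```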