The Mobile A/B Testing Term Handbook

As mobile apps become increasingly competitive, it’s more important than ever to optimize yours so that it stands out from the crowd. A/B testing can be a powerful tool and, when done correctly, a competitive advantage. However, it can also get a little tricky, so we’ve created a handbook of essential terms to help guide you through.

Our collection has been carefully curated to give you only the vocabulary that is critical to creating and understanding your experiments, and to guarding against poorly run tests and misleading results. No fluff included.

A/A Testing

A randomized experiment without any variations (i.e. two control variants, A and A) that helps testers determine the causes of false positives and calibrate results.

A/B Testing

A randomized experiment with two variants, A (control) and B (treatment), that compares their performance.

Allocation

The portion of users assigned to an experiment, which sets the size of the experiment’s population.

Bias

The component of results generated by differences between the test and control group that are not related to the test itself.

For example, if your test group is made up of frequent users while your control group is sampled from all users, the results would have bias and be invalid. This is because you’ll never know if the results are due to what you’re experimenting with, or because there is an underlying difference between the groups.

Control

A variant of an experiment that is the same as the original app. It’s important to note that in mobile A/B testing, the control and unallocated users are NOT the same. Control groups are structured just like test groups, except that they do not contain the change being tested. They must be of the same size and receive all the same data.

Confidence Interval

The range of values within which we can be reasonably sure that the true value lies. Note that we can never be 100% sure of this interval due to bias and noise.

Confidence Level*

A measure of how sure one can be that results fall into the range of values in a confidence interval. It represents how often the true percentage of a population would generate results that lie within the confidence interval.

*A common mistake is to think that a lift of X with a 95% confidence level means “I am 95% sure that there is an increase of X.” Rather, it means that you are 95% sure that the difference lies within the confidence interval.
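To make the relationship between a confidence interval and a confidence level concrete, here is a minimal Python sketch. It assumes a normal-approximation (Wald) interval for the difference between two conversion rates, which is one common choice and not necessarily the method any particular tool uses. The function name, the sample counts, and the default z value of 1.96 (roughly a 95% confidence level) are illustrative assumptions.

```python
import math


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate confidence interval for (rate_b - rate_a).

    Assumption: normal (Wald) approximation; z=1.96 corresponds to
    roughly a 95% confidence level.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Standard error of the difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical data: control converts 500 of 10,000; variant 560 of 10,000.
low, high = diff_confidence_interval(500, 10_000, 560, 10_000)
print(f"difference is between {low:.4f} and {high:.4f}")
```

With these made-up numbers the interval includes zero, so the observed difference of 0.6 percentage points could still plausibly be no difference at all.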

Conversions

A conversion is an action that a person takes in your app such as checking out, registering, adding an item to the shopping cart, or viewing a particular page. Note that conversions are binary actions (they are either triggered or not).

Conversion Rate

The number of conversions divided by the total number of participants.

Conversion Rate = (Number of Converted Participants) / (Total Number of Participants)

If I were tracking the number of purchases as a conversion goal, I would look at the number of users who completed transactions, then divide that by the total number of users in my experiment. If 10,000 users viewed my variant, and 500 of them made purchases, my conversion rate would be calculated as follows:

Conversion Rate = 500 / 10,000 = 0.05 or 5%
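The same calculation as a small Python sketch. The function name and the input checks are illustrative, not part of any SDK.

```python
def conversion_rate(converted: int, participants: int) -> float:
    """Number of converted participants divided by total participants."""
    if participants <= 0:
        raise ValueError("participants must be positive")
    if not 0 <= converted <= participants:
        raise ValueError("converted must be between 0 and participants")
    return converted / participants


rate = conversion_rate(500, 10_000)
print(f"{rate:.1%}")  # prints 5.0%
```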

Daily Active Users (DAUs)

The number of unique users who have logged in or performed some other specific action within a 24-hour period. See also MAUs.

Dynamic Variables

Apptimize dynamic variables are programming variables that have been defined in your code, whose value can be changed from our server. At the time of definition, a dynamic variable is assigned a default value, and its value can be changed via the Apptimize server by running experiments or instant updates.

Event

User interactions with content that can be tracked independently from a page or a screen load, such as mouse clicks, purchases, downloads, Flash elements, and video plays.

Any user action that can be tracked and used to measure the behavior of users. Events can be pure conversions such as user registered, or they can take more complex forms such as purchases with an attached amount.

Goal

An event or combination of events that measure the success of an experiment or A/B test.

Hybrid Test

A test that affects elements of the app that are based on server-generated data. In Apptimize hybrid tests, dynamic variables can change the server/JSON request so that the server ultimately returns different types of content.

Impact

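The difference in performance between a variant and its control, expressed as an absolute difference between the two. See also lift.

If the conversion rate of the control is 10% and the conversion rate of the variant is 15%, the impact is 5% (5 percentage points) while the lift is 50%.

Impact = (Variant Conversion 15%) – (Control Conversion 10%) = 5%

Lift = ((Variant Conversion 15%) – (Control Conversion 10%)) / (Control Conversion 10%) = 50%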
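The distinction between impact (absolute difference) and lift (relative difference) can be sketched in Python. The function names are illustrative, and rates are passed as fractions (0.10 for 10%).

```python
def impact(control_rate: float, variant_rate: float) -> float:
    """Absolute difference between variant and control."""
    return variant_rate - control_rate


def lift(control_rate: float, variant_rate: float) -> float:
    """Difference relative to the control's performance."""
    if control_rate == 0:
        raise ValueError("lift is undefined when the control rate is zero")
    return (variant_rate - control_rate) / control_rate


print(f"impact: {impact(0.10, 0.15):.1%}")  # prints impact: 5.0%
print(f"lift:   {lift(0.10, 0.15):.1%}")    # prints lift:   50.0%
```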

Instant Update

Instant Update is a feature of Apptimize that allows you to make any visual change to your app instantly, without the need to submit to the App Store or Play Store.

Key Performance Indicator (KPI)

A quantifiable measure that experiments use to gauge or compare performance in terms of meeting their strategic and operational goals. While it’s useful to measure a variety of goals, KPIs are typically the goals that define the ultimate success of an experiment.

Launch

Taking an experiment live so that users will start participating.

Lift

The difference in performance between a specific variant and control, expressed as a percentage difference between the two. See also impact.

Monthly Active Users (MAUs)

The number of unique users who have engaged with an app within a 1 month period.

Metric

See goal.

Multivariate Testing (MVT)

A type of experiment that tests combinations of multiple variables to see which creates the best possible outcome. For example, let’s say you wanted to test different combinations of color and copy for login CTAs to see which one performed best. You might test out 2 different colors (red and green), as well as 2 different types of copy (“Sign In” and “Join Now”).

(2 types of color) X (2 types of copy) = 4 combinations

Multivariate testing would allow you to run a single experiment to test all the possible combinations at one time. Since there are 2 colors and 2 types of copy, you would have a total of 4 variants (combinations). After a test is run, you can see which of the 4 variants performed best for your specified KPIs.
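As a quick illustration of how the variant count grows, the combinations from the example above can be enumerated in Python with the standard library's itertools.product. The variable names are illustrative.

```python
from itertools import product

colors = ["red", "green"]
copies = ["Sign In", "Join Now"]

# Every pairing of one color with one piece of copy is a variant.
variants = list(product(colors, copies))

for color, copy_text in variants:
    print(f"{color} button labeled '{copy_text}'")

print(len(variants))  # prints 4
```

Adding a third variable with 3 options would multiply the count to 12, which is why multivariate tests need more participants than a simple A/B test.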

Noise

A fluctuation in results due exclusively to random chance. For example, any difference observed in the results of an A/A test is, by definition, pure noise.

Participant

A user in an experiment’s population who has witnessed a variant that they’ve been assigned to.

Population

All users in an experiment who have been assigned to a variant, but may not necessarily have seen it yet.

For example, if you’re testing a new checkout page, users allocated to a variant may join the population at login time, but will not become participants unless they have witnessed the checkout page.

Note that only participants are relevant for the purpose of measuring results of a test.

Retention

The percentage of users who, after completing a session, start a new session within a specified period of time. Retention is often one of the most important KPIs for apps.

Retention metrics can include customers who still use the app after 30 days, the number of users who share a post after a few days, or customers who make a purchase again within 1 year.
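A minimal Python sketch of a retention calculation follows. It makes a simplifying assumption that is not from the handbook: a user counts as retained if they start another session within the window measured from the start of their first session. The function name, the data layout, and the sample users are all hypothetical.

```python
from datetime import datetime, timedelta


def retention_rate(session_starts, window):
    """Fraction of users who start another session within `window`.

    session_starts maps a user id to a list of session start times.
    Assumption: the window is measured from the user's first session start.
    """
    histories = [starts for starts in session_starts.values() if starts]
    if not histories:
        return 0.0
    retained = 0
    for starts in histories:
        ordered = sorted(starts)
        first = ordered[0]
        # Retained if any later session begins inside the window.
        if any(first < t <= first + window for t in ordered[1:]):
            retained += 1
    return retained / len(histories)


# Hypothetical session data for four users.
data = {
    "ann": [datetime(2024, 1, 1), datetime(2024, 1, 10)],
    "bob": [datetime(2024, 1, 1)],
    "cy": [datetime(2024, 1, 1), datetime(2024, 3, 1)],
    "dee": [datetime(2024, 1, 5), datetime(2024, 1, 6)],
}
print(retention_rate(data, timedelta(days=30)))  # prints 0.5
```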

Selection Bias

A statistical error that results from improper randomization that can cause skewed results.

Session

A session is a period when a user is in your app, measured from the time your app is foregrounded to the time it is backgrounded.

Session Interval

The session interval is the time between consecutive sessions, specifically the time from when the app is backgrounded to when it is next foregrounded by a user.
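Both measurements can be sketched in Python, assuming each session is recorded as a (foregrounded, backgrounded) pair of timestamps in chronological order. The function names and the sample times are illustrative.

```python
from datetime import datetime, timedelta


def session_lengths(sessions):
    """Time from foreground to background for each session."""
    return [backgrounded - foregrounded for foregrounded, backgrounded in sessions]


def session_intervals(sessions):
    """Time from one session's background to the next session's foreground."""
    return [
        next_foregrounded - backgrounded
        for (_, backgrounded), (next_foregrounded, _) in zip(sessions, sessions[1:])
    ]


# Hypothetical sessions for one user on one day.
sessions = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 5)),
    (datetime(2024, 1, 1, 9, 30), datetime(2024, 1, 1, 9, 42)),
    (datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 1)),
]
print(session_lengths(sessions))    # 5, 12, and 1 minute
print(session_intervals(sessions))  # 25 minutes, then 2 hours 18 minutes
```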

Split Test

Another name for A/B Testing.

Statistical Significance

The probability that there is a true difference in performance between your variant and its control. Statistical significance allows us to understand when results are true, versus when they might be due to noise. In business, generally everything under 80% is considered not significant. Figures between 80 and 95% are directional or moderate, and anything above 95% is high.
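The handbook does not say how this figure is computed. One common convention, assumed here purely for illustration, is to report 1 minus the two-sided p-value of a two-proportion z-test. The sketch below uses that convention together with the 80% and 95% thresholds above. The function names and sample counts are hypothetical.

```python
import math


def significance(conv_a, n_a, conv_b, n_b):
    """1 minus the two-sided p-value of a pooled two-proportion z-test.

    Assumption: this is one common convention, not a confirmed
    description of how any particular tool reports significance.
    """
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    z = abs(conv_b / n_b - conv_a / n_a) / se
    # For a standard normal, 1 - 2 * (1 - Phi(z)) equals erf(z / sqrt(2)).
    return math.erf(z / math.sqrt(2))


def label(value):
    """Map a significance figure onto the thresholds described above."""
    if value < 0.80:
        return "not significant"
    if value <= 0.95:
        return "directional"
    return "high"


# Hypothetical data: control 500 of 10,000; variant 560 of 10,000.
s = significance(500, 10_000, 560, 10_000)
print(f"{s:.1%} -> {label(s)}")
```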

Targeting

A set of conditions to define whether a specific user is eligible for an experiment.

Test Verification (Advanced Verify)

The process of checking that a test is set up correctly prior to its launch. This typically involves 4 components:

Checking variants to see if they are behaving correctly
Ensuring correct targeting of an audience
Checking that success metrics are captured correctly
Making sure that we’re able to capture enough data to clearly identify the winner

Variant

A change or set of changes whose effect we want to know and/or measure. A variant can only be defined by a set of changes against a baseline known as control.