6. Results

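Once your experiment has been running for a while, you’ll start seeing data collect on the Results page as your users see variants. This guide covers how to interpret results, analyze funnels, segment and filter data, review engagement and retention, and decide when to stop an experiment.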

Interpret Results

Below is a sample of results on the Overview tab. At a glance, you can see the number of participants, and how each variant performed for each goal.

Screenshot of Results Overview showing participants for each variant and conversion rates for goals in the experiment

The top table shows the experiment population, participants and participation rate.

An experiment’s population is the count of all users who could participate in the experiment. An experiment’s participants are the users who actually participated: they navigated to the part of the app where the experiment runs and saw either the ‘original’ or one of the ‘variants’. The participation rate is the ratio of participants to population.
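
As a quick illustration of the ratio described above, here is a minimal Python sketch. The function name and the numbers are hypothetical and are not part of any Apptimize API.

```python
def participation_rate(participants: int, population: int) -> float:
    """Return participants / population, or 0.0 for an empty population."""
    if population == 0:
        return 0.0
    return participants / population


# Hypothetical numbers: 10,000 eligible users, 2,500 of whom saw a variant.
rate = participation_rate(2_500, 10_000)
print(f"Participation rate: {rate:.1%}")  # Participation rate: 25.0%
```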

Experiment results only consider the experiment participants, as they are the users in an experiment’s population who have actually seen one of the experiences being tested.

Results are organized as a series of goals. A goal is an event or series of events (called a funnel) that tells you about your users’ behavior. For example: Did they sign in? Did they add something to their cart and make a purchase? A great experiment will have goals that show the changes in user behavior driven by the differences in experience between the experiment’s ‘original’ and its ‘variants’.

Adding a Goal

If you have not added any goals to your experiment yet, add one by selecting the “Goals/Funnels” tab and clicking “Create Goal”. A goal is composed of events: either a single event or a series of events combined into a funnel. An event can be an Apptimize event set up using our visual editor or API methods, or an event imported from another analytics platform such as Google Analytics, Amplitude, Flurry, Localytics, Mixpanel, or Omniture.

Modal for adding a goal to the results page by selecting available funnels, Apptimize events, or third-party events

Results Details

The cell at the intersection of each variant and goal shows four data points: lift, impact, trajectory, and significance.

  • The first number is the lift, a percentage that shows how much higher or lower the variant is compared to the original for that specific goal.
  • The second line of numbers shows the impact of the variant compared to the original for that goal. In other words, it’s the raw change in conversion rate from the original to the variant.
  • The color of the box shows the trajectory. Green indicates the variant performed higher than the original for that goal, and red means lower.
  • Lastly, the darkness of the color signifies how statistically significant that difference is based on a binomial distribution. A white box means the statistical significance is below 80%. A light green or light red means the significance is between 80% and 95%. A dark green or dark red means the difference between the variant and the original is statistically significant at the 95% confidence level.
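
To make the four data points concrete, here is a minimal Python sketch that computes them from raw counts. The function name and sample numbers are hypothetical. The text above only says significance is “based on a binomial distribution”, so the pooled two-proportion z-test used here is an assumption and may not match the exact calculation Apptimize performs.

```python
import math


def compare(orig_conv: int, orig_n: int, var_conv: int, var_n: int) -> dict:
    """Compare a variant to the original for one goal.

    Assumes both groups have participants and the original has at
    least one conversion (otherwise lift is undefined).
    """
    p_o = orig_conv / orig_n
    p_v = var_conv / var_n
    impact = p_v - p_o   # raw change in conversion rate
    lift = impact / p_o  # relative change versus the original

    # Assumption: pooled two-proportion z-test (normal approximation).
    pooled = (orig_conv + var_conv) / (orig_n + var_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / orig_n + 1 / var_n))
    z = impact / se
    # erf(|z| / sqrt(2)) is the two-sided confidence, i.e. 1 - p-value.
    significance = math.erf(abs(z) / math.sqrt(2))

    # Map significance and direction to the cell shading described above.
    if significance < 0.80:
        shade = "white"
    else:
        tone = "dark" if significance >= 0.95 else "light"
        shade = f"{tone} {'green' if impact > 0 else 'red'}"
    return {"lift": lift, "impact": impact,
            "significance": significance, "shade": shade}


# Hypothetical counts: original 100/1000 conversions, variant 130/1000.
result = compare(100, 1_000, 130, 1_000)
print(f"lift={result['lift']:.0%} impact={result['impact']:.1%} "
      f"shade={result['shade']}")
```

With these sample counts the lift is 30%, the impact is 3 percentage points, and the cell would be shaded dark green under the assumed test.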

Image showing the colors that signify statistical significances

Note that the performance of the original variant, or control, for each goal is not compared to anything by default. Each variant is compared to the original variant.

If you click on any goal on the Overview tab or select a goal in the “Goals/Funnels” tab, you’ll see a table like the one below. It shows the performance of all variants for that goal: lift, statistical significance, goal conversion rate, number of participants, number of participants that converted, and the total count of times the goal was triggered.

Screenshot showing results by goal as a table

On the right, there is a drop-down menu to view your goal results as a table, bar graph, or time series line graph. The bar graph below compares the variants’ performance for the goal you selected. The lines that extend above and below each bar are error bars that represent the confidence interval of the average conversion rate. The graph can show conversion rate, conversion count, and lift.
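
For intuition about what the error bars represent, the following Python sketch computes a confidence interval for a single conversion rate. It uses the normal-approximation (Wald) interval, which is an assumption; the method behind the dashboard’s error bars is not stated here. The function name and counts are hypothetical.

```python
import math


def conversion_interval(conversions: int, participants: int, z: float = 1.96):
    """Return (low, high) normal-approximation bounds for a conversion rate.

    z = 1.96 corresponds to a 95% confidence level.
    """
    rate = conversions / participants
    half_width = z * math.sqrt(rate * (1 - rate) / participants)
    # Clamp to [0, 1], since a conversion rate cannot leave that range.
    return max(0.0, rate - half_width), min(1.0, rate + half_width)


# Hypothetical variant: 130 conversions out of 1,000 participants.
low, high = conversion_interval(130, 1_000)
print(f"Error bar from {low:.1%} to {high:.1%}")  # Error bar from 10.9% to 15.1%
```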

Screenshot showing results by goal as a bar graph

Time series data for the goal is also available as a line chart. Click the drop-down menu and select “line chart” from the options “table”, “bar graph”, and “line chart” shown in the image below. You can adjust the time range on the time series to view a range of days during which the experiment was running.

Screenshot showing results by goal as a time series line chart

A glossary of terms is available here.

Analyze Funnels

If you select a funnel as a goal, the results dashboard will display conversion data for each step of the funnel.

Screenshot showing funnel results: the chain of events above two tables of conversion data

Above the two tables, you will see the chain of events for this funnel. The first table shows the results for the entire funnel, from beginning to end (Login to Purchase in this example). The second table shows the conversion rate of each step in the funnel chain. In this example, you can see that in the “Bigger Buttons” variant, users who have added items to their cart are more likely to make purchases.
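
The relationship between the two tables can be sketched in Python as follows. The step names follow the Login to Purchase example above, but the counts and the function name are hypothetical.

```python
def funnel_rates(step_counts: list[tuple[str, int]]):
    """Return (overall_rate, [(from_step, to_step, step_rate), ...]).

    step_counts lists (step name, users who reached it) in funnel order.
    """
    steps = []
    for (prev_name, prev_n), (name, n) in zip(step_counts, step_counts[1:]):
        steps.append((prev_name, name, n / prev_n if prev_n else 0.0))
    first_n, last_n = step_counts[0][1], step_counts[-1][1]
    overall = last_n / first_n if first_n else 0.0
    return overall, steps


# Hypothetical counts of users reaching each step for one variant.
counts = [("Login", 1_000), ("Add to Cart", 400), ("Purchase", 100)]
overall, steps = funnel_rates(counts)
print(f"Login -> Purchase: {overall:.1%}")  # entire funnel (first table)
for start, end, rate in steps:              # each step (second table)
    print(f"{start} -> {end}: {rate:.1%}")
```

Here the entire funnel converts at 10.0%, while the individual steps convert at 40.0% and 25.0%, which is how a variant can improve one step without changing the others.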

Segmentation and Filtering

Apptimize results allow you to dive deeper into the data to understand what’s really going on. You can filter out factors that you know are biasing your results, or segment the data by cohort to see differences in performance. Here is an example of how segmentation helped one customer catch a device-specific bug.

You can segment and filter based on one of the default attributes:

  • device model,
  • screen height,
  • screen width,
  • screen scale,
  • country,
  • OS version,
  • app version,
  • language.

Or you can set custom attributes based on anything you know about your customers. Contact us to learn more about how to do this with your specific infrastructure.

Engagement and Retention

Apptimize tracks engagement and retention automatically. Without any action on your part, Apptimize tracks the:

  • average number of sessions,
  • average number of screens viewed per session,
  • average time interval between sessions,
  • average session length,
  • 1-day retention,
  • 7-day retention

of all your variants. Simply click the “Engagement” tab to see these results.

To see engagement and retention results, your app has to be running iOS SDK v2.7+ or Android SDK v2.4+.

Example results showing screens viewed from engagement results

A session is the time a user spends in your app, specifically from when your app is foregrounded to when it is backgrounded. We measure the average session time per device (how long the user had your app open) as well as the average interval between sessions.

The number of screens viewed per session varies by platform. On Android, it is the number of activities, popups, and fragments we measure during a session. On iOS, it is the number of view controllers shown per session. Because screens are measured this way, and apps are architected very differently, we may overcount the number of screens viewed. However, since an experiment measures each variant against a control, this does not adversely affect your experiment results.

Retention looks at how many users have returned to your app within 1 day or 7 days. On your time series chart, we show retention for the start day. So if you are looking at 1-day retention for Sept 1st, you are seeing how many users had at least a second session between Sept 1st and Sept 2nd. Similarly, if you are looking at 7-day retention for Sept 1st, you are seeing how many users had at least a second session between Sept 1st and Sept 7th.
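
The retention windows described above can be sketched in Python. This is only an illustration under stated assumptions: the data layout (one date per session, keyed by user), the function name, the year, and the choice to report a share rather than a raw count are all hypothetical. The window end date is passed in explicitly so it can follow the date ranges given in the text.

```python
from datetime import date


def retention(sessions: dict[str, list[date]], start: date, end: date) -> float:
    """Share of users with a session on `start` who had at least a
    second session between `start` and `end` (inclusive)."""
    cohort = returned = 0
    for days in sessions.values():
        in_window = [d for d in days if start <= d <= end]
        if start in in_window:
            cohort += 1
            if len(in_window) >= 2:  # at least a second session
                returned += 1
    return returned / cohort if cohort else 0.0


# Hypothetical session log: one date entry per session.
sessions = {
    "u1": [date(2015, 9, 1), date(2015, 9, 2)],
    "u2": [date(2015, 9, 1)],
    "u3": [date(2015, 9, 1), date(2015, 9, 5)],
    "u4": [date(2015, 9, 3)],  # not active on the start day
}
one_day = retention(sessions, date(2015, 9, 1), date(2015, 9, 2))
seven_day = retention(sessions, date(2015, 9, 1), date(2015, 9, 7))
print(f"1-day: {one_day:.0%}, 7-day: {seven_day:.0%}")  # 1-day: 33%, 7-day: 67%
```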

Example results showing 1-day retention results

When should I stop an experiment?

The answer depends on how certain you wish to be that a variant is actually better than your current app version before pushing that variant out to all your users. A statistical significance of 95% is the conventional scientific standard. Informally, it means you can be about 95% confident that the actual conversion rate for your variant is better than the actual conversion rate for your control. This is the same level of certainty the FDA requires for clinical trials of drugs. However, you might not require this level of certainty for every test. If a test has a low risk of greatly affecting your bottom line, or if time is a pressing factor, you might conclude the test after reaching 90% or 80% statistical significance. Ultimately, it depends on how confident you need to be.

I keep waiting but my results are still not statistically significant. What should I do?

You have three options: 1) wait, 2) restructure your experiment, or 3) give up. If your experiment has been running for only a few days and has been gaining statistical significance, it could be worthwhile to wait a little longer. Otherwise, consider restructuring your experiment to increase the percentage of users who see it, reduce the number of variants, or both. In the Targeting tab for the experiment, you can use the “calculate best allocation” tool to try out different allocations, numbers of variants, and desired levels of statistical significance. If you try both of these tactics to no avail, there might be no real difference between your variants. In that case, it’s OK to stop and conclude that they are likely the same.

A good rule of thumb is to always look at the confidence interval of each variant. If the confidence intervals are very wide, running the experiment longer may produce statistical significance. But if the confidence intervals are already narrow (in other words, we’re fairly certain about the accuracy of the conversion rates) and the conversion rates are simply very close to each other, there might be no difference between your variants.
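
To see why waiting can help when intervals are wide, the Python sketch below shows how the interval narrows as participants accumulate. It assumes a normal-approximation interval, which may differ from the one shown in the dashboard; the function name and the 10% rate are hypothetical.

```python
import math


def half_width(rate: float, participants: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation interval for a conversion rate."""
    return z * math.sqrt(rate * (1 - rate) / participants)


# Hypothetical 10% conversion rate at growing sample sizes.
for n in (500, 2_000, 8_000):
    print(f"n={n:>5}: 10.0% +/- {half_width(0.10, n):.2%}")
```

Under this approximation, each fourfold increase in participants halves the interval width, so early gains in precision come quickly and later gains come slowly.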

How do I show a winning variant to all my users?

To stop an experiment and choose the winning variant, navigate to the Launchpad page. Scroll to the bottom of the page and click on “Stop Experiment”. A modal will appear that displays results for your primary goal. Select the variant you’d like to show to users. Selecting the original variant will stop pushing any changes to users and default to the behavior in your app’s code.

Screenshot showing an example Select a Winner popup with options to select original variant or variants Checkout flow A, B or C and the conversion values for the primary goal

Once you’ve selected a variant, you’ll see that it’s now being shown to all users. Note: showing a variant to all users still respects the experiment targeting. For example, if your experiment targets only users in the United States, the variant you choose will be shown only to users whose country is the United States. You can edit the targeting on the Targeting tab. To change which variant is shown to all users, click “Change Variant” on the Launchpad; to revert to the original, click “Show original to users”.

Screenshot showing how to change from showing winning variant to another variant or original