Lab 5: 📱 A/A testing of a mobile application

Probability and Statistics

Input

Your company has a smartphone app that allows users to make purchases. To conduct A/B testing, you use your own platform, which divides users into groups with a split system and collects data about their behavior. To verify that the split system is working correctly, you conduct A/A testing: divide users into two groups, but do not change anything about their experience with the application.

What is A/A testing?

A/A testing is an experiment in which two groups of users get the same experience of interacting with a product. This allows you to check the split system’s correctness and ensure that it divides users into groups correctly. That is, it is expected that there will be no differences between the groups.

Task

Important

We expect all the results to be interpreted in the task context.

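It is necessary to calculate the results of the A/A test by checking the FPR quality metric (false positive rate: the share of comparisons in which the test reports a significant difference even though both groups get the same experience). The metric being compared is conversion to purchase: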

\[ \text{Conversion rate} = \frac{\text{Number of purchases}}{\text{Number of users}} \]
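The formula above can be sketched in a few lines of plain Python. This is only an illustration: the toy rows are made up, and the 0/1 encoding of both the variant and the purchase flag is an assumption, not something the data set description confirms.

```python
from collections import defaultdict

def conversion_rates(rows):
    """Return {variant: purchases / users} for rows of (variant, purchase)."""
    users = defaultdict(int)
    purchases = defaultdict(int)
    for variant, purchase in rows:
        users[variant] += 1            # every row is one user
        purchases[variant] += purchase # purchase is assumed to be 0 or 1
    return {v: purchases[v] / users[v] for v in users}

# Hypothetical toy data: (experimentVariant, purchase)
toy_rows = [(0, 1), (0, 0), (0, 0), (0, 0),
            (1, 1), (1, 1), (1, 0), (1, 0)]
rates = conversion_rates(toy_rows)
print(rates)
```

In this toy example variant 0 has 1 purchase among 4 users and variant 1 has 2 among 4, so the two conversion rates are 0.25 and 0.5.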

It is claimed that the split system is broken.

You need to check the claim of a breakdown and find its causes if the split system is really broken.

Data

You can find the data set here: link

Variables in the dataset:

uid: user identifier
experimentVariant: experiment variant
version: version of the application
purchase: whether the user made a purchase

How to check the breakdown

Run the A/A test.
Calculate the FPR at \(\alpha\) = 0.05 (draw subsamples of size 1000 without replacement). You will see that FPR > \(\alpha\)! For a correctly working split system, the FPR should not exceed \(\alpha\).
Find the reasons for the split system failure based on the experiment’s results (hint: find the anomaly in the application version).
Write conclusions that can be drawn based on the analysis of the A/A test results.
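The FPR calculation in the steps above can be sketched as a simulation: repeatedly draw 1000 users from each group without replacement, test the two subsamples for a difference in conversion, and take the share of runs with \(p\)-value < \(\alpha\). The task does not name a statistical test, so the pooled two-proportion z-test below is an assumption, as are the function names, the number of simulations, and the synthetic groups used for the demonstration.

```python
import math
import random

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0  # all purchases equal in both samples: no evidence of a difference
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def estimate_fpr(group_a, group_b, n_sub=1000, n_sim=500, alpha=0.05, seed=0):
    """Share of simulated A/A comparisons with p-value < alpha."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_sim):
        a = rng.sample(group_a, n_sub)  # random.sample draws without replacement
        b = rng.sample(group_b, n_sub)
        if two_proportion_p_value(sum(a), n_sub, sum(b), n_sub) < alpha:
            significant += 1
    return significant / n_sim

# Synthetic 0/1 purchase flags (made up for illustration, not the lab data).
healthy_a = [1] * 1000 + [0] * 19000   # 5% conversion
healthy_b = [1] * 1000 + [0] * 19000   # 5% conversion
broken_b = [1] * 2000 + [0] * 18000    # 10% conversion: groups really differ

fpr_healthy = estimate_fpr(healthy_a, healthy_b)
fpr_broken = estimate_fpr(healthy_a, broken_b)
print(f"healthy split: {fpr_healthy:.3f}, broken split: {fpr_broken:.3f}")
```

With identical groups the estimate should stay close to \(\alpha\), while groups that truly differ push it far above \(\alpha\), which is the symptom the task asks you to look for.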

Note

Calculate the conversion rate (purchase rate), grouping by the variant and version of the application. This alone may already show where the breakdown comes from.
Calculate the \(p\)-value for each version of the application.
Find versions where \(p\)-value < 0.05.
Exclude the versions with \(p\)-value < 0.05 from the main data table.
Rerun the FPR calculation for the A/A test. Now, FPR should be less than \(\alpha\).
You have found the cause of the breakdown.
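The per-version check described in this note could look roughly like the sketch below. It counts purchases and users for each (version, variant) pair, computes one \(p\)-value per version, and drops the flagged versions. The two-proportion z-test, the version labels "v1" and "v2", the 0/1 variant encoding, and all counts are assumptions made for illustration; the sketch also assumes exactly two variants per version.

```python
import math
from collections import defaultdict

def two_proportion_p_value(x1, n1, x2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (assumed test)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))

def p_values_by_version(rows):
    """Return {version: p-value} comparing the two variants within each version."""
    # counts[version][variant] = [purchases, users]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        cell = counts[row["version"]][row["experimentVariant"]]
        cell[0] += row["purchase"]
        cell[1] += 1
    result = {}
    for version, by_variant in counts.items():
        # Assumes exactly two variants are present in every version.
        (x1, n1), (x2, n2) = (by_variant[v] for v in sorted(by_variant))
        result[version] = two_proportion_p_value(x1, n1, x2, n2)
    return result

def make_rows(version, variant, purchases, users):
    """Build hypothetical user rows with the given number of purchases."""
    return [{"version": version, "experimentVariant": variant,
             "purchase": 1 if i < purchases else 0} for i in range(users)]

# Made-up data: "v1" looks healthy, "v2" has a large gap between variants.
rows = (make_rows("v1", 0, 50, 1000) + make_rows("v1", 1, 52, 1000)
        + make_rows("v2", 0, 5, 1000) + make_rows("v2", 1, 60, 1000))

p_values = p_values_by_version(rows)
flagged = sorted(v for v, p in p_values.items() if p < 0.05)
clean_rows = [r for r in rows if r["version"] not in flagged]
print(flagged, len(clean_rows))
```

After the flagged versions are removed, the remaining rows are what the FPR calculation would be rerun on.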