
A/B One-Pager

Aug 14, 2020

Formulate a hypothesis - why do you need to run the experiment, and what is the expected ROI?
Set Control and Variant:
1. Know your current conversion rate (baseline).
2. Set the smallest change in conversion you want to be able to detect - this is your Minimum Detectable Effect (MDE).
Calculate sample size:
1. Set your significance level (α), confidence level (1 - α), and statistical power.
2. Make your experiment groups the same size.
3. Assign users to groups at random, and account for traffic source, device, returning users, etc.
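Run the test until you reach the pre-calculated sample size. Stopping as soon as the result first looks significant inflates the false-positive rate.
Evaluate the results and deploy the winning variation (if there is one).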
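The randomization steps above ask for equal-sized, randomly assigned groups. A minimal sketch of one common way to do this (an assumption, not a method described in this post) is to hash each user ID together with an experiment name, so assignment is effectively random yet stable across visits. `assign_group` and the experiment name `checkout-button` are hypothetical.

```python
import hashlib
from collections import Counter


def assign_group(user_id: str, experiment: str, variants=("control", "variant")) -> str:
    """Deterministically assign a user to a group (hypothetical helper).

    Hashing the experiment name together with the user ID gives a stable,
    effectively random bucket: the same user always sees the same version,
    and different experiments get independent splits.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    # Interpret the first 8 bytes of the SHA-256 digest as an unsigned integer.
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return variants[bucket % len(variants)]


if __name__ == "__main__":
    counts = Counter(assign_group(f"user-{i}", "checkout-button") for i in range(10_000))
    print(counts)  # roughly 5,000 users per group
```

Hashing alone does not guarantee balance on every segment, so it is still worth comparing the device, traffic-source, and returning-user mix of the groups after launch.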

Run an A/A test first. It helps you check your testing software, outside factors, and natural variance.
Don't run the experiment for too long: you might experience data pollution, where multiple devices, cookies, and outside factors distort your result.
Don't run the experiment for too short a time either: you might get a false positive caused by regression to the mean, where a variable looks extreme at first and then moves closer to its average.
Look for a balance between sensitivity and robustness: the change in your result should be visible, but it shouldn't fluctuate much when other events occur.
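An A/A test, where both groups get the same experience, makes stopping rules easy to check: any "significant" result is a false positive. The sketch below simulates A/A tests under assumed parameters (300 tests, 10 days, 100 users per group per day, a 20% true conversion rate, α = 0.05) and compares checking significance once at the planned end with checking it every day. It uses a pooled two-proportion z-test; `two_sided_p` and `simulate_aa` are hypothetical names.

```python
import random
from statistics import NormalDist


def two_sided_p(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se == 0:  # nobody (or everybody) converted: no evidence of a difference
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa(runs=300, days=10, users_per_day=100, rate=0.20, alpha=0.05, seed=1):
    """Simulate A/A tests (both groups share the same true conversion rate).

    Returns (share of runs significant when checked once at the planned end,
             share of runs significant at any daily check).
    """
    rng = random.Random(seed)
    final_hits = 0
    peek_hits = 0
    for _run in range(runs):
        conv_a = conv_b = n = 0
        ever_significant = False
        for _day in range(days):
            n += users_per_day
            conv_a += sum(rng.random() < rate for _ in range(users_per_day))
            conv_b += sum(rng.random() < rate for _ in range(users_per_day))
            if two_sided_p(conv_a, n, conv_b, n) < alpha:
                ever_significant = True  # a daily "peeker" would stop here
        if two_sided_p(conv_a, n, conv_b, n) < alpha:
            final_hits += 1
        if ever_significant:
            peek_hits += 1
    return final_hits / runs, peek_hits / runs


if __name__ == "__main__":
    end_only, daily = simulate_aa()
    print(f"false positives, checked once at the end: {end_only:.1%}")
    print(f"false positives, checked every day:       {daily:.1%}")
```

Expect the end-only rate to land near α and the daily-check rate to be noticeably higher, which illustrates why stopping the moment a test first looks significant is risky.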

To approach A/B testing, you can think of Null-Hypothesis testing and apply the following terms:

P-value - assuming Null-H is true, the probability of seeing a result at least as extreme as the one observed. If the result falls in the "not expected" (rejection) region, i.e. the p-value is below the significance level, we reject Null-H.
Significance level (α) is the probability of detecting an effect when none exists (a false positive).
Statistical Power (1 - β) is the probability of detecting the effect when it does exist.
Confidence Interval is the range of values that, at a chosen confidence level such as 95%, is likely to contain the true effect: the narrower the CI, the more precise the estimate.
z-score is the number of standard deviations a value lies from the mean.
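These terms come together in a two-proportion z-test, one standard way (assumed here; the post does not name a specific test) to compare two conversion rates. This sketch uses the normal approximation via Python's `statistics.NormalDist`, which assumes reasonably large samples; `ab_summary` and the example counts are hypothetical.

```python
from statistics import NormalDist


def ab_summary(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Compare two conversion rates with a two-proportion z-test.

    Returns the z-score, the two-sided p-value, and a confidence interval
    for the absolute difference (variant minus control).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # z-score: how many standard errors the observed difference lies from 0.
    # The pooled rate is used because Null-H says both groups share one rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    if se_null == 0:
        raise ValueError("need at least one conversion and one non-conversion")
    z = diff / se_null

    # p-value: probability, under Null-H, of a |z| at least this large.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error around the observed difference.
    se_diff = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return z, p_value, ci
```

With made-up counts of 1,000 conversions out of 5,000 users in control (20%) and 1,100 out of 5,000 in the variant (22%), `ab_summary(1000, 5000, 1100, 5000)` gives z ≈ 2.46, a p-value of about 0.014, and a 95% CI of roughly +0.4 to +3.6 percentage points. Because the p-value is below 0.05 and the CI excludes zero, Null-H would be rejected at a 5% significance level.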

If your baseline conversion is 20%, you may set a relative MDE of 10% (2 percentage points); the test is then designed to detect changes that move conversion outside the 18% - 22% range.
For a fixed relative MDE, the higher your baseline conversion, the smaller the sample size you need.
The smaller the MDE, the larger the sample you need.
Low p-values are what you want: they indicate the result would be unlikely if Null-H were true, i.e. if there were no real difference.
A common choice is a 5% significance level (95% confidence) and 80% statistical power.
It’s often recommended to run the experiment for 2 business cycles (2-4 weeks)
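The rules of thumb above follow from the standard normal-approximation sample-size formula for comparing two proportions. A minimal sketch, assuming a two-sided test and a relative MDE as in the 20% baseline example; `sample_size_per_group` is a hypothetical helper, and online calculators may use slightly different formulas, so their numbers can differ a little.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed per group for a two-sided two-proportion z-test.

    baseline:     control conversion rate, e.g. 0.20
    relative_mde: smallest relative change worth detecting, e.g. 0.10 (20% -> 22%)
    alpha:        significance level (0.05 corresponds to 95% confidence)
    power:        probability of detecting the MDE if it is real
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


if __name__ == "__main__":
    print(sample_size_per_group(0.20, 0.10))  # the 20% baseline, 10% MDE example
    print(sample_size_per_group(0.40, 0.10))  # higher baseline, same relative MDE
    print(sample_size_per_group(0.20, 0.05))  # same baseline, smaller MDE
```

With α = 0.05 and 80% power, the 20% baseline / 10% relative MDE example needs roughly 6,500 users per group. Raising the baseline to 40% lowers that to about 2,400, and shrinking the MDE to 5% raises it to about 25,600, matching the two rules above. Dividing the total by your daily eligible traffic gives a rough test duration to compare with the 2-4 week guideline.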

📢 Use this calculator to determine the needed sample size for your experiment.

📢 Use this calculator to evaluate your test significance and result.

Multivariate testing (MVT) - multiple variants and their combinations within a single test.
Split URL testing - multiple versions of your webpage hosted at different URLs.
Multipage testing - testing changes across different pages. There are Funnel Multi-Page testing and Conventional Multi-Page testing. Read more here.
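To see why multivariate tests need much more traffic than a simple A/B test, the sketch below enumerates every combination in a full-factorial MVT using `itertools.product`. The page elements and their versions are hypothetical examples.

```python
from itertools import product

# Hypothetical page elements and their versions (not from the original post).
elements = {
    "headline": ["current", "benefit-led", "question"],
    "button_color": ["green", "orange"],
    "hero_image": ["product", "people"],
}

# A full-factorial multivariate test serves every combination of versions.
# dict(zip(...)) pairs each element name with the version chosen in that combination.
combinations = [dict(zip(elements, combo)) for combo in product(*elements.values())]

if __name__ == "__main__":
    print(len(combinations))  # 3 * 2 * 2 = 12 combinations
    for combo in combinations[:3]:
        print(combo)
```

Each of the 12 combinations is its own group, so the test needs roughly 12 times the per-group sample size, compared with 2 times for a plain A/B test.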

Check out this guide if you want an A/B experiment checklist.