✍️ Steps for conducting a product experiment:
Formulate Hypothesis - why did you have to run the experiment, what is the ROI?
Set Control and Variance:
know your conversion rate (baseline)
set the rate you expect - this is your Minimum Detectable Effect (MDE)
Calculate sample size:
set your significance, significant interval, and power
your group experiment sizes should be the same
your sample should be randomly distributed. Recognize traffic, device, returning users, etc.
Run the test until you reach a full sample. Not until you reach the significance.
Evaluate results and deploy the winning variation (if there is any).
🔥 Things to remember:
Run the A/A test first. It helps you to check the software, outside factors, and natural variance.
Don't run the experiment for too long, you might experience data pollution - the effect when multiple devices, cookies, outside factors affect your result.
Don’t run the experiment for too little time either, you might get a false positive (regression to the mean). In other words, when a variable is extreme at first but then moves closer to the average.
Look for the balance between data sensitivity and robustness - when the change in your result is visible, but it doesn’t fluctuate much when other events are occurring.
💣 Statistical terminology
P-value - assuming Null-H is true, what is the probability of seeing a specific result? If data is on the "not expected" region, we reject Null-H.
Statistical Significance is a probability of seeing the effect when none exists.
Statistical Power is a probability of seeing the effect when it does exist.
Confidence Interval is the number of allowed errors or measurement of estimate reliability: the smaller CI - the more accurate result.
z-score is the number of Standard Deviations from the mean.
🤔 If you are lost in conversions and numbers, check this guide:
If your baseline conversion is 20%, you may set the MDE to 10%, and the test may detect 18% - 22% conversion results.
The higher your baseline conversion, the smaller the sample size you need.
The smaller the MDE, the larger sample you need
Low p values are good. They indicate the result didn’t occur by chance.
Your significance level could be 95% and standard significant power - 80%
It’s often recommended to run the experiment for 2 business cycles (2-4 weeks)
📢 Use this calculator to determine the needed sample size for your experiment.
📢 Use this calculator to evaluate your test significance and result.
🔍 Other types of product testing
Multivariate testing (MVT) - multiple variants and their combinations within the single test.
Split URL testing - multiple versions of your webpage posted on different URLs.
Check out this guide if you want an A/B experiment checklist.