Why You Shouldn’t Stop A/B Tests Early - Issue 193
Why significance matters, and how long you should run an A/B test
Welcome to the Data Analysis Journal, a weekly newsletter about data science and analytics.
Today, I wanted to cover the most commonly asked questions on experimentation in analytics:
How long should you run an A/B test? Two weeks is the usual recommendation, but why?
Can you (or should you) stop an A/B test early?
If you have to, what is the safest approach to handling fast A/B tests?
Why are slow rollouts dangerous?
What is the recommended procedure for gradually launching A/B tests over time?
There's a wealth of information on A/B testing available, ranging from academic papers on the frequentist approach (often less relevant for marketing or product analytics) to complex probability theory. Many product and analytics leaders today don't have a background in statistics, and their knowledge of A/B testing often comes from online courses and self-study. As a result, they might not grasp the differences between A/B testing and hypothesis testing, or between A/B testing and split testing. This can lead to applying the same strategies to each or, even worse, expecting the same level of trust in the results.
“A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.” Dr. Rob Balon.
While I can't reinvent statistical methods to make Bayesian analysis fit the frequentist framework, or keep stakeholders from being offended when analysts admit to low trust in the data (breaking news: most winning A/B test results are illusory), I can at least reiterate and clarify the basics here.
Why significance matters
Because early experimental data is more likely to be wrong.
You shouldn’t stop the test early - even if you think you see a clear winner - because of regression to the mean.
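To make the risk concrete, here is a minimal sketch of a simulated A/A test, where both variants share the same true conversion rate, so any "winner" is a false positive by construction. The 10% conversion rate, 200 users per variant per day, the 14-day horizon, and the function names are illustrative assumptions, not figures from a real experiment. The sketch compares declaring a winner the first time a daily check crosses the 95% threshold against checking only once at the end.

```python
import math
import random


def z_stat(conv_a, n_a, conv_b, n_b):
    """Two-proportion z statistic using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def false_positive_rates(runs=400, days=14, users_per_day=200,
                         true_rate=0.10, z_crit=1.96, seed=42):
    """Simulate A/A tests and return (peeking_rate, fixed_horizon_rate).

    peeking_rate: share of tests that looked significant on ANY daily check.
    fixed_horizon_rate: share that looked significant on the final day only.
    """
    rng = random.Random(seed)
    crossed_any_day = 0
    crossed_last_day = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        crossed = False
        z = 0.0
        for _day in range(days):
            # Both variants draw from the SAME true rate (assumed A/A setup).
            conv_a += sum(rng.random() < true_rate for _ in range(users_per_day))
            conv_b += sum(rng.random() < true_rate for _ in range(users_per_day))
            n += users_per_day
            z = z_stat(conv_a, n, conv_b, n)
            if abs(z) > z_crit:
                crossed = True  # a daily peek would have stopped the test here
        crossed_any_day += crossed
        crossed_last_day += abs(z) > z_crit
    return crossed_any_day / runs, crossed_last_day / runs


peeking_rate, fixed_rate = false_positive_rates()
print(f"False positives when stopping at the first significant peek: {peeking_rate:.1%}")
print(f"False positives when evaluating once on day 14: {fixed_rate:.1%}")
```

Because the final check is one of the daily checks, the peeking rate can never be lower than the fixed-horizon rate. In runs like this it typically ends up several times higher, even though nothing differs between the variants.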
Regression to the mean (RTM) explains why early winners are so often false positives. It's the effect where a variable is extreme at first but then moves closer to the average as more data comes in. In real life, conversion under RTM looks approximately like this:
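As a rough illustration of the effect, the sketch below simulates many tests with no true difference between variants, picks the ones that looked like the biggest winners after day 1, and follows their cumulative lift through day 14. All numbers (10% conversion rate, 200 users per variant per day, 400 simulated tests) and the helper name `simulate_cumulative_lifts` are assumptions chosen for illustration, not data from a real experiment.

```python
import random


def simulate_cumulative_lifts(true_rate=0.10, users_per_day=200, days=14, rng=None):
    """Return the cumulative observed lift (B rate minus A rate) after each day.

    Both variants share the same true conversion rate, so the real lift is zero.
    """
    rng = rng or random.Random()
    conv_a = conv_b = n = 0
    lifts = []
    for _day in range(days):
        conv_a += sum(rng.random() < true_rate for _ in range(users_per_day))
        conv_b += sum(rng.random() < true_rate for _ in range(users_per_day))
        n += users_per_day
        lifts.append(conv_b / n - conv_a / n)
    return lifts


rng = random.Random(7)
all_runs = [simulate_cumulative_lifts(rng=rng) for _ in range(400)]

# "Early winners": the 10% of tests with the largest lift after day 1.
all_runs.sort(key=lambda lifts: lifts[0], reverse=True)
early_winners = all_runs[:40]

for day in (1, 2, 4, 7, 14):
    avg_lift = sum(run[day - 1] for run in early_winners) / len(early_winners)
    print(f"Day {day:>2}: average lift among early winners = {avg_lift:+.4f}")

avg_day1 = sum(run[0] for run in early_winners) / len(early_winners)
avg_day14 = sum(run[-1] for run in early_winners) / len(early_winners)
```

The tests were selected because their first day was extreme, so most of that day-1 lift is noise. As each test accumulates users, the average lift of the selected group drifts back toward the true value of zero.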