An Analysis Of Bias Or Why A/B Testing Fails - Issue 172

A recap of Stanford and Airbnb's collaborative paper on experimentation setup and analysis in two-sided platforms and marketplaces.

Nov 15, 2023


A few years ago, Stanford collaborated with Airbnb on a fascinating research study - Experimental Design in Two-Sided Platforms: An Analysis of Bias. Last month, this paper was recognized with the Dantzig Dissertation Award and the MSOM Service Management SIG Best Paper Award.

It’s a study that should be close to the heart of any data scientist who works with A/B testing and who has, at some point, given up explaining why rapid iteration may not work and why A/B testing often fails.

The study offers a new way of setting up A/B tests and a new perspective on experimentation. Today, I am sharing a high-level recap of this research and integrating its theory and findings into our daily work. I’ll review the typical A/B test setups in marketplaces, how they fall short, and what methods the new research offers to address the bias.

I also want to emphasize how crucial it is for analysts to understand the complexity of interference, how dangerous bias is for the test analysis, and how we can continue to develop ways to navigate communication and uncertainty.

A/B testing is one of the methods of proving causation. If you want to prove A is causing B, you need to do two things: A/B test it, and then A/B test it right.

A recap from How To Prove Causation:

Causal inference methods include hypothesis testing (experimentation) and observations (user research). Keep in mind that neither experimentation (A/B tests, multivariate tests, split tests, etc.) nor user research (surveys, field studies, interviews, etc.) guarantees clean causality; the result depends on randomization, significance, confidence levels, treatment group size, setup, and more.
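To illustrate how significance depends on treatment group size, here is a minimal sketch of a pooled two-proportion z-test using only the Python standard library. The function name and the conversion counts are made up for illustration; they do not come from the paper or from any real test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: the same 10% vs 13% conversion rates at two sample sizes.
small = two_proportion_z_test(20, 200, 26, 200)
large = two_proportion_z_test(2000, 20000, 2600, 20000)

print(f"small groups: z={small[0]:.2f}, p={small[1]:.3f}")
print(f"large groups: z={large[0]:.2f}, p={large[1]:.3g}")
```

The same observed lift is inconclusive with 200 users per group and clearly significant with 20,000 per group. Note that this test assumes users do not affect each other, which is exactly the assumption that breaks down in marketplaces.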

If you have been reading my newsletter, you should know that the success of your test depends on:

Instrumentation
Data maturity
A process tailored to the nature of your product and business

This academic study reiterates my points about the importance of test instrumentation. The test design should appropriately address your unique product offering, customer, and market. If you can’t validate or prove the test setup, or if you question the ability of your instrumentation to allocate users accurately, your test analysis might be wrong and lead to incorrect assessments.

Another reason why this particular research stands out is that Ramesh Johari himself led it. Ramesh Johari is a well-known scientist, researcher, and professor who has spent almost 20 years studying experimentation and has advised Optimizely, Stitch Fix, Upwork, Airbnb, Uber, Bumble, Stripe, and more.

💡 Make sure to listen to his recent interview in Lenny's Newsletter - Marketplace lessons from Uber, Airbnb, Bumble, and more:

“Many of the changes that are most consequential create winners and losers. And rolling with those changes is about recognizing whether the winners you've created are more important to your business than the losers you've created in the process.”

What this research is about:

This 56-page academic study offers a framework to better understand how to set up and analyze experiments in two-sided marketplaces - products that offer interactions between different user groups (like buyers and sellers, readers and writers, listings and customers). The study is relevant to any platform with a “dynamic inventory model” - bookings, services, content, or tutorials, such as Upwork, Udemy, Airbnb, Substack, Uber, eBay, Etsy, Medium, Twitter, Instagram, Meta, and more.

The challenge: there is no easy way to eliminate interference.

This study focuses on the interference effect, where a treatment applied to one group of users unintentionally impacts another group. This phenomenon leads to bias and incorrect estimations, preventing an accurate test read.
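To make interference concrete, here is a toy simulation in Python. It is a sketch under my own simplifying assumptions, not the model from the paper: customers arrive one at a time, each books with a fixed probability if any listing is still available, and the treatment raises that probability. All function names and parameter values are hypothetical. Because treated and control customers compete for the same limited inventory, the naive A/B estimate can differ sharply from the true effect of launching to everyone.

```python
import random


def simulate(n_listings, n_customers, p_control, p_treatment, treat_share, rng):
    """One market run. Returns (treated_bookings, n_treated, control_bookings, n_control)."""
    available = n_listings
    booked = {True: 0, False: 0}
    seen = {True: 0, False: 0}
    for _ in range(n_customers):
        treated = rng.random() < treat_share  # customer-side randomization
        seen[treated] += 1
        p = p_treatment if treated else p_control
        # A booking removes a listing from the shared inventory.
        if available > 0 and rng.random() < p:
            available -= 1
            booked[treated] += 1
    return booked[True], seen[True], booked[False], seen[False]


def compare(n_runs=200, seed=0, n_listings=250, n_customers=1000,
            p_control=0.2, p_treatment=0.4):
    """Average naive A/B estimate vs. global treatment effect, in bookings."""
    rng = random.Random(seed)
    naive_total = 0.0
    gte_total = 0.0
    for _ in range(n_runs):
        # 50/50 experiment: difference in booking rates, scaled to all customers.
        bt, nt, bc, nc = simulate(n_listings, n_customers, p_control,
                                  p_treatment, 0.5, rng)
        naive_total += (bt / max(nt, 1) - bc / max(nc, 1)) * n_customers
        # Counterfactual worlds: everyone treated vs. everyone in control.
        all_treated = simulate(n_listings, n_customers, p_control,
                               p_treatment, 1.0, rng)[0]
        all_control = simulate(n_listings, n_customers, p_control,
                               p_treatment, 0.0, rng)[2]
        gte_total += all_treated - all_control
    return naive_total / n_runs, gte_total / n_runs


naive, gte = compare()
print(f"naive A/B estimate:      {naive:.1f} extra bookings")
print(f"global treatment effect: {gte:.1f} extra bookings")
```

In this setup the experiment overstates the lift: treated customers book listings that control customers would otherwise have taken, so part of the measured difference is bookings moved between groups rather than new bookings.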

How tests are run today in marketplaces.
