### ✍️ Steps for conducting a product experiment:

1. **Formulate Hypothesis** - why do you have to run the experiment, and what is the expected ROI?
2. **Set Control and Variation:**
   - know your current conversion rate (baseline)
   - set the smallest change you want to be able to detect - this is your Minimum Detectable Effect (MDE)
3. **Calculate sample size:**
   - set your significance level (equivalently, the confidence level) and statistical power
   - your experiment groups should be the same size
   - your sample should be randomly distributed. Account for traffic source, device, returning users, etc.
4. **Run the test** until you reach the planned sample size, not just until the result first looks significant.
5. **Evaluate** results and deploy the winning variation (if there is any).
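To make the "equal, randomly distributed groups" step more concrete, here is a minimal Python sketch of one common approach: hashing a user ID into a bucket. The function name `assign_variant`, the experiment name `checkout-button-test`, and the `user-N` IDs are made up for illustration, and this is not the only way to randomize.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants=("control", "variation")) -> str:
    """Deterministically map a user to one of the equally weighted groups."""
    # Including the experiment name means the same user can land in
    # different groups across different experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Hypothetical traffic: 10,000 made-up user IDs.
counts = {"control": 0, "variation": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "checkout-button-test")] += 1

print(counts)  # the two groups should come out roughly equal in size
```

Because the assignment depends only on the user ID and the experiment name, a returning user sees the same version every time, as long as the ID itself is stable across visits and devices.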

### 🔥 Things to remember:

Run an A/A test first. It helps you check the software, outside factors, and natural variance.

Don't run the experiment for too long, or you might experience data pollution - the effect when multiple devices, cookies, and outside factors affect your result.

Don’t run the experiment for too short a time either, or you might get a false positive caused by regression to the mean - when a metric is extreme at first but then moves closer to the average.

Look for a balance between data sensitivity and robustness - the change in your result should be visible, but the metric shouldn’t fluctuate much when other events are occurring.
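The points about A/A tests and stopping too early can be illustrated with a small simulation. This is a sketch under assumed numbers (a 20% conversion rate, 1,000 users per group, 5 interim looks, a fixed random seed), not a description of any particular testing tool. Both groups get the same true rate, so every "significant" result is a false positive.

```python
import math
import random
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_aa_tests(n_tests=1000, users_per_group=1000, looks=5,
                      rate=0.20, alpha=0.05, seed=42):
    """Return (share significant at ANY look, share significant at the END)."""
    rng = random.Random(seed)
    step = users_per_group // looks
    any_look_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        seen_significant = False
        p = 1.0
        for look in range(1, looks + 1):
            # Both groups share the same true rate: this is an A/A test.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = look * step
            p = p_value(conv_a, n, conv_b, n)
            if p < alpha:
                seen_significant = True
        any_look_hits += seen_significant
        final_hits += p < alpha  # p from the last look only
    return any_look_hits / n_tests, final_hits / n_tests

peeking_rate, final_rate = simulate_aa_tests()
print(f"false positives if you stop at the first significant look: {peeking_rate:.1%}")
print(f"false positives if you only evaluate at the end: {final_rate:.1%}")
```

Stopping at the first significant look can never produce fewer false positives than waiting for the planned end, and in simulations like this one it typically produces noticeably more.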

### 💣 Statistical terminology

To approach A/B testing, you can think of it as **Null-Hypothesis** testing and apply the following terms:

P-value - assuming the Null-H is true, what is the probability of seeing a result at least as extreme as the one observed? If the p-value falls below your significance level (the data is in the "not expected" region), we reject the Null-H.

Significance level (α) is the probability of detecting an effect when none exists (a false positive). A result is statistically significant when its p-value is below this level.

Statistical Power is the probability of detecting an effect when it does exist.

Confidence Interval is a range of plausible values for the true effect and a measure of the estimate's reliability: the narrower the CI, the more precise the result.

z-score is the number of Standard Deviations from the mean.
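Here is how these terms fit together in a two-proportion z-test, a common way to evaluate a conversion A/B test. This is a minimal sketch using only the Python standard library. The visitor and conversion counts are invented for illustration, and the function name `two_proportion_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return (z-score, two-sided p-value, CI for the difference B - A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Under the Null-H both groups share one rate, so pool them.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool

    # Probability of a result at least this extreme if the Null-H is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval for the difference uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)
    return z, p_value, ci

# Made-up data: control converts 1300 of 6500, variation 1430 of 6500.
z, p, ci = two_proportion_z_test(1300, 6500, 1430, 6500)
print(f"z-score: {z:.2f}")
print(f"p-value: {p:.4f}")
print(f"95% CI for the lift: {ci[0]:.3%} to {ci[1]:.3%}")
```

If the p-value is below your significance level and the confidence interval does not include zero, you reject the Null-H.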

### 🤔 If you are lost in conversions and numbers, check this guide:

If your baseline conversion is 20% and you set a relative MDE of 10%, the test is sized to detect a change to 18% or lower, or to 22% or higher (a shift of 2 percentage points).

For the same relative MDE, the higher your baseline conversion, the smaller the sample size you need.

The smaller the MDE, the larger the sample you need.

Low p-values are good. They indicate that a result like yours would be unlikely if there were no real difference between the versions.

A common choice is a 95% confidence level (a 5% significance level) and 80% statistical power.

It’s often recommended to run the experiment for 2 business cycles (2-4 weeks).
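The rules of thumb above can be checked with a short calculation. The sketch below uses a standard normal-approximation formula for comparing two proportions. It assumes a two-sided test, equal group sizes, and a relative MDE, and real calculators may use slightly different formulas and give slightly different numbers. The function name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate users needed in EACH group for a two-sided test."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # rate we want to be able to detect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance
    z_beta = NormalDist().inv_cdf(power)           # power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

baseline, mde = 0.20, 0.10
low, high = round(baseline * (1 - mde), 4), round(baseline * (1 + mde), 4)
print(f"detectable range: below {low:.0%} or above {high:.0%}")

n_reference = sample_size_per_group(0.20, 0.10)
n_low_baseline = sample_size_per_group(0.05, 0.10)  # lower baseline
n_small_mde = sample_size_per_group(0.20, 0.05)     # smaller MDE
print(f"baseline 20%, MDE 10%: {n_reference} users per group")
print(f"baseline 5%, MDE 10%: {n_low_baseline} users per group")
print(f"baseline 20%, MDE 5%: {n_small_mde} users per group")
```

Dividing the total sample by your daily traffic gives a rough test duration, which you can then round up to whole business cycles.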

📢 Use this calculator to determine the needed sample size for your experiment.

📢 Use this calculator to evaluate your test significance and result.

### 🔍 Other types of product testing

Multivariate testing (MVT) - multiple variants and their combinations within a single test.

Split URL testing - multiple versions of your webpage posted on different URLs.

Multipage testing - testing changes across different pages. There are two kinds: Funnel Multi-Page testing and Conventional Multi-Page testing. Read more here.

Check out this guide if you want an A/B experiment checklist.