Data Analysis Journal

Outliers: To Drop or Not to Drop? - Issue 196

From Analytics to ML: How to detect outliers and what to do with them.

Olga Berezovsky
Apr 10, 2024

There is a common misconception that outliers are bad: they skew the distribution, so we should detect and remove them early before proceeding with modeling or analysis.

Here is what data scientists typically do when preparing data for an ML model:

  1. Check for null values. If they are sparse, remove them. If too many values are missing, find a way to fill them in.

  2. Plot the distribution of values, locate outliers, and remove them.

  3. Convert categorical values into numerical ones for modeling.

  4. Group values into features. The more user attributes the dataset has, the better the model performs.

  5. The dataset is now clean (no outliers, no null values, and no categorical data) and ready for modeling.
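For concreteness, steps 1 to 3 of this typical workflow could look like the minimal sketch below in plain Python. The field names `amount` and `plan`, the function names, and the sample rows are hypothetical. Two choices are assumptions rather than anything the steps prescribe: outliers are located with Tukey's IQR fences (Q1 - 1.5·IQR, Q3 + 1.5·IQR), and categorical values are one-hot encoded. The sketch deliberately follows the naive steps as listed, including removing outliers without inspecting them.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return Tukey's fences (Q1 - k*IQR, Q3 + k*IQR) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def naive_clean(rows, numeric="amount", categorical="plan"):
    """Hypothetical naive pipeline: drop nulls, drop IQR outliers, one-hot encode."""
    # Step 1: drop rows with missing values (assumes nulls are sparse).
    rows = [r for r in rows
            if r.get(numeric) is not None and r.get(categorical) is not None]

    # Step 2: locate outliers in the numeric column and remove them blindly.
    low, high = iqr_bounds([r[numeric] for r in rows])
    rows = [r for r in rows if low <= r[numeric] <= high]

    # Step 3: convert the categorical column into 0/1 indicator columns.
    levels = sorted({r[categorical] for r in rows})
    cleaned = []
    for r in rows:
        encoded = {numeric: r[numeric]}
        for level in levels:
            encoded[f"{categorical}_{level}"] = int(r[categorical] == level)
        cleaned.append(encoded)
    return cleaned

rows = [
    {"amount": 10, "plan": "free"},
    {"amount": 12, "plan": "pro"},
    {"amount": 11, "plan": "free"},
    {"amount": None, "plan": "pro"},   # null value, dropped in step 1
    {"amount": 13, "plan": "pro"},
    {"amount": 9, "plan": "free"},
    {"amount": 500, "plan": "pro"},    # extreme value, dropped in step 2
]
cleaned = naive_clean(rows)
```

The 500 row is discarded without any check of whether it is a data error or a genuine, high-value user, which is exactly the decision this post argues should be made deliberately.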

Each of the steps above may be flawed. Some are easier to troubleshoot and improve, while others are more complex and require more context. 

Today, I will focus on outliers.

Outliers are not necessarily bad and do not always have to be removed. It depends on the use case:

  • Certain ML models handle outliers quite well, while others degrade in performance when outliers are present.

  • Some KPIs and metrics, like DAU, ARR, or Churn, remain unaffected by outliers, while others, such as Time-to-Value, Transactions Per User, or the average number of actions, can become misleading.
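Both points can be illustrated with a small sketch on made-up numbers (the user IDs, counts, and points below are hypothetical). A single extreme user counts only once toward DAU, but it drags the mean Transactions Per User far above what a typical user does, while the median barely moves. On the modeling side, the sketch contrasts an ordinary least-squares slope, which one extreme point can pull far from the trend, with the Theil-Sen estimator (the median of pairwise slopes), a standard robust alternative picked here purely for illustration.

```python
import statistics
from itertools import combinations

# Hypothetical one-day log: (user_id, transactions). u5 is an extreme "power user".
daily = [("u1", 2), ("u2", 3), ("u3", 1), ("u4", 2), ("u5", 400)]

dau = len({user for user, _ in daily})   # distinct active users: the outlier counts once
counts = [n for _, n in daily]
mean_tpu = statistics.mean(counts)       # mean Transactions Per User, pulled up by u5
median_tpu = statistics.median(counts)   # robust summary, barely moved by u5

def ols_slope(xs, ys):
    """Least-squares slope: sum of co-deviations over sum of squared x deviations."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

def theil_sen_slope(xs, ys):
    """Theil-Sen slope: the median of slopes over all pairs of points with distinct x."""
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1
    ]
    return statistics.median(slopes)

# Points on the line y = 2x, except the last one, which is an outlier.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 60]

print(dau, mean_tpu, median_tpu)      # 5 81.6 2
print(round(ols_slope(xs, ys), 2))    # 8.86
print(theil_sen_slope(xs, ys))        # 2.0
```

One point out of six moves the least-squares slope from 2 to about 8.86, while the median-based estimate stays at 2.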

Below, I will discuss the different types of outliers, show how to detect them, and explain how to decide whether to remove, keep, or adjust them: why outliers are harmful in some cases, and why in others you have to keep them in your dataset to make your analysis or model more accurate.

Techniques to detect outliers
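As an illustrative sketch (not necessarily the techniques the full post covers), here are two widely used detection rules in plain Python. The classic z-score measures distance from the mean in standard deviations, but an outlier inflates both the mean and the standard deviation, so in small samples it can hide itself. The modified z-score replaces the mean with the median and the standard deviation with the median absolute deviation (MAD); the constant 0.6745 and the 3.5 cutoff are common conventions, assumed here. Both functions only flag values and leave the keep, drop, or adjust decision to you. The function names and the `sessions` data are hypothetical.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose classic z-score (mean and sample stdev based) exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) / sd > threshold for v in values]

def modified_zscore_outliers(values, threshold=3.5):
    """Flag values using the median and the median absolute deviation (MAD)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Most values are identical, so MAD cannot scale the deviations;
        # flag nothing rather than divide by zero.
        return [False] * len(values)
    # 0.6745 rescales MAD so it is comparable to a standard deviation for normal data.
    return [0.6745 * abs(v - med) / mad > threshold for v in values]

sessions = [10, 11, 12, 13, 14, 100]
print(zscore_outliers(sessions))           # [False, False, False, False, False, False]
print(modified_zscore_outliers(sessions))  # [False, False, False, False, False, True]
```

With only six points, the extreme value inflates the standard deviation enough that its own z-score stays near 2, below the usual cutoff of 3, while the median-based rule flags it clearly.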

© 2026 Olga Berezovsky