Data Analysis Journal

Machine Learning

Handling Missing Data: Should You Drop or Impute? - Issue 219

Exploratory Data Analysis: Techniques for handling NULL values in modeling and analysis

Olga Berezovsky
Aug 28, 2024

Here’s what I’ve noticed about analysts:

Beginners often ignore missing values, even when a substantial share of the data is missing. (It’s actually funny: they might report that 86% of the data is missing and then proceed with the analysis or modeling as if nothing were wrong.)

Senior data scientists tend to rush to impute missing values with averages, regardless of whether it’s necessary.

There are guidelines for when it’s acceptable to ignore missing data and when it’s not. Obviously, this depends on the volume of missing values. However, it also depends on their distribution, significance, variance, dependence, and the type of modeling you are performing.
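To make the trade-off concrete, here is a minimal sketch (not taken from the original post) that compares dropping rows with mean imputation on a made-up column. The `ages` values and the helper names `missing_share`, `drop_missing`, and `impute_mean` are hypothetical and chosen only for illustration. It uses the standard library so it runs anywhere.

```python
from statistics import mean, pvariance


def missing_share(values):
    """Return the fraction of entries that are None (treated as missing)."""
    return sum(v is None for v in values) / len(values)


def drop_missing(values):
    """Keep only the observed (non-None) entries."""
    return [v for v in values if v is not None]


def impute_mean(values):
    """Replace each None with the mean of the observed entries."""
    fill = mean(drop_missing(values))
    return [fill if v is None else v for v in values]


# Hypothetical column: 4 of the 10 values are missing.
ages = [23, None, 31, None, 45, 52, None, 38, None, 29]

share = missing_share(ages)
dropped = drop_missing(ages)
imputed = impute_mean(ages)

print(f"missing share: {share:.0%}")
print(f"mean:     dropped={mean(dropped):.2f}  imputed={mean(imputed):.2f}")
print(f"variance: dropped={pvariance(dropped):.2f}  imputed={pvariance(imputed):.2f}")
```

Mean imputation leaves the column mean unchanged but shrinks its variance, because every filled value sits exactly at the center of the observed data. That is one reason the volume of missing values alone does not settle the drop-or-impute question.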

Today, I want to revisit one of the most common and, in my opinion, most underrated issues in analytics: handling missing values.

© 2025 Olga Berezovsky