Handling Missing Data: Should You Drop or Impute? - Issue 219
Exploratory Data Analysis: Techniques for handling NULL values in modeling and analysis
Here’s what I’ve noticed about analysts:
Beginners often ignore missing values, even when the share of missing data is large enough to matter. (It’s actually funny: they might report that 86% of the data is missing and then proceed with the analysis or modeling as if nothing is wrong.)
Senior data scientists tend to rush to impute missing values with averages, regardless of whether it’s necessary.
There are guidelines for when it’s acceptable to ignore missing data and when it’s not. Obviously, this depends on the volume of missing values. However, it also depends on their distribution, significance, variance, dependence, and the type of modeling you are performing.
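To make these factors concrete, here is a minimal pandas sketch of the checks worth running before deciding whether to drop or impute. The toy data and the helper names (`missing_share`, `missingness_by_group`) are hypothetical and exist only for illustration. The sketch covers volume (share missing per column), dependence (whether missingness varies across another variable), and what mean imputation does to variance.

```python
import numpy as np
import pandas as pd


def missing_share(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


def missingness_by_group(df: pd.DataFrame, column: str, by: str) -> pd.Series:
    """Share of missing values in `column` within each level of `by`."""
    return df[column].isna().groupby(df[by]).mean()


# Hypothetical toy data: `age` is missing more often for free-plan users.
df = pd.DataFrame({
    "plan": ["free", "free", "free", "free", "paid", "paid", "paid", "paid"],
    "age": [25, np.nan, np.nan, np.nan, 31, 40, 28, np.nan],
    "revenue": [0.0, 0.0, 0.0, 0.0, 9.99, 19.99, 9.99, 9.99],
})

# Volume: how much of each column is missing?
share = missing_share(df)

# Dependence: does missingness in `age` differ by plan?
by_plan = missingness_by_group(df, "age", "plan")

# Variance: compare dropping missing rows with filling in the mean.
dropped = df["age"].dropna()
imputed = df["age"].fillna(df["age"].mean())

print(share)
print(by_plan)
print("variance after dropping:", dropped.var())
print("variance after mean imputation:", imputed.var())
```

In this toy data, missingness in `age` is concentrated in the free plan, so dropping rows would skew the sample toward paid users, while mean imputation leaves the average unchanged but shrinks the variance. Real data calls for the same checks on every column that matters to the model, not just one.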
Today, I want to revisit what is, in my opinion, the most common and most underrated issue in analytics: handling missing values.