How To Prove Causation - Issue 148
Using causal inference and regression analysis for informed decision-making
Today’s issue is the final publication in my regression series and covers probably one of the most important topics in analytics: proving causation.
Over the last two weeks, I shared an introduction to correlation and regression, where I covered:
How to do linear regression and correlation analysis - what a regression analysis is, when and how to run it, how linear regression is different from correlation analysis, and use cases for each.
Decoding Regression Scores - what the different types of regression are, when to run which regression method, and how to read regression scores and an equation.
You already know that correlation does not prove causation. Regression analysis exists in a gray area between cause and effect.
I was taught in school that regression can be used to estimate causal relationships from observational data, and you will find quite a few examples from statisticians and researchers who use linear regression for causal inference.
And yet, if you take an economics course or an MBA program, you will learn that regression doesn’t prove cause and effect, and that the relationship you see on a linear regression graph does not imply causation.
🤔 So wait, does regression prove or not prove causation?
I like the take of Paul Allison, Ph.D., a professor of statistical methods: “regression can be used for both causal inference and prediction”, but it all comes down to “how the methodology is used”, or whether it should be used at all for a particular problem or question.
Bringing this into applied data science and analytics: any type of regression (like any type of ML) doesn’t prove causation on its own. However, it can give you clues and higher confidence that there is a strong connection between variables, and show how that connection changes as the input values increase or decrease. That said, if regression is used incorrectly and its error scores are ignored, it can lead you further away from the truth by misrepresenting the data and failing to recover the true relationship.
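To make that risk concrete, here is a minimal sketch in Python (standard library only) of how a regression slope can point in the wrong direction. It simulates hypothetical data in which a confounder `z` drives both an input `x` and an outcome `y`, so `x` has no causal effect on `y` at all. The function names (`simulate`, `naive_slope`, `adjusted_slope`) and all coefficients are illustrative assumptions, not data from this series.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=10_000, seed=0):
    """Hypothetical data: z causes both x and y; x has NO effect on y."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]     # confounder
    x = [2 * zi + rng.gauss(0, 1) for zi in z]  # input driven by z plus noise
    y = [3 * zi + rng.gauss(0, 1) for zi in z]  # outcome driven by z plus noise
    return x, y, z


def naive_slope(x, y):
    """OLS slope of y on x alone: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


def adjusted_slope(x, y, z):
    """OLS coefficient on x in y ~ x + z, solved from the 2x2 normal
    equations on centered data (the intercept drops out)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    return (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)


if __name__ == "__main__":
    x, y, z = simulate()
    print(f"naive slope of y on x:     {naive_slope(x, y):.3f}")
    print(f"slope of x adjusted for z: {adjusted_slope(x, y, z):.3f}")
```

With these made-up coefficients, the naive slope should land near cov(x, y) / var(x) = 6 / 5 = 1.2, a strong-looking "effect" that does not exist, while adding `z` as a second predictor pulls the coefficient on `x` to near 0. The adjustment only works here because the simulation knows which confounder to include; on real observational data, regression alone cannot tell you whether you have measured the right variables.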
In this publication, I’ll share some examples of causation analysis, show you how not to get tricked by flawed regression output, and explain how to recognize when the regression pattern you see is correct, trustworthy, and causal.