Data Analysis Journal

Statistics 101: When Simple Becomes Tricky - Issue 157

How not to get trapped in descriptive statistics during data science interviews

Olga Berezovsky
Aug 16, 2023

Did you know that to get the median value of an array, you must first sort the array? 

Now you do! 

The median depends on the rank order of the values, so you can’t calculate it without ordering the data first (the mean and the mode, by contrast, work on data in any order). Welcome to another “let me remind you of some basics” data science and analysis newsletter.
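To make the sorting point concrete, here is a minimal Python sketch of a median written without any statistics library. The function names `median_from_scratch` and `mean_from_scratch` are made up for illustration, and the sketch assumes a non-empty list of numbers.

```python
def median_from_scratch(values):
    """Return the median of a non-empty list of numbers (hypothetical helper)."""
    if not values:
        raise ValueError("median needs at least one value")
    ordered = sorted(values)  # the median is defined on ranked data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle value
        return ordered[mid]
    # even count: the average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


def mean_from_scratch(values):
    """Return the arithmetic mean; no ordering is needed here."""
    if not values:
        raise ValueError("mean needs at least one value")
    return sum(values) / len(values)
```

Note that the mean only needs a sum and a count, which is why the order of the input does not matter for it.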

We are so accustomed to built-in functions in SQL and Python. I don’t even remember what variance means anymore. I just type VAR() and don’t overthink it. A few weeks ago, I was asked to help replicate STDDEV() step by step, and let me tell you, I had to take a good minute to collect my thoughts.
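As a refresher, one way to replicate a standard deviation step by step looks roughly like the Python sketch below. The function names and the `sample` flag are assumptions made for this illustration. Whether a given built-in STDDEV() divides by n - 1 (sample) or by n (population) depends on the tool, so it is worth checking the documentation for yours.

```python
import math


def variance_from_scratch(values, sample=True):
    """Variance of a list of numbers (hypothetical helper).

    sample=True divides by n - 1 (sample variance);
    sample=False divides by n (population variance).
    """
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough values for this kind of variance")
    # Step 1: the mean
    mean = sum(values) / n
    # Step 2: squared distance of each value from the mean
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Step 3: average those squared distances
    divisor = n - 1 if sample else n
    return sum(squared_deviations) / divisor


def stddev_from_scratch(values, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance_from_scratch(values, sample=sample))
```

For the values 2, 4, 4, 4, 5, 5, 7, 9, the mean is 5 and the squared deviations sum to 32, so the population variance is 32 / 8 = 4 and the population standard deviation is 2.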

In technical interviews, you are often asked to break down common built-in functions. For example: imagine there is no mode() function, and write code to get the mode. (Rumor has it that the exercises below were asked at Reddit, GitHub, Amazon, and YouTube data science interviews.) They are not meant to be difficult; they aim to test your understanding of statistics and whether you can recognize which function to apply and when.
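For the "no mode() function" scenario, one possible answer is a plain frequency count, sketched below in Python. The name `mode_from_scratch` is hypothetical, and returning every tied value as a list is a design choice made here, not the only valid one. An interviewer may want a single value instead, so it is worth asking how ties should be handled.

```python
def mode_from_scratch(values):
    """Return a list of the most frequent value(s) (hypothetical helper).

    Ties are all returned, in the order they first appear in the input.
    """
    if not values:
        raise ValueError("mode needs at least one value")
    # Count how many times each value occurs
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    # Keep every value whose count matches the highest count
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]
```

Unlike the median, nothing here needs the data to be sorted: a single pass over the values is enough to build the counts.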

There is no excuse for failing to replicate the mean or median.  

I assume everyone is familiar with MEAN(), MIN(), and MAX(). Below, I’ll focus on the basic must-know concepts, such as variance, standard deviation, mode, median, and distribution, which you will definitely need for A/B test analysis, time series analysis, probability estimations, and predictions.
