Behold, That’s How You Handle Missing Data - Issue 7

It’s Wednesday, and I’m back with another edition of Data Analysis Journal for you to explore news, events, insights, and resources in the data analysis world. 

Today we will be discussing: 

  • What Data Science skills, tools, and education levels are most in demand among companies hiring in 2020.

  • How to properly handle missing data in your modeling or analysis.

  • Free resources and popular YouTube channels for learning data science and analytics.

  • Which Machine Learning certificate program to pick - AWS, Azure, or the new cool one from Google? 

  • An open internship opportunity for a Data Science role.

  • Join the Women’s Leadership Conference this week.

Let’s get started! 

📚 Weekend Longread

Check out this amazing in-depth study of job offers in the data science field, conducted by Iliya Valchanov, co-founder of the data science learning platform 365datascience (one of my favorite learning resources, which I covered in a previous issue). The study is based on 1,170 data scientist job descriptions posted in the USA in 2020. Learn which skills are most in demand and what education level and years of work experience employers require.

If you are busy, here is a quick recap: 

  • California posted the largest number of open Data Science jobs this year.

  • Most job offers require at least a Bachelor’s degree, followed by a Master’s degree and Ph.D.

  • On average, job offers ask for a minimum of 4.2 years of previous experience as a data scientist.

  • Python is the most requested skill, followed by R and SQL.

  • Spark, AWS, and Hadoop lead among database and cloud products.

Read more here.

🔥 What’s new this week

  • This is an exciting week for SQLite users. SQLite released many cool new features (query planner improvements, hashes, and support for UPDATE FROM, which makes it compatible with PostgreSQL). If you are new to SQLite, check out my guide on what SQLite is and how to get started with it. 

  • There are just a few weeks left to sign up for the DAA course Applying Digital Analytics if you want to learn how web analytics is applied in practice: its processes, the pitfalls to avoid, and the steps to using analytics to improve site design, digital marketing, conversion/lead optimization, and return on marketing. 

  • MongoDB co-founder and former CTO Eliot Horowitz shared a piece on the creation of MongoDB and why it is “fundamentally better” for developers. He described why their database “pushed the industry” and how exactly MongoDB makes developers’ lives easier. 
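For readers who want to try the new UPDATE FROM support from the first item, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes your Python build links against SQLite 3.33.0 or newer (the release that added UPDATE FROM); the `inventory` and `sales` tables and their contents are made up for illustration. The statement subtracts each item's total sales from its stock by joining the target table to an aggregated subquery, in the same style PostgreSQL uses.

```python
import sqlite3

def update_inventory_from_sales():
    """Run an UPDATE ... FROM against a throwaway in-memory database and return the result."""
    # UPDATE FROM is only understood by SQLite 3.33.0 and later.
    if sqlite3.sqlite_version_info < (3, 33, 0):
        raise RuntimeError(f"UPDATE FROM needs SQLite 3.33.0+, found {sqlite3.sqlite_version}")
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample tables: current stock and individual sales.
    conn.executescript("""
        CREATE TABLE inventory (item TEXT PRIMARY KEY, qty INTEGER);
        CREATE TABLE sales (item TEXT, sold INTEGER);
        INSERT INTO inventory VALUES ('apple', 10), ('pear', 5);
        INSERT INTO sales VALUES ('apple', 3), ('apple', 2), ('pear', 1);
    """)
    # Join the target table to per-item sales totals and subtract them from stock.
    conn.execute("""
        UPDATE inventory
        SET qty = qty - daily.total
        FROM (SELECT item, SUM(sold) AS total FROM sales GROUP BY item) AS daily
        WHERE inventory.item = daily.item
    """)
    rows = conn.execute("SELECT item, qty FROM inventory ORDER BY item").fetchall()
    conn.close()
    return rows

print(update_inventory_from_sales())  # [('apple', 5), ('pear', 4)]
```

Before 3.33.0, the same update required a correlated subquery inside SET, which is why this syntax matters for people porting PostgreSQL queries.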

🏆 Nailed It

Be prepared for your next interview

Today I want to focus on handling missing data in your analysis or modeling. 

If you are dealing with missing values in your dataset, you have two options: ignore them (drop the affected records) or impute them (replace them with estimated values). Which one should you choose? The answer depends on the following questions:

  1. Are you working on multivariate analysis?

  2. How much data is missing? 

  3. Are values missing at random (randomly distributed)? 

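    1. To check, split the data into two partitions: rows where the value is missing and rows where it is present. Then run a t-test comparing the means of another variable in the two partitions. If the difference is not significant, the values may well be missing at random.

    2. If the means differ, you are dealing with a more complicated case: the missing values might be randomly distributed only within one or more sub-samples of your data. 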
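Here is a minimal sketch of the t-test check described above, using only Python's standard library. It assumes your data is a list of dicts with missing values stored as `None`; the column names (`income`, `age`, `region`), the sample rows, and the helper names are hypothetical. It uses Welch's variant of the t-test, which does not assume equal variances, and returns only the t statistic and degrees of freedom; for a p-value you could pass the two partitions to `scipy.stats.ttest_ind(a, b, equal_var=False)`. A second helper compares missing rates across sub-samples, which can hint at the more complicated case where values are missing at random only within certain groups.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def missingness_t_test(rows, missing_col, compare_col):
    """Partition rows by whether missing_col is None, then compare compare_col means."""
    with_missing = [r[compare_col] for r in rows if r[missing_col] is None]
    without_missing = [r[compare_col] for r in rows if r[missing_col] is not None]
    return welch_t(with_missing, without_missing)

def missing_rate_by_group(rows, missing_col, group_col):
    """Share of rows where missing_col is None, computed within each value of group_col."""
    totals, missing = {}, {}
    for r in rows:
        g = r[group_col]
        totals[g] = totals.get(g, 0) + 1
        missing[g] = missing.get(g, 0) + (r[missing_col] is None)
    return {g: missing[g] / totals[g] for g in totals}

# Hypothetical sample: income is sometimes missing, age and region are always present.
rows = [
    {"age": 25, "region": "east", "income": 40000},
    {"age": 32, "region": "west", "income": None},
    {"age": 47, "region": "east", "income": 82000},
    {"age": 51, "region": "west", "income": None},
    {"age": 29, "region": "east", "income": 45000},
    {"age": 38, "region": "west", "income": None},
    {"age": 44, "region": "west", "income": 70000},
    {"age": 35, "region": "east", "income": 52000},
]

t, df = missingness_t_test(rows, "income", "age")
print(f"t = {t:.2f}, df = {df:.1f}")  # t = 0.62, df = 4.2
print(missing_rate_by_group(rows, "income", "region"))  # {'east': 0.0, 'west': 0.75}
```

In this toy sample, rows with and without an income value have similar ages (t ≈ 0.62 with about 4 degrees of freedom, far from significant), yet income is missing in 75% of `west` rows and in none of the `east` rows. That pattern suggests the missingness depends on region, the more complicated case described above.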

If your analysis is multivariate and a large number of values are missing, it might be better to drop those records rather than impute them. 

If your analysis is not multivariate, and the values are missing at random, imputation may decrease the amount of bias in the data. 

If the values are not missing at random, you need to do data imputation. 
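The two options from the beginning of this section, dropping and imputing, can be sketched in plain Python. This assumes the data is a list of dicts with missing values stored as `None`; the column names, sample rows, and helper names are hypothetical. `missing_share` answers the "how much data is missing?" question, `drop_missing` removes incomplete records (listwise deletion), and `mean_impute` shows the simplest possible imputation, filling gaps with the column mean. Mean imputation is used here only for illustration: it shrinks the variance of the imputed column, and the articles linked below cover more robust techniques.

```python
from statistics import mean

def missing_share(rows, column):
    """Fraction of rows where `column` is None."""
    return sum(r[column] is None for r in rows) / len(rows)

def drop_missing(rows, columns):
    """Listwise deletion: keep only rows with no None in any of `columns`."""
    return [r for r in rows if all(r[c] is not None for c in columns)]

def mean_impute(rows, column):
    """Return new rows with None in `column` replaced by the mean of the observed values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    # Build new dicts so the caller's original rows are left unchanged.
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

# Hypothetical sample with one missing income value.
rows = [
    {"age": 25, "income": 40000},
    {"age": 32, "income": None},
    {"age": 47, "income": 80000},
    {"age": 29, "income": 45000},
]

print(missing_share(rows, "income"))                # 0.25
print(len(drop_missing(rows, ["age", "income"])))   # 3
print(mean_impute(rows, "income")[1])               # {'age': 32, 'income': 55000}
```

Because `mean_impute` returns new dicts, you can compare the dropped, imputed, and original versions of the data side by side before deciding which one to model on.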

Also, check this guide to set up the approach for handling missing values in your modeling:

Taken from here

To read more about data imputation strategies and techniques:

https://towardsdatascience.com/how-to-handle-missing-data-8646b18db0d4

https://towardsdatascience.com/all-about-missing-data-handling-b94b8b5d2184

🎓 Level Up

Certifications, internships, schools, and courses.

Free resources:

  • Learn data analysis and data science for free, assess your level, or train your team at Correlation One - a global AI and DS network which connects universities and companies around the world.   

  • Amazing opportunity for high school and college students! Girls in AI just announced a new free, all-girls high school program starting in the fall. This is a great opportunity to learn ML and DS fundamentals through fun projects, keynote speakers, and exclusive events.

  • Check out this YouTube channel from Machine Learning University, with short videos explaining ML, Computer Vision, NLP, and Tabular Data.

  • Another amazing YouTube channel from the Mumbai edition of the Global Stanford Women in Data Science Conference featuring women in the field of AI, ML, Analytics, and Statistics.

Certifications 

If you are looking to get an ML certificate but are not sure which platform to pick, AWS or Azure, check out this guide and these certifications:

Google recently launched its own certification program, Machine Learning Engineer, which is similar to the two above and focuses on distributed model training and scaling to production. Note that this is not a beginner program: you will need ML and architecture knowledge, and Google also recommends 3+ years of experience with Google Cloud products. 

Internship

Looking for a Data Analyst or Data Scientist internship or need help with your first project/portfolio?

Check out this opportunity from Emergent Alliance, a non-profit community that exists to better inform the future economic decision making of corporations, small businesses, and nation-states. This is not a paid position, but you will get a chance to deliver your first project, try your skills in real life, and get a better sense of what data analysts do. 

🍸 Drink and Mingle

Upcoming free events, meetups, talks, webinars

One important event for women that I forgot to mention last week: the Women’s Hall Of Fame, or Women’s Leadership Conference, which kicked off this Monday. I know it’s already Wednesday, but you can still register and join the workshops and sessions. It’s a free, week-long conference with many interesting working sessions, keynote speakers (my amazing ex-colleague Anisha Weber is a speaker!), interactive workshops, and panel discussions. Join the group of women leaders to learn, get engaged, and get inspired!

Stay safe, everyone ❤️ Until next Wednesday!