Hello dear readers, and please join me in celebrating two months since I started my newsletter and launched the Data Analysis Journal!
This month I covered the most common A/B testing questions and user segmentation techniques, and shared helpful resources, events, and materials on data analysis, data science, and data engineering. This issue is a recap of last month's newsletters.
📚 Weekend Longreads From Data Analysis Journal:
Data Science. Read about decision trees, a popular supervised machine learning algorithm that can predict both categorical variables (classification) and continuous variables (regression), and learn how to apply the Decision Tree Classifier to a binary classification problem.
Product testing. There are so many good materials already written on running A/B tests, and yet it can be difficult to search through them to find the right explanations for basic questions. Check my One-Pager on how to conduct A/B tests, statistics to know, and factors to look for.
Data analysis. User profiling is a very common type of data analysis that helps you segment a user base into groups to understand customer types and behavior. Read my most recent write-up on how to use SQL to get user actions across the webpage and bucket them into categories, which may later be used for additional deep-dive analysis, A/B testing, or user research.
✨ 10 Quick Takes From Last Month's Newsletters:
Data Engineering. As data and infrastructure applications mature, more companies are running data migrations. Data migration is the process of preparing, extracting, transforming, and moving data from one storage system to another. It requires close collaboration between data engineers and data analysts. Read this article to learn what it takes to migrate 40 TB of data from one infrastructure to another.
Data Science. Check 365datascience, one of my favorite Data Science learning platforms (with some free courses open now). I like how it structures the content and focuses on the terminology and concepts you will be using at your work.
Its co-founder, Iliya Valchanov, shared an interesting in-depth study of job offers in the data science field. He reports that Python is the most required skill, followed by R and SQL, and that Spark, AWS, and Hadoop lead among database and cloud products. Read the study for more insights.
Product Analysis. Get familiar with the concept of Adjacent Users developed by Bangaly Kaba (Former Head of Growth at Instagram and Instacart). This is a group of users who know about your product but haven’t become engaged users yet. To convert Adjacent Users into Power Users, you have to know who is successful today and why. This will help you tell the “power” segment apart from the adjacent one. There are many personas for your product or service, but targeting the right user segment is very important. When it’s done correctly, you will see an improvement in your user retention and engagement.
Product Analysis. Another interesting metric to watch is the Word of Mouth Coefficient, developed by Yousuf Bhaijee (former VP Growth @ Eaze, ex Zynga, ex Disney). Word of Mouth (WOM) is an important driver of user growth. The coefficient is the number of new organic users divided by the number of active users (returning users + non-organic new users), because active users are more likely to talk about your product than those who have never used it or have stopped using it. Yousuf makes WOM more measurable, scalable, and stable by tying it to active and returning users.
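To make the ratio concrete, here is a minimal Python sketch of the calculation as described above. The function name, parameter names, and the example numbers are my own illustration, not something taken from Yousuf's write-up.

```python
def word_of_mouth_coefficient(new_organic_users, returning_users, new_non_organic_users):
    """New organic users divided by active users.

    Active users are counted as returning users plus non-organic new users,
    following the definition in the paragraph above. All names here are
    illustrative assumptions.
    """
    active_users = returning_users + new_non_organic_users
    if active_users <= 0:
        raise ValueError("active users must be positive")
    return new_organic_users / active_users


# Made-up numbers for one period: 150 new organic users,
# 2,500 returning users, 500 new users from paid channels.
wom = word_of_mouth_coefficient(150, 2500, 500)
print(f"WOM coefficient: {wom:.3f}")
```

With these made-up inputs the ratio is 150 / 3,000, so every 100 active users bring in about 5 new organic users over the period.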
Data Science. Check out aito.ai, billed as the first predictive database and “the next generation” of ML. Its predictive queries are meant to replace a separate ML step. Imagine running SQL-like queries that predict data instead of extracting it. This could be a big step forward in ML democratization, making it accessible to everyone. It also introduces a radically new workflow and architecture for predictions.
Data Engineering. MongoDB co-founder and former CTO Eliot Horowitz shared a piece on the creation of MongoDB and why it is “fundamentally better” for developers. He describes why the database “pushed the industry” and how exactly MongoDB makes developers’ lives easier.
Data Analysis. If you are dealing with missing values in your dataset, you can either ignore them or do data imputation (replace them). Which one to choose? The answer depends on (1) what type of analysis you are working on, (2) how much data is missing, and (3) whether values are missing at random (randomly distributed).
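The two options can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions: missing values are represented as None, the helper names are hypothetical, and median or mean fill is just one simple imputation strategy among many.

```python
from statistics import mean, median


def missing_share(values):
    """Fraction of entries that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def drop_missing(values):
    """Option 1: ignore missing values by dropping them."""
    return [v for v in values if v is not None]


def impute(values, strategy="median"):
    """Option 2: replace missing values with the median or mean of the observed ones."""
    observed = drop_missing(values)
    if not observed:
        raise ValueError("nothing observed to impute from")
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mean":
        fill = mean(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical sample: session length in minutes, with two gaps.
session_minutes = [12, None, 7, 30, None, 9]
print(missing_share(session_minutes))
print(drop_missing(session_minutes))
print(impute(session_minutes, "median"))
```

Checking the share of missing values first helps with question (2). Whether the gaps are random, question (3), still needs a closer look at the data, since a simple fill can bias results when they are not.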
Data Science. If you are looking to get an ML certificate but are not sure whether to pick AWS or Azure, check this guide to decide between the AWS Machine Learning Engineer and Azure Data Scientist certifications. A new certification program from Google, Machine Learning Engineer, is similar to those two and focuses on distributed model training and scaling to production.
Cheer yourself up by checking out this webcam to see gorgeous bison live in the paddock in Golden Gate Park. No, it’s not related to big data. But now you know there are bison in Golden Gate Park in San Francisco!
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars
Sep 2, Tamr: Data Mastering at Scale
Sep 3, DSS: Predictive Analytics In Finance
Sep 3, GDG Global: Job&Salary Negotiation Skills During Covid19 and Beyond
Sep 7, Hack Reactor: Introduction to Computer Science
Sep 10, Databricks: Data pipelines and ML with Spark
Sep 15, Looker: Light Up Your Data Direction
Sep 16, Affirm: ML Talks
Sep 30, Anaconda: Performance Tips For Pandas
If you missed my August newsletters, here are the links:
Thank you all again for your support and for sharing this ride with me ⭐.
Until next Wednesday!