I can’t believe another exciting week of masks and Zoom meetings passed by so fast! Welcome to the Data Analytics Journal Issue 6, and hello to the 150 new subscribers who have joined since last week!
✨ This week’s highlights:
A/B test one-pager guide for product analysts - keep it close and reread as needed.
Millions for AI - an interesting rise of AI startups and acquisitions to watch for.
Leading and managing data analytics teams in the pandemic age, and how to encourage data sensitivity precautions at your company.
Expert spotlight on growth and strategy - the Word of Mouth Coefficient breakdown and how to use it to scale user growth.
Let’s get started!
📚 Weekend Longread
Product testing. There are so many good materials already written on running A/B tests, and yet it can be difficult to search through them to find clear explanations for basic questions. Therefore, I’ve decided to launch a new section in my journal called One-Pager. This is a short guide covering only the essential information you need - steps, concepts, and must-know terminology.
Check out my first One-Pager on how to conduct A/B tests, statistics to know, and factors to look for.
🔥 What’s new this week
R or Python? R and Python!
In one of my previous newsletters, I spoke about how R made a comeback this summer. Here is an interesting insight from the developers of RStudio on how they expanded the R library functionality to support Python, including Jupyter, Streamlit, and other Python products. Frankly, I don’t see the value of using Jupyter from RStudio, given that R Markdown documents are similar to Jupyter notebooks, but this is great news for analysts who enjoy using both programming languages for data analysis!
Data engineering gets even more efficient with:
processing time-series data (where each data point is inserted as a new value instead of overwriting an earlier one). Built on Postgres, TimescaleDB is the only open-source time-series database that supports full SQL. Even running on a single machine, TimescaleDB is fast and scalable, and its developers report that it outperforms plain PostgreSQL on time-series workloads. Timescale Cloud is now available on AWS, Azure, and Google Cloud Platform.
saving on data storage in AWS. Check out this helpful guide on how to archive AWS data to reduce storage costs and how to use S3 effectively.
AI today and tomorrow
Imagine running SQL queries that would predict data instead of extracting it. That would be a much faster and simpler way to do ML. It already exists!! Check out aito.ai - a predictive database and the next generation of ML. And yes, its predictive queries already replace ML. This is a huge step forward in ML democratization, making it accessible for everyone. It also introduces a radically new workflow and architecture of predictions.
New funding for ML/AI startups
Revenue intelligence AI startup Gong raises another $200M at a $2.2B valuation. What it does - it aggregates sales data from CRM databases and uses AI to analyze patterns in that data, providing analytics to its customers.
Drug discovery AI company Atomwise raises $123M in funding. It uses AI and ML to analyze data on new drug molecules, then uses AtomNet (a neural network) to analyze millions of compounds and reduce the time and cost of drug discovery.
Qumulo, a Seattle-based data intelligence company, raised a $125 million Series E. The company provides solutions to help customers store and manage large amounts of data.
While we are on this topic, did you know that Apple acquired the most AI startups over the last decade? This was closely followed by Google and Microsoft. Interesting to see how AI gets such a big focus from tech leaders across different industries.
🏆 Nailed It
Leading and thriving in the age of Covid-19
In Q2 2020, Corinium, the market intelligence and advising network, conducted a global survey of 104 Chief Data and Analytics Officers across different enterprises to learn how C-level executives are developing AI and using it to boost their products and services during the pandemic. Some interesting insights:
57% of participants reported increased demand for AI products as a result of COVID-19.
67% don’t monitor or tune their models to ensure their continued accuracy.
93% say “ethical considerations must be dealt with to drive AI adoption within their organizations.”
Learn more and request the full study results here.
COVID-19 forces health officials to analyze and use thousands of individual records to trace, test, and treat patients. This data carries a high risk of a privacy breach or data leak (you remember the Meow cyber attack, right?). Analysts have to learn and adopt new techniques to protect sensitive data while still delivering high-value analysis. Read this whitepaper, which offers recommendations on the best privacy-enhancing technologies for working with sensitive user health data.
💡 Expert Spotlight on Growth and Strategy
Yousuf Bhaijee (Former VP Growth @ Eaze, ex Zynga, ex Disney) shared his view on defining a metric to measure the Word of Mouth Coefficient.
Here is a quick summary of his theory and some thoughts:
Word of Mouth (WOM) is an important metric for driving user growth. It is becoming critical now because most media communication channels have become “more crowded and less cost-effective”.
The biggest challenge with WOM is that it’s difficult to measure and thus difficult to influence. Usually, it’s estimated from a mix of NPS, surveys, and user interviews.
The WOM coefficient is the number of new organic users divided by the number of active users, where active users = returning users + non-organic new users. Active users are the defining base for WOM because they are more likely to talk about your product than people who have never used it or have stopped using it.
Yousuf compares the data for this metric across a gaming app, an EdTech app, and a music mobile app, and shows that in all three cases the metric is stable and key to driving user growth.
WOM has, of course, been around for ages. It is a marketing term and has been used extensively in advertising and marketing campaigns to engage audiences. What I like about Yousuf’s take, and what makes it unique, is that he brings this metric into the product, making it more measurable, scalable, and stable by tying it to active and returning users. Because it is based on active users, WOM works as a growth funnel or, better, a growth loop (which I’ll focus on in one of my next issues).
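To make the definition concrete, here is a minimal Python sketch of the WOM coefficient as summarized above (new organic users divided by returning users plus non-organic new users). The function name, the parameter names, and the sample numbers are my own assumptions for illustration; they are not taken from Yousuf’s article or from any real product data.

```python
def wom_coefficient(new_organic_users: int,
                    returning_users: int,
                    new_non_organic_users: int) -> float:
    """WOM coefficient = new organic users / active users.

    Active users are defined here as returning users plus
    non-organic new users, following the summary above.
    """
    active_users = returning_users + new_non_organic_users
    if active_users <= 0:
        # Without active users there is nobody to spread the word,
        # so the ratio is undefined.
        raise ValueError("active users must be greater than zero")
    return new_organic_users / active_users


# Hypothetical weekly counts, made up purely for illustration.
weekly_wom = wom_coefficient(
    new_organic_users=200,
    returning_users=900,
    new_non_organic_users=100,
)
print(weekly_wom)  # 0.2
```

Tracking this ratio over the same time window (for example, week by week) is what lets you check whether it stays stable as the active user base grows.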
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars
Aug 19, John Snow Labs: Analysing Healthcare Data
Aug 20, Data.World: Accelerate Data Initiatives and Culture
Aug 21, Girls in Tech SF: Hacking for humanity 2020
Aug 26, PyLadies: Interactive Data Visualisation with Bokeh
Aug 27, R-Ladies: Tangible Steps Towards Algorithmic Accountability
Aug 27, Appen: How to identify the right initial use case for AI
Sept 2, WiMLDS: Ethnic and Race in Tech
Sept 26, OpenMined: Data Privacy Conference 2020
Sept 30, Ray Summit: Scalable ML, scalable Python, for everyone
🎓 Try It Out
A quick Python exercise. Because it’s fun.
We have two monkeys, a and b, and the parameters a_smile and b_smile indicate if each is smiling. We are in trouble if they are both smiling or if neither of them is smiling.
Return True if we are in trouble.
Try it yourself and check your solution here.
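Spoiler alert: if you want to compare notes, here is one possible solution as a small sketch. The function name monkey_trouble is my assumption (the exercise text above only names the parameters a_smile and b_smile). The idea is that "both smiling or neither smiling" simply means the two flags are equal.

```python
def monkey_trouble(a_smile: bool, b_smile: bool) -> bool:
    # Trouble when both monkeys smile or neither does,
    # which is exactly the case where the two flags match.
    return a_smile == b_smile


print(monkey_trouble(True, True))    # True
print(monkey_trouble(False, False))  # True
print(monkey_trouble(True, False))   # False
```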
🔮 A few more things...
Is there a topic you think I should cover or a data analysis problem to discuss? I’d love to hear from you! You can simply respond to this email, or drop it in the comments.
Know someone who’d enjoy this newsletter?
Was this newsletter forwarded to you?
Thanks for reading everyone ❤️ Until next Wednesday!