May Digest: Navigating Through Data and Analytics Snowstorm
Case studies, analyses, new tutorials, and everything important you may have missed last month.
Welcome back to another edition of Timo and Olga’s Monthly Digest, your source for the latest updates from the world of analytics and data science.
Today, we will share some case studies, recent analyses, and tutorials you may have missed, along with events and new interesting publications.
🔊 Advocating for analytics
Thank you to everyone who came to the events last week!
First, the Product Analytics Leader's Meetup was a blast.
It was such an honor to share the stage with
, whom I have admired for many years, adopting concepts from his work and re-reading many of his publications. He is also such a friendly and nice person (who may or may not like analysts, rumors say). My favorite part of the evening was Tom’s presentation on the correlation/causation problem - how to leverage causal inference models to learn user behavior. Thanks again to Loops for having me and organizing such a great event.
Impact of Data Processing Tools on Analytics and Reporting - what a fun night!
With an amazing crowd of over 100 people, a panel of ETL-CEOs from Airbyte, Dagster Labs, Honeydew, Mozart Data, and data advocates such as
and , we discussed building and growing in the AI era, balancing must-do and please-do projects, the future of ETL, metrics decision trees, and career change with AI taking over.
To my readers who came to meet me last week and introduced themselves - you are the best! Please email me, and as a thank you, I will set you up with a subscription. It feels amazing to know there is a community of analysts and data leaders reading my newsletter ❤️.
🔥 Recent highlights
Artificial snowstorm. Last week, the Snowflake Summit caused many clusters of data engineers and data founders to aggregate in many areas of downtown San Francisco. It was nice to meet analytics leaders, my ex-colleagues, old friends, favorite speakers, and finally - my wonderful guest writers - Peter Fishman, Ben Rogojan, and David Krakov. (A reminder: here is my list of data and analytics events).
Tooling to save your analytics. dbt is not falling behind. On May 15, dbt announced new updates: a low-code / no-code experience for authoring dbt models, a new copilot for dbt, dbt Cloud enhancements, unit testing, connectors, and more.
Become a Tableau Ambassador. Tableau has just opened nominations for their 2024 Ambassadors. If you are using Tableau (and happen to enjoy it), consider applying to become a Tableau Data Champion. You will work for free for Tableau and engage in community work for the chance to boost your resume and level up. The deadline is July 11, 2024.
Missed the annual Tableau 2024 conference? Here are the recordings and highlights.
RIP, marketing analytics. Google has once again delayed the deadline to deprecate third-party cookies, now starting in early 2025. Why this is very important: Most analytics today for user online behavior, content personalization, and targeted ads rely on cookies. But before you panic, note this applies only to Chrome and only to third-party cookies. Moreover, the end of third-party cookies doesn’t mean the end of tracking.
Yearning for Data Council 2024? Look no further. Christophe has got your back. One of my favorite bloggers (and data engineers) developed an app that lets you search for keywords in the Data Council video playlist and returns curated highlights. Try it here - Qrators (works only on desktop for now).
KaggleX Fellowship for BIPOC (Black, Indigenous, People of Color). KaggleX has opened applications for its annual fellowship program for data scientists. This year's project will focus on building a chatbot by fine-tuning Gemma on a conversation-style dataset. You will need to complete the skill assessment to participate. I highly recommend trying it out.
⚙️ Know your craft
Data and product analytics:
Metrics: don't mix up the measurement of reality with the reality itself.
How to Calculate Net Dollar Retention Rate (the right way) and The Top HR Metrics to Track from
.
Data science and ML
Oda’s online experimentation journey: Lessons learned and best practices
Building DoorDash’s Product Knowledge Graph with Large Language Models
Understanding logistic regression using predicted probabilities
Career
🤓 Analysis and case studies
A Guide to Measuring Feature Contribution to KPIs - a framework and steps on how to identify opportunities to drive growth via conversion, retention, engagement, and feature funnel analysis written by Tom Laufer, CEO and Co-founder of Loops. Timo: This was my favorite post in the last 30d.
Your Favorite BI Tool - A roundup of popular BI dashboarding tools to help you choose the best reporting tool.
Do you model in dbt or BI? - a blog post from Omni on why you should model your KPIs and metrics in dbt.
CAC payback period deep dive from Paul Levchuk.
Proving Marketing’s Incrementality: The Tech Stack from Avinash Kaushik
🎓 Tutorials
How to work with subscription data:
Supplement with How To Set Up Subscription Analytics For Growth Reporting
🌟 Spotlight: Amplitude dropped a bomb
“It's not every day that I get to talk about something that I think will change the future of analytics. Today is one of those days.
Introducing: Snowflake Native Amplitude” - Spenser in his announcement last week.
Amplitude re-platformed on Snowflake. Now, you can run queries from Amplitude directly in Snowflake and retrieve Snowflake data for charts and dashboards in Amplitude “with zero ETL”.
“Traditionally, analyzing behavioral data and financial data has been a disjointed process. Enterprise data has resided in Snowflake. And behavioral data has resided in Amplitude. With Snowflake Native Amplitude, we are breaking down these silos.”
Olga: We could see Amplitude heading in this direction for a few years, announcing more connectors and integrations with Snowflake every few months. As I have said many times, over and over again, Amplitude has positioned itself as a leader in product analytics with a focus on deep dives and trusted insight reporting, but there is nothing trustworthy about event data. Product analysts must learn to work and make decisions based on very low-trust data. If you are using Amplitude, sooner or later, you will be tasked with merging it with transactions or user data stored in a data warehouse. You may spend many weeks parsing and loading Amplitude data into your database, or worse, piping data into Amplitude, only to discover a substantial gap between all your sources.
I am looking forward to trying this native connection. I can see how this will save many analyst hours. They promise that you can seamlessly map financial and enterprise data stored in Snowflake with digital product behavioral data. Let’s see how it actually works.
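For readers who have not yet measured that gap between sources, here is a minimal Python sketch of a daily reconciliation between tracked purchase events and warehouse transactions. Everything in it is an assumption for illustration: the field names (`event_type`, `date`), the `daily_gap` helper, and the sample rows are hypothetical and do not reflect Amplitude's or Snowflake's actual schemas.

```python
from collections import defaultdict


def daily_gap(events, transactions):
    """Compare purchases per day in event data against warehouse transactions.

    Both inputs are lists of dicts. The keys used here ("event_type", "date")
    are hypothetical and stand in for whatever your own exports contain.
    """
    tracked = defaultdict(int)
    for event in events:
        # Only purchase events are comparable to transaction rows.
        if event["event_type"] == "purchase":
            tracked[event["date"]] += 1

    actual = defaultdict(int)
    for txn in transactions:
        actual[txn["date"]] += 1

    report = {}
    for day in sorted(set(tracked) | set(actual)):
        t, a = tracked.get(day, 0), actual.get(day, 0)
        # Share of warehouse transactions with no matching tracked event.
        missing_share = (a - t) / a if a else None
        report[day] = {"tracked": t, "actual": a, "missing_share": missing_share}
    return report


# Made-up sample rows, for illustration only.
events = [
    {"event_type": "purchase", "date": "2024-05-01"},
    {"event_type": "purchase", "date": "2024-05-01"},
    {"event_type": "purchase", "date": "2024-05-01"},
    {"event_type": "page_view", "date": "2024-05-01"},
    {"event_type": "purchase", "date": "2024-05-02"},
]
transactions = [
    {"date": "2024-05-01"},
    {"date": "2024-05-01"},
    {"date": "2024-05-01"},
    {"date": "2024-05-01"},
    {"date": "2024-05-02"},
    {"date": "2024-05-02"},
]

report = daily_gap(events, transactions)
for day, row in report.items():
    print(day, row)
```

Counting per day rather than matching row by row is a deliberate simplification: it shows the size of the gap without needing a shared transaction ID, which event data often lacks.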
Timo: At least for my own setups, I have already been living in the product analytics future for 12 months: most event data is sourced from backend systems, databases, webhooks, or other sources into the DWH, and only very little comes from SDKs. Then a proper event model is applied, and finally I can use product analytics tools on top of it. And I don’t want to go back. The data-warehouse-first approach gives me more control over the data, better data quality, and plenty of enrichment possibilities. I had some demos of Amplitude’s native Snowflake integration early this year, and it works really well. Mixpanel introduced its data warehouse sync product, Mirror, some months ago, and then there is Netspring, which opened up the race for data-warehouse-first event analytics.
The next step for me is the convergence of event (product) analytics datasets and classic BI aggregated data to bring it all together in one place. There are early implementations that look promising, but there is still some way to go.
❤️ Favorite publications this month
Bookmarked to re-read favorite takes
- Data about data from 1,000 conversations with data teams
- How Pinterest Scaled to 11 Million Users With Only 6 Engineers
- The rise of the analytics pretendgineer
- The Danger Zone in Data Science
- Metrics trees and other mental frameworks
🎧 Videos and Podcasts
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars.
June 11-14, online, Postgres: POSETTE: An Event for Postgres 2024
June 13, Dallas: Tech Ladies Dallas Meetup
June 13, London: A/B Talks: London with StatSig
June 14, Paris: MeasureCamp
June 18, NY, DSS: Applying AI & ML to finance & tech
June 20, online, Tableau: DataDev Day June 2024
July 5, London + virtual: Nudgestock 2024
Thanks for reading, everyone!
Olga + Timo