From Event Collection To Behavioral Data Modeling
October Digest: Case studies, reports, analysis, and tutorials you may have missed
Once per month, I invite timo dechau 🕹🛠 to join me in a whirlwind discussion of all things analytics across product, data science, business, and academia.
We put together recent case studies, reports, analyses, and tutorials you may have missed, along with upcoming events. Our goal is to advocate for old-fashioned analytics, promote new high-quality content, and help you become a better analyst.
🔊 Advocating for analytics
The Ancient Art of Data Management
Inaugural lecture of DuckDB's co-founder, Professor Hannes Mühleisen, given at Radboud University in Nijmegen (Netherlands) on 28 September 2023.
“I perceive a lack of critical thinking in Computing sciences and maybe all of the engineering disciplines, especially when taught at the University. We spend a lot of time saying what can be done. We don't spend a lot of time thinking about whether this is a good idea.”
in Why Today Is The Perfect Time to Learn Data
“There seems to be a constant desire to rush through the learning process. There are plenty of 12-week bootcamps and accelerator courses to learn new technology. But our careers are often pretty long, and I don’t think there should be a constant rush to speed through learning. If you can, take your time and enjoy learning.”
When Things Go South Or Why Self-Service Is A Myth:
“Self-service can introduce discrepancies, inconsistencies in reporting, and security risks. Most data requests involve analysts reframing the question, often leading to ‘We can’t answer it with the data we have’ or ‘Instead of A, we can provide you B’. Now imagine that the let-me-help-you-to-ask-the-right-question layer is gone. What happens?”
🍁 October highlights
Did you miss DataCon LA, the largest conference on analytics, BI, and data science? Here are the recordings.
Amplitude released the new self-serve plan - Unlock the Power of Amplitude for Less.
Timo: The big thing in this for me is that Amplitude is making feature analytics & testing available for many smaller companies. There are still some things I hope they make available as well (like group analytics). But it is a good step.
“For years Reforge courses have unlocked expertise from top tier builders. Today we are expanding on that by launching the Reforge artifacts product free to everyone.”
📈 Industry reports and new benchmarks
State of Open Source AI Book - 2023 Edition from everyone.
🤓 Analysis and case studies
Experimental Design in Two-Sided Platforms: An Analysis of Bias - Stanford and Airbnb collaborated on an academic research project to solve bias in A/B testing. Marketplaces (Airbnb, Amazon, eBay, Uber, etc.) are at the mercy of interference (where an “intervention” applied to one market influences the behavior of another).
SQL For Weekday Product Usage Analysis - How to get the frequency of user engagement using SQL for day-of-the-week analysis to understand cyclicality and usage patterns.
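A minimal sketch of the day-of-the-week idea, not the linked article's own query: it uses Python's built-in `sqlite3` so it runs anywhere, and the `events` table, its columns, and the sample rows are all made up for illustration. SQLite's `strftime('%w', ...)` returns 0 for Sunday; in Postgres, `EXTRACT(DOW FROM event_ts)` plays the same role.

```python
import sqlite3

# Hypothetical event log: (user_id, event timestamp). Illustrative data only.
EVENTS = [
    ("u1", "2023-10-02 09:15:00"),  # Monday
    ("u2", "2023-10-02 11:30:00"),  # Monday
    ("u1", "2023-10-03 08:05:00"),  # Tuesday
    ("u1", "2023-10-07 20:45:00"),  # Saturday
    ("u3", "2023-10-09 10:00:00"),  # Monday
]

WEEKDAY_NAMES = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

QUERY = """
SELECT CAST(strftime('%w', event_ts) AS INTEGER) AS weekday,  -- 0 = Sunday
       COUNT(DISTINCT user_id) AS active_users,
       COUNT(*) AS events
FROM events
GROUP BY weekday
ORDER BY weekday
"""


def weekday_usage(rows):
    """Return (weekday name, distinct active users, event count) per weekday."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (user_id TEXT, event_ts TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
        return [
            (WEEKDAY_NAMES[weekday], users, events)
            for weekday, users, events in conn.execute(QUERY)
        ]
    finally:
        conn.close()


if __name__ == "__main__":
    for name, users, events in weekday_usage(EVENTS):
        print(f"{name}: {users} active users, {events} events")
```

Counting distinct users and raw events side by side helps separate "more people show up on Mondays" from "the same people do more on Mondays".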
Data pipeline orchestrators - the emerging force in the MDS? - A deep dive into data pipeline orchestrators - what orchestrators do, the different types of orchestrators, how to choose the right type, and more.
Unlocking Machine Learning: Propensity Models - Part 1 from the one and only Avinash Kaushik about bringing AI into marketing to understand how to acquire a high-value customer (or prevent churn).
⚙️ Know your craft
A Beginner's Guide to Sequence Analytics in SQL
Timo: Motif is the hot item in my “play with me” backlog - it has been sitting there, wanting my attention, for quite some time. But I really want to have proper time for it. Until then, this post is a great introduction to working with (and prepping) event data with SQL.
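As a rough illustration of what prepping event data for sequence analysis can look like in plain SQL (this is neither Motif's syntax nor the linked guide's code), the sketch below pairs each event with the user's next event via the `LEAD` window function and counts the transitions. It runs on Python's built-in `sqlite3` (window functions need SQLite 3.25 or newer); the table, event names, and rows are hypothetical.

```python
import sqlite3

# Hypothetical event stream: (user_id, timestamp, event name). Illustrative only.
EVENTS = [
    ("u1", "2023-10-01 10:00:00", "signup"),
    ("u1", "2023-10-01 10:05:00", "create_project"),
    ("u1", "2023-10-01 10:20:00", "invite_teammate"),
    ("u2", "2023-10-02 09:00:00", "signup"),
    ("u2", "2023-10-02 09:10:00", "create_project"),
    ("u3", "2023-10-03 14:00:00", "signup"),
]

QUERY = """
WITH ordered AS (
    SELECT user_id,
           event_name,
           -- The event that follows this one for the same user, if any.
           LEAD(event_name) OVER (
               PARTITION BY user_id ORDER BY event_ts
           ) AS next_event
    FROM events
)
SELECT event_name, next_event, COUNT(*) AS transitions
FROM ordered
WHERE next_event IS NOT NULL
GROUP BY event_name, next_event
ORDER BY transitions DESC, event_name
"""


def transition_counts(rows):
    """Count how often each event is directly followed by another, per user."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (user_id TEXT, event_ts TEXT, event_name TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for current, following, n in transition_counts(EVENTS):
        print(f"{current} -> {following}: {n}")
```

The last event of each user has no successor, so it is filtered out; keeping those rows instead would give a simple view of where sequences end.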
Applying Statistics In Product Analytics - A deep dive into distributions, their types, use cases, and examples to help you decide which distribution to apply for your analysis or forecast.
Kill long-running queries in PostgreSQL - There are two ways to terminate a slow query: (1) terminate the query but keep the database connection alive, or (2) kill the entire database connection. Read to learn when to do what.
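The two options above presumably map to PostgreSQL's built-in `pg_cancel_backend(pid)` (cancel the running query, keep the connection) and `pg_terminate_backend(pid)` (close the whole connection). The sketch below only builds the SQL text, so it runs without a database; the helper names and the 300-second threshold are assumptions for illustration, and the statements still need to be executed through a client of your choice with sufficient privileges.

```python
def find_long_running_sql(min_seconds: int) -> str:
    """SQL listing active backends whose current query has run too long."""
    return (
        "SELECT pid, now() - query_start AS runtime, query "
        "FROM pg_stat_activity "
        "WHERE state = 'active' "
        f"AND now() - query_start > interval '{int(min_seconds)} seconds' "
        "ORDER BY runtime DESC;"
    )


def stop_backend_sql(pid: int, drop_connection: bool = False) -> str:
    """SQL that stops a backend process.

    drop_connection=False -> pg_cancel_backend: cancels the current query,
                             the session stays connected.
    drop_connection=True  -> pg_terminate_backend: ends the whole session.
    """
    func = "pg_terminate_backend" if drop_connection else "pg_cancel_backend"
    return f"SELECT {func}({int(pid)});"


if __name__ == "__main__":
    print(find_long_running_sql(300))       # queries active for over 5 minutes
    print(stop_backend_sql(1234))           # gentle: cancel the query only
    print(stop_backend_sql(1234, True))     # forceful: drop the connection
```

Cancelling first and terminating only if the query does not stop is the less disruptive order, since termination also discards the client's session state.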
Working with Money in Postgres - Postgres has a money data type. Yep, there is a data type called “money” (currency). A helpful guide with SQL and quick solutions on how to work with currency data in Postgres.
Financial-machine-learning - A curated list of practical financial ML (FinML) tools and applications in Python.
Understanding Deep Learning - A new academic paper was revealed before its publication in December.
Boring is Back - The Longer Rant
Timo: I loved the initial boring stack post, which was mostly about software dev (and frontend frameworks). Boring stacks are also great for data. Sure, block some time on a Friday for the fancy new shiny stuff that relentless data influencers post all the time. But none of it is a candidate for your production environment.
📚 Weekend Longread
The Snowplow Manifesto around behavioral data - how to generate it, govern it, and leverage it:
Your primary tag needs to be Switzerland
Olga: I read this publication in a single breath. Even though I (a) am convinced CDPs are redundant and (b) believe that event data is a very special unicorn that may or may not need a seat at the BI table, I am still fascinated by this publication. From the Snowplow history to tag management madness, the author puts into words so nicely the challenge of product analytics tooling and their CDP fate. That’s how you sell your product.
Timo: Naturally close to my heart. I strongly believe in an agnostic event collection layer. The big question is, where will it be? Will it be a “strongly-typed” event pipeline like Snowplow or a data model layer? We don’t know this yet, but the concept is important for the future.
⛔ Hot Seat
Recent publications that made us raise an eyebrow.
Mastering Customer Segmentation with LLM
Olga: Strong analysis and a great walkthrough, but you do not need an LLM for user segmentation. Simpler and cheaper ML models would work as well. But why not kill an ant with a sledgehammer? And you won’t get published in TDS without trending keywords anyway.
Is cohort analysis useful?
Everyone: Yes.
Customer Journey Touchpoint Sequence Analysis & Optimization
Timo: This was a tricky one for me - there are many good points, but I struggle with the structure. It scratches the surface too often. I hope Sven will take it and turn it into a seven-part content series - then I will feature each part in one of our categories further up.
❤️ Favorite publications this month
Bookmarked to re-read favorite takes
- : Organizing Data is Picking What you Care About - and don’t miss his talk today at NY PyData!
- : How top tech companies measure Net Dollar Retention Rate
- : Growing Revenue at a Pumpkin Patch. Obviously, Adam never sleeps.
I Accidentally Saved Half A Million Dollars
Olga: Thanks to the author for sharing and pouring their heart out. “Advanced Analytics Platform”, “Customer Data Platform”, or “Business Data Platform” indeed sounds like a form of analytics scam.
📊 Monthly Chart Drop
🍸 Drink and Mingle
Upcoming free events, meetups, talks, and webinars.
Sep - Nov, online: ML⇄DB Seminar Series
Nov 1-2, virtual: Snowday: The View Ahead
Nov 1-3, NY: PyData NY 2023
Nov 7-8, virtual: Causal Data Science Meeting 2023
Nov 8, virtual: Career Roadmapping with Wendy Saccuzzo
Nov 8, virtual: Impact: The Data Observability Summit
Nov 14-17, Seattle: PASS Data Community Summit
Nov 16, virtual: PostGIS Day 2023
Nov 29, San Francisco: DSS Using Gen AI & ML in the enterprise. I have a few tickets to give away. Reach out if you want to attend! ⭐
Dec 1-3, virtual: PyLadiesCon 2023
Thanks for reading, everyone!
Olga + Timo