November Recap: Before You Trust That Lift
Your monthly recap of the latest trends, market shifts, and updates in data science and analytics
Welcome to my Data Analytics Journal, where I write about data science and analytics.
This month, paid subscribers learned about:
10 Must-Know Concepts Every Analyst Should Know - 10 foundational principles that shape how data scientists reason about and communicate data. Most relate to critical thinking and statistics - the concepts we lean on when evaluating A/B tests or running everyday analyses.
Structuring Data & Analytics Teams for Real Impact - A breakdown of common data team structures, their trade-offs, and how to maximize your impact regardless of where the analytics function sits.
Churn: Why Most Teams Get It Wrong - How to approach churn reporting, how to work with churn vendor data, and how to know whether your churn data is accurate.
November moved fast. Everyone is racing into year-end. Here are the key updates across data science and analytics: new case studies, analyses, and developments shaping where our field is headed.
But first, a quick announcement:
✨ Hex Magic Shows
Hex is hosting the first-ever Magic Show events in NYC and SF, featuring speakers from Ramp, OpenAI, Fivetran, Astronomer - and yours truly. We’ll cover how teams are using AI in analytics, how to build AI agents, and the future of data. Free, casual, and fun. Sign up here:
Hope to see you there!
🔊 Advocating for analytics: Own your BI
From Churn: Why Most Teams Get It Wrong:
“Most subscription platforms misreport churn. Every tool provides a generic churn metric meant to work for SaaS, enterprises, big and small B2B, products with subscriptions, ads, and one-off transactions. And that’s exactly why it breaks. It’s not that these tools misread your data or can’t segment it correctly. They’re built to deliver a one-size-fits-all churn number (unless you have an exclusive plan with them, allowing customizations).
You have to make a call whether to trust that plug-and-play metric and bet your growth and budget on it, or build something accurate in-house - something designed for your product, your subscription plans, your grace periods, and your trials.
You should own the metric definitions inside your database. It lets you narrow TTP (Trial-to-Paid) to your actual trial length, segment subscriptions across all 20 plans you offer, and exclude lifetime plans or free offers from ending contracts, so your retention is accurate.”
No vendor knows your product better than you do. Own BI reporting, tailor it to your context, and connect it across your systems.
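To make the "own your metric definitions" point concrete, here is a minimal sketch of an in-house churn calculation. The subscription records, field names, and plan types below are all hypothetical illustrations, not a real vendor schema - the point is that you decide which plans count as renewable.

```python
from datetime import date

# Hypothetical subscription records; a real table would come from your database.
subscriptions = [
    {"plan": "monthly",  "period_end": date(2025, 11, 10), "renewed": False},
    {"plan": "monthly",  "period_end": date(2025, 11, 15), "renewed": True},
    {"plan": "annual",   "period_end": date(2025, 11, 20), "renewed": True},
    {"plan": "lifetime", "period_end": None,               "renewed": None},
    {"plan": "trial",    "period_end": date(2025, 11, 25), "renewed": False},
]

def monthly_churn(subs, year, month):
    """Churn = non-renewals / renewable subscriptions ending in the month.
    Lifetime plans and trials are excluded here: they are not renewable
    contracts, so counting them would distort the rate."""
    ended = [
        s for s in subs
        if s["plan"] not in ("lifetime", "trial")
        and s["period_end"] is not None
        and (s["period_end"].year, s["period_end"].month) == (year, month)
    ]
    if not ended:
        return 0.0
    churned = sum(1 for s in ended if not s["renewed"])
    return churned / len(ended)

print(monthly_churn(subscriptions, 2025, 11))  # 1 of 3 renewable subs churned
```

A one-size-fits-all vendor metric would likely count the trial and the lifetime plan in the denominator; owning the definition lets you exclude them deliberately.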
🔥 November highlights
Deepnote goes open source!
After 7 years of building a data notebook that combines SQL, Python, visualizations, and tables, Deepnote is now open source. This gives teams more control, better integration with existing workflows, and an open standard beyond the limitations of .ipynb. You can install Deepnote here and run notebooks locally in VS Code, JupyterLab, or anywhere else with the open-source Deepnote Toolkit. You can also organize multiple notebooks, integrations, and settings into a single .deepnote project for better structure and collaboration.
Apps inside apps: Apple quietly launches the Mini Apps Program
We don’t know yet what it really means or how it works, but it appears that the App Store will charge platform-style apps a reduced 15% commission on purchases made inside “mini apps”, provided they adopt the new APIs. This is meant for platform-style apps (whatever that means), not single-publisher products, and it formalizes how “apps inside apps” can operate on iOS. Read more here - Apple’s App Store Mini Apps Partner Program. Good luck doing analytics for that.
What really happened in the Mixpanel data breach
If you use Mixpanel, Thanksgiving probably wasn’t peaceful.
On November 27, Mixpanel’s CEO disclosed that customer data (including user names, email addresses, and device information) had been leaked. The breach appears to have originated from an SMS-phishing attack and impacted at least two companies we know about: OpenAI and CoinTracker. Likely more. Two days ago, we learned that both OpenAI and Mixpanel now face a class action lawsuit over the incident.
The breach itself isn’t the shocking part. Data breaches are common, far more common than companies admit. Just days earlier, Google reported that hackers stole data from 200 companies following the Gainsight breach.
What is concerning is the communication.
We learned about the breach from Mixpanel’s customers, not from Mixpanel. CoinTracker notified its users on November 26. Three hours later, OpenAI published a detailed incident report explaining the impact and announcing that it was terminating its Mixpanel usage. Only after both disclosures went public did Mixpanel post a vague, high-level blog summary the following day. If OpenAI and CoinTracker hadn’t gone public, it seems likely we would never have known. Mixpanel also ignored press inquiries, making the situation look even worse.
What’s even more upsetting to me is how quickly competitors jumped in to refresh their SEO and capitalize on the story:
If I were a Mixpanel customer, this wouldn’t make me jump to another provider with exactly the same vulnerabilities and integration risks. I’ve lived through enough breaches to understand the pattern: once a company has been breached, its systems tend to become more secure, not less. You’re forced to implement better controls - more audits, new protocols, stricter data handling, identity obfuscation, hashing, IAM tightening, training, and more. Mixpanel today is almost certainly more secure than it was a month ago.
📈 New industry reports and benchmarks
2025 Digital Game Advertising Report - SensorTower.
State of AI: December 2025 - Air Street Capital.
The State of Data Jobs in 2025 - a beautiful data portfolio from Swagata Ashwani.
⚙️Know your craft
How a New Grad Broke into a Data Science Role from InterviewQuery.
Introducing the Messy Middle manifesto - Dan Schmidt / Mixpanel.
How we reduced Snowflake costs by 70%: a practical optimization guide - very good breakdown from ChartMogul.
Translating Data Buzzwords Into Real Requirements from SeattleDataGuy.
3 Lessons from implementing Controlled-experiment Using Pre-Experiment Data (CUPED) at Nubank - Nubank Data Science team.
Understanding feature flags: The foundation of reliable A/B tests from the Photoroom Product Analytics team.
Data Quality at Scale: Why Your Pipeline Needs More Than Green Checkmarks from Pradeep Kalluri.
The Data Layers Powering Modern GTM from Ananth Packkildurai.
Why You Are (Probably) Measuring Time Wrong from Counting Stuff.
Experiments or causal inference? - from Meta analytics team.
🤓 Analysis and case studies
A few takeaways from Ron Kohavi’s recent publication on A/B Testing: The Science of Not Fooling Yourself.
If you want to prove that a particular change caused an effect, the most effective tool is A/B testing. ML modeling can get you halfway, but hypothesis testing remains the most trusted method at many companies - if done right.
Case studies and historical data can be misleading. They often exaggerate results or even get the direction wrong. The most trustworthy evidence comes from randomized experiments (aka A/B tests), especially when they’re repeated and show the same outcome. Do you repeat the same A/B test twice?
A “statistically significant” result doesn’t mean what most people think. Seeing p < 0.05 does not mean there’s a 95% chance your variant is better. Because only about 10% of experiments actually succeed, a “significant” result is only about 78% likely to be real. And if you test lots of ideas, some will look positive purely by chance.
You need a lot of users to detect small improvements. Detecting a 5% improvement on a 5% conversion rate needs more than 240,000 users. Detecting a 1% improvement needs millions. But remember - small experiments often produce misleading “big wins” because they’re underpowered.
Huge lifts from small studies are almost always wrong. When something shows over +50% lift from a tiny sample, assume it’s a false alarm until it’s replicated with larger traffic. Many famous claims (like power poses or “rounded buttons increase clicks”) fell apart when tested properly.
A/B tests aren’t perfect, but they’re the most reliable way we have to measure truth. To trust your results, you need enough users, enough skepticism, and ideally more than one experiment showing the same outcome.
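The arithmetic behind two of the takeaways above can be sketched in a few lines. The inputs are assumptions on my part, not figures quoted from the book: roughly 10% of tested ideas truly work, experiments run at 80% power, and a significant result in the winning direction has a 2.5% false-positive rate (two-sided α = 0.05).

```python
import math

# 1) P(effect is real | result is significant), via Bayes' rule.
prior = 0.10   # assumed share of tested ideas that truly work
power = 0.80   # assumed P(significant | real effect)
fpr = 0.025    # assumed P(significant in this direction | no effect)
p_real = (prior * power) / (prior * power + (1 - prior) * fpr)
print(f"P(real | significant) = {p_real:.2f}")  # ~0.78

# 2) Users needed to detect a 5% relative lift on a 5% conversion rate,
# using the standard two-proportion z-test sample-size formula
# (two-sided alpha = 0.05, 80% power).
z_a, z_b = 1.95996, 0.84162  # z-scores for alpha/2 and for 80% power
p1 = 0.05
p2 = p1 * 1.05               # 5% relative improvement
n_per_arm = ((z_a + z_b) ** 2
             * (p1 * (1 - p1) + p2 * (1 - p2))
             / (p2 - p1) ** 2)
print(f"total users needed ~ {2 * math.ceil(n_per_arm):,}")  # >240,000
```

Under these assumptions the posterior probability lands near 78% and the required sample lands just above 240,000 users, matching the orders of magnitude in the takeaways; different power or prior assumptions shift both numbers.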
❤️ Favorite publications this month
Bookmarked to re-read favorite takes
The Most Powerful, Timeless Skill to Learn as a Data Professional from Ergest Xheblati
The Success Trap from Mike Fisher.
➡️ Event recaps
App Growth Annual 2025
For analysts and growth leaders working in mobile apps
11 Lessons you’ll want to remember from App Growth Annual 2025
💎 Growth Gems #135 - Exclusive Insights from RAGA ‘25 (Part 2)
Open Source Data Summit (OSDS) 2025
For data scientists and ML engineers
Watch on demand as PayPal, Onehouse, Ampere, Uber, Walmart, ADP, and others share their strategies for successfully adopting open formats, multi-catalog setups, and cost SLOs.
Data Science Salon (DSS) in SF 2025
For enterprise data, data quality, and data engineering
If you missed the event 2 weeks ago, you can watch all the talks here - GENAI AND INTELLIGENT AGENTS IN THE ENTERPRISE. Use the DSSSF20252 code to access.
🙃 That’s a Wrap

✈️ Upcoming events in December
Dec 1-5, Las Vegas: AWS re:Invent
Dec 9, New York: Hex: AI, Data, and even a little bit of magic
Dec 9, London: Experimentation Elite
Dec 9-10, Stanford: AI + Health
Dec 11, New York: DSS: THE FUTURE OF APPLIED AI IN Finance and Banking
Dec 11, Berlin: Berlin Experimentation Meetup #11
Dec 16, San Francisco: Hex: AI, Data, and even a little bit of magic
Dec 15-18, Seville: International Conference on Statistics & Data Science 2025
Dec 27-29, Dhaka, Bangladesh: International Conference on Applied Statistics and Data Science 2025
Thanks for reading, everyone!