Introduction To Event-Based Analytics - Issue 142
Or: how to set up your events-based tracking for product analytics
Welcome to the Data Analysis Journal, a weekly newsletter about data science and analytics.
Addressing event analytics is way overdue in my newsletter. It should be taught as the opening series of Olga’s Masterclass Data Analytics 101, if that ever happens.
A warning: what follows is an analyst’s pure frustration pouring out.
Many data professionals and influencers refer to bad data as “garbage in, garbage out”, after which they ironically spend a lot of time drilling into the “garbage out” side - poring over every little aspect of downstream data management and detailing every possible tooling and process breakdown to improve data processing, storage, and visualization. Every day there is a new article or LinkedIn post on the importance of data cleaning, data governance, data contracts, data observability, etc., all while never taking the time to dive into the source of the analytics.
I am surprised and confused that such a foundational aspect of analytics is underrated in today’s modern data landscape and not properly addressed by the data evangelists and content creators out there. Why?
Well, I’m excited to assertively discuss this today.
In this publication you will learn:
How to manage and leverage event-based data for analytics.
The best practice (my best practice, because I couldn’t find anything helpful out there) of setting up events, properties, and attributes for user activity tracking.
How to successfully maintain event-based data governance and documentation.
A guide on how not to get lost in the event data noise.
The difference between session-based and event-based analytics
Most analytics used for reporting can be broken down into session-based and event-based. Many product and data leaders don’t differentiate between these two. But there is a big difference - they measure activity differently and require different data handling.
Session-based data fuels marketing analytics
Session-based (also known as page-view) data aims to report sessions on the website, including page views, exits, bounce rates, session length, and everything else used to measure traffic. It captures how the user session started (the source a user came in from) and how the session ended (bounced, converted, etc.), along with the number of pages per visit, and more. You use session-based analytics to report on traffic, sources, campaigns, conversions, referrals, keywords, etc.
A common downside of session-based analytics is that user funnels are recorded within one session or visit. They are not accurate in capturing the actual user flows or paths, especially when your funnels are complex and consist of many steps that can be completed during separate visits.
Session-based analytics is usually reported via Universal Analytics or Google Analytics. Last year I covered why Google changed UA to GA4. In a nutshell, they made the transition from session-based to event-based data tracking because the world has evolved, and everyone now wants better user engagement measurement.
Event-based data is the foundation of product analytics
If you want to go more granular into user activity and see what types of actions users take, you need to implement and configure event-based analytics. Events are user interactions, e.g. “upsell_view”, “button_click”, and “payment_submitted”, including scrolls, hovers, toggles, and more.
Event-based analytics includes user actions and attributes. It’s more precise in capturing user flows and serves as a foundation for user behavioral reports.
Event tracking is passed from the client in a specific structure, similar to:
Event group (e.g. revenue)
Event action (e.g. purchase_completed)
Event label (e.g. item_name)
Event property (e.g. promotional)
For event-based data, you can configure as many custom dimensions and values as your system allows. It is also used for real-time analytics reporting.
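To make this concrete, here is a minimal sketch of what a single tracked event payload might look like. The field names are illustrative only, not a specific vendor’s schema:

```python
# A hypothetical event payload following the group/action/label/property structure.
# Field names are illustrative, not any particular analytics tool's schema.
event = {
    "event_group": "revenue",               # high-level grouping of the event
    "event_action": "purchase_completed",   # the user action itself
    "event_label": "item_name",             # what the action applied to
    "event_properties": {                   # custom dimensions and values
        "promotional": True,
        "plan": "annual",
        "platform": "ios",
    },
    "user_id": "u_123",
    "timestamp": "2024-01-15T10:32:00Z",
}
```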

Event-based data is commonly accessed via product analytics tools that are designed for tracking events.
Working with event-based analytics is not easy
It all starts with the source - a client.
A client is a mobile app, a tablet, a desktop computer, or a browser that captures all events through pixels or tags and passes them, via API or SDK, to your analytics tool (e.g. Mixpanel, Amplitude, Heap, Braze, Google Analytics, Pendo, etc.) or to a server. Your ability to report data will heavily depend on the nature of this integration. You might find that for one client/platform you have a rich, intuitive, and precise set of clean user activity events, while for another client you deal with over 60% of events disproportionately missing in ways that make no sense. Your analyst might also spend many months cleaning the data and trying to fit together puzzle pieces that don’t match. Read more on this - Why Most Analytics Efforts Fail.
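For illustration, here is a rough sketch of the plumbing such an integration hides from you. The collector endpoint and payload shape are hypothetical; real SDKs (Mixpanel, Amplitude, etc.) wrap all of this for you:

```python
import json
import urllib.request

# Sketch of a client passing one event to a server-side collector.
# The endpoint and payload shape are hypothetical, not a real vendor API.
def track_event(event_name: str, user_id: str, properties: dict) -> None:
    payload = {
        "event": event_name,
        "user_id": user_id,
        "properties": properties,
    }
    req = urllib.request.Request(
        "https://collector.example.com/events",  # hypothetical ingestion endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# Example usage (would require a real endpoint):
# track_event("upsell_view", "u_123", {"platform": "web", "screen": "pricing"})
```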
Why is this the case?
Disconnect between product, analytics, and development
Developers can design event streams and analytics in different ways. They can configure the system to capture every piece of data movement and send this tracking payload as a massive unstructured bulk (if this is the case, it doesn’t matter what analytics tool you use, as you won’t be able to make sense of your events anyway). Or they can set the logic for it and aggregate it in a particular way, making it easy for product analytics tools (and the backend) to ingest and read the data.
The nature of such a design will mostly depend on two things:
Proper personas and user stories that product leaders define and provide to the development team - e.g. for a typical blogging platform, there is “a reader” who can read an article, like it, and share it, and “a writer” who can publish, edit, and send a newsletter.
The audience and objective of the event analytics setup - e.g. are these events needed for analysts to monitor activations, for marketing teams to watch registrations, for DevOps to monitor security, for finance to report on revenue, or for developers to catch bugs?
How to set the right foundation for event-based analytics
Data catalog - a unified source of truth for product analytics
If you are an analyst, you already know that the success of data reporting depends on data governance. You probably have heard of metadata catalogs and metrics vocabularies. In a similar way, event-based catalogs describe client events. For product analytics, this mission is owned and shared between product managers and application developers.
The data catalog might have different formatting and live in a wiki, Notion, a spreadsheet, or a repo. It may go by different names - taxonomy, governance, event tracker, inventory, or dictionary - but it serves one purpose: to be the source of truth for event-based analytics. I can’t believe I am writing this, but it has to be owned and maintained by the analytics team.
The data catalog should be segmented by product dimensions: revenue, security, core user activity, 3rd-party integrations, etc. It doesn’t have to be extensive or detailed; most catalogs record basic information like event name, properties, location, and description, for example:
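A hypothetical catalog entry could look something like this; the exact fields are up to your team, not a prescribed format:

```python
# A hypothetical data catalog entry; field names are illustrative.
catalog_entry = {
    "event_name": "payment_submitted",
    "group": "revenue",
    "properties": ["plan", "amount", "currency", "promotional"],
    "location": "checkout screen (iOS, Android, web)",
    "description": "Fires when a user submits the payment form.",
    "owner": "payments squad",
    "status": "active",        # e.g. active / deprecated
    "added": "2024-01-15",
}
```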
Maintaining the data catalog will help you merge iOS, Android, and web analytics into one view and stick to the same naming and structure for events and properties. It also helps address historical changes in events.
Successful catalogs should be iterative. Older analytics will eventually get retired and cleaned up, and new features will be added. Catalogs help coordinate and record these movements. Catalog maintenance should be part of every new feature’s planning and implementation and embedded into the development process.
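One practical use of the catalog is reconciling platform-specific or legacy event names into the canonical ones. A minimal sketch, with hypothetical legacy names:

```python
# Hypothetical mapping from legacy/platform-specific names to the canonical
# event names recorded in the catalog.
CANONICAL_EVENT_NAMES = {
    "iOS_Purchase_Complete": "payment_submitted",  # legacy iOS name
    "web-checkout-success":  "payment_submitted",  # legacy web name
    "payment_submitted":     "payment_submitted",  # current canonical name
}

def normalize(event_name: str) -> str:
    # Fall back to the raw name if it is not in the mapping.
    return CANONICAL_EVENT_NAMES.get(event_name, event_name)

print(normalize("web-checkout-success"))  # payment_submitted
```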
Keep events tracking simple and balanced
How do you find the balance between capturing all user activity in the app and not overwhelming analytics with thousands of noisy events like toggles, hovers, and every banner or card view?
Often product teams request collecting every movement happening in the app, which leads to “event pollution” that keeps growing as more features, tests, and versions are released. If events are not properly maintained, you might end up with 10 different events for one screen view that overlap in one period of time and leave gaps a few weeks later.
My recommendations:
Start with the high-level essential core business functionality, which should be described via user stories and personas (developer vs operator, or reader vs writer). Once that tracking is set, move to lower abstraction levels to bring in more details and nuance.
Focus on capturing success events (signup_completed, payment_submitted, trial_activated, activity_logged, etc) and user intent events (onboarding_started, upsell_view, recording_start, etc). Having user intent and success events will help you report funnels and conversions.
Ignore granular user activity like hovers, scrolls, toggles, and banner or card views. They create a lot of event noise and aren’t helpful for understanding user engagement.
Keep a small number of failure events. I usually don’t request any failure events at all, because they are rarely helpful unless there is a bug (which you will be able to notice in your funnel charts anyway).
Set initial activity events. Many teams ignore this, but having different events for the initial user activity and consecutive activity will save you a lot of time in measuring and reporting engagement and retention. For example, first_app_open vs app_open, first_payment_success vs payment_success, first_run_recorded vs run_recorded. By differentiating these, you can report on new users, returning users, winbacks, and more.
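Here is a minimal sketch of why separate first_* events pay off: counting new vs returning users becomes a simple filter instead of a window over the full event history. The events list below is made up for illustration:

```python
from collections import Counter

# Illustrative events; with first_app_open tracked separately, new vs returning
# counts are a simple filter rather than a scan of each user's full history.
events = [
    {"user_id": "u_1", "event": "first_app_open", "date": "2024-01-01"},
    {"user_id": "u_1", "event": "app_open",       "date": "2024-01-03"},
    {"user_id": "u_2", "event": "first_app_open", "date": "2024-01-03"},
    {"user_id": "u_3", "event": "app_open",       "date": "2024-01-03"},
]

daily = Counter()
for e in events:
    if e["event"] == "first_app_open":
        daily[(e["date"], "new")] += 1
    elif e["event"] == "app_open":
        daily[(e["date"], "returning")] += 1

print(dict(daily))
# {('2024-01-01', 'new'): 1, ('2024-01-03', 'returning'): 2, ('2024-01-03', 'new'): 1}
```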
Maintaining event analytics should be embedded into the development process
Every launch of a new feature or experience requires creating new events to support analytics.
Analysts should monitor the creation of new events and properties, making sure they are in line with the expected data volume, and drive data governance. This responsibility will look different at various companies, as the process is often divided between multiple teams - engineering, product, and analytics. The most common scenario:
PMs create a list of events they wish to have in order to monitor CTAs, CVRs, and user behavior. They pass it on to analytics for review and approval.
Analysts are responsible for making sure new events are necessary, sufficient for analytics, and follow the set format and practice (a small validation sketch follows this list). Once new events are approved, they’re passed to the development team.
The development/engineering team creates the new events and properties, and the QA team validates them.
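As a sketch of the kind of check an analyst (or a CI step) could run when reviewing proposed events - the snake_case convention and the duplicate rule here are assumptions, not a universal standard:

```python
import re

# Hypothetical review check: enforce snake_case event names and reject names
# that already exist in the catalog.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")

def review_events(proposed: list[str], catalog: set[str]) -> dict[str, list[str]]:
    report = {"approved": [], "rejected": []}
    for name in proposed:
        if not SNAKE_CASE.match(name) or name in catalog:
            report["rejected"].append(name)
        else:
            report["approved"].append(name)
    return report

print(review_events(
    ["upsell_view", "PaymentSubmitted", "app_open"],
    catalog={"app_open"},
))
# {'approved': ['upsell_view'], 'rejected': ['PaymentSubmitted', 'app_open']}
```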
Many product and analytics leaders underestimate the importance of the proper personas and user stories provided for client application development. But this input sets the nature of the client development that will define your analytics. If your event tracking is not trusted, most of your analytics efforts will fail. It takes a cross-functional effort to maintain governance, but done well, it will make your analytics powerful.
Thanks for reading, everyone. Until next Wednesday!