When To Use Client-Side Or Server-Side Events For Analytics - Issue 51
How to figure out which events to use for reporting and analysis, and what the difference is between client-side and server-side events
When you develop analytics and reporting at your company, the first stage in your process is analyzing and auditing the current data sources to understand what data comes from where, how it is treated, and where it is stored. Knowing the difference between client-side and server-side events is crucial to appropriately use the available data. This article covers the fundamental differences between types of events and provides guidance on which data to use for your analysis and reporting metrics.
First things first: you don’t always get to pick which events you work with. Every company has a different level of data maturity. Some are more data-driven and handle data pipelines properly, investing in backend processing and storage. Others have limited event-capturing capabilities or rely entirely on vendors or third-party data. Regardless of the state of data maturity and back-end complexity, as an analyst you must have a thorough understanding of your data sources.
The difference between client-side and server-side events
Simply put, client-side data is captured in the browser (or app, or device): events are recorded through pixels or tags and passed to the server.
The server side receives the tags passed from the client (browser) and passes them on to data storage. In a way, this is the data processing step that maps client events to a user_id or event_id, and it is carried out by backend services (for example, ones built on Kafka). This is an expensive process and can get quite complex. The server’s “job” is to remove duplicates and noise from the data and to apply all necessary constraints or filters. Then the data is sent to the cloud or to vendors.
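To make the server’s “job” more concrete, here is a minimal sketch in Python of the two steps described above: dropping duplicates and mapping client events to a user_id. The lookup table, the field names, and the choice to treat unmapped devices as noise are all assumptions made for illustration, not a description of any particular backend.

```python
from typing import Iterable

# Hypothetical device-to-user lookup; a real backend would read this from a store.
DEVICE_TO_USER = {"dev-a": "user-1", "dev-b": "user-1", "dev-c": "user-2"}


def process_events(raw_events: Iterable[dict]) -> list:
    """Drop duplicate events and attach a user_id to each remaining event."""
    seen_event_ids = set()
    cleaned = []
    for event in raw_events:
        if event["event_id"] in seen_event_ids:
            continue  # the same client event was delivered more than once
        seen_event_ids.add(event["event_id"])
        user_id = DEVICE_TO_USER.get(event["device_id"])
        if user_id is None:
            continue  # assumption: events from unknown devices count as noise
        cleaned.append({**event, "user_id": user_id})
    return cleaned


raw = [
    {"event_id": "e1", "device_id": "dev-a", "name": "page_view"},
    {"event_id": "e1", "device_id": "dev-a", "name": "page_view"},  # retry duplicate
    {"event_id": "e2", "device_id": "dev-b", "name": "signup"},
    {"event_id": "e3", "device_id": "dev-x", "name": "page_view"},  # unknown device
]
cleaned = process_events(raw)
print(len(raw), "raw events ->", len(cleaned), "cleaned events")
```

Whether unmapped events are dropped or kept in a separate table is a design decision. Dropping them keeps the example short, but it also means the cleaned data will show fewer events than the client-side tool does.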
Important things to know
Both client-side and server-side event streams have their advantages and disadvantages.
If your company is using a web analytics tool, you are almost certainly dealing with client-side events. This is the fastest and easiest way to obtain data and is mostly used by tools like Mixpanel, Amplitude, GA, Braze, Pendo, or anything that integrates with the frontend code. In this case, the tracking payload is generated on a client device and passed to a web tracking tool via an API or SDK. It’s a straightforward integration and doesn’t require any additional infrastructure support.
That being said, for an analyst, this should tell you the following:
The device id is the primary event identifier that gets mapped to a user id, and it’s not a 1-to-1 mapping. You might overreport users depending on how they are mapped to the device id. For example, one user with multiple devices will be recorded as 2 or 3 different users.
The data might be affected by ad blockers. This has become a significant tracking issue lately as user privacy protections have advanced. I remember 5 years ago the volume of ad blockers was 7 times smaller than it is now.
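The overreporting from the device id point above is easy to see with a toy example. The sketch below uses made-up events and field names (device_id, user_id) and simply compares a distinct count on each key.

```python
# Made-up client-side events: alice uses three devices, bob uses one.
events = [
    {"device_id": "phone-1", "user_id": "alice"},
    {"device_id": "laptop-1", "user_id": "alice"},
    {"device_id": "tablet-1", "user_id": "alice"},
    {"device_id": "phone-2", "user_id": "bob"},
]


def count_distinct(rows, key):
    """Count the distinct values of `key` across all rows."""
    return len({row[key] for row in rows})


users_by_device = count_distinct(events, "device_id")
users_by_user_id = count_distinct(events, "user_id")
overreport_ratio = users_by_device / users_by_user_id

print("users counted by device id:", users_by_device)
print("users counted by user id:", users_by_user_id)
print("overreport ratio:", overreport_ratio)
```

In this sample, counting by device id reports twice as many users as actually exist. In real data the ratio depends on how many devices your users have and how the tool stitches them together.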
To simplify, you are using server-side data when you query a database. After processing the data, backend services send it to storage for either structured or unstructured data (read here for a reminder about data structures and architecture). So if you query Redshift, Postgres, Snowflake, or MySQL, you are most likely working with processed data.
Things to remember working with the server-side data:
Server-side events go through many checks and processing stages that make them the most reliable and accurate data to use.
Accuracy is important, but it might come at the cost of data freshness. Depending on the data pipelines, you might have a delay of 24 hours or more before the data refreshes.
What data to use for reporting and analysis
For a business, both are needed for tracking different things.
Server-side events speak the most clearly about user/event state and actions, therefore you would leverage this data for revenue, activity, and other sensitive metrics reporting.
This all sounds simple and nice in the perfect world until your VP of Marketing requests revenue data for churned users who subscribed via THAT campaign. And your VP of Product wants a full picture of the user lifecycle to see how long it took for them to finish onboarding and get converted. And the DS team is on the lookout for any available browser data for paid users to develop personalization.
It’s all doable, but the biggest opportunity for a mistake that often leads to a data apocalypse comes when analysts begin merging client-side and server-side events into one queue.
As I said above, at some companies that invest in data processing and management, you can leverage server-side events for every metric and report on sophisticated funnels and complex conversions. This might not be possible at other places, and merging client-side with server-side events will introduce a flawed report and a wrong perspective.
Unfortunately, products like Segment make it incredibly easy to introduce an error. Their mission to “empower every team with good data” can easily lead to “set a team up for failure with the wrong data”. By offering easy integration of all data sources into one stream for analytics, they can actually encourage inappropriate event merging that overreports or underreports users or actions.
Proper integration with Segment from an engineering side should include an event mapping between all the sources connected to an internal user or event id. From the analytical side, an understanding of data and appropriate usage is required learning: When to track on the client vs. server.
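As a rough illustration of what such an event mapping could look like, here is a small Python sketch that routes events from both sources through one lookup keyed by an internal user id. It does not use Segment’s API. The ID_MAP table, the source labels, and the field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from (source, source-specific id) to an internal user id.
ID_MAP = {
    ("client", "dev-a"): "u1",
    ("client", "dev-b"): "u1",
    ("server", "acct-100"): "u1",
    ("server", "acct-200"): "u2",
}


def merge_events(events):
    """Group events by internal user id and set aside anything unmapped."""
    by_user = defaultdict(list)
    unmapped = []
    for event in events:
        internal_id = ID_MAP.get((event["source"], event["source_id"]))
        if internal_id is None:
            unmapped.append(event)  # keep visible instead of counting as a new user
        else:
            by_user[internal_id].append(event)
    return dict(by_user), unmapped


events = [
    {"source": "client", "source_id": "dev-a", "name": "page_view"},
    {"source": "client", "source_id": "dev-b", "name": "page_view"},
    {"source": "server", "source_id": "acct-100", "name": "subscription_paid"},
    {"source": "server", "source_id": "acct-200", "name": "subscription_paid"},
    {"source": "client", "source_id": "dev-z", "name": "page_view"},
]
by_user, unmapped = merge_events(events)

naive_user_count = len({e["source_id"] for e in events})
print("naive distinct ids:", naive_user_count)
print("mapped users:", len(by_user))
print("unmapped events:", len(unmapped))
```

Without the mapping, a naive merge of the two streams treats every distinct id as a user. With it, the same events resolve to the users you actually have, and the unmapped remainder stays visible so it can be investigated rather than silently inflating a report.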
Thanks for reading, everyone. Until next Wednesday!