Discover more from Data Analysis Journal

Where product, data science, and analytics intersect. Trusted by tens of thousands of data scientists around the world

When To Use Client-Side Or Server-Side Events For Analytics - Issue 51

How to figure out which events to use for reporting and analysis, and what the difference is between client-side and server-side events

Jul 07, 2021

When you develop analytics and reporting at your company, the first stage in your process is auditing the current data sources to understand what data comes from where, how it is treated, and where it is stored. Knowing the difference between client-side and server-side events is crucial to using the available data appropriately. This article covers the fundamental differences between these types of events and provides guidance on which data to use for your analysis and reporting metrics.

First things first: you don’t always get to pick which events you work with. Every company has a different level of data maturity. Some are more data-driven and handle data pipelines properly by investing in backend processing and storage. Others have limited event-capturing capabilities, or rely fully on vendors or third-party data. Regardless of the state of data maturity and the back-end complexity, you must have a thorough understanding of your data sources.

The difference between client-side and server-side events

Simply put, client-side events are generated in the browser (or app, or device). Pixels or tags capture the events there and pass them to the server.

The server side receives these tags from a client (browser) and passes them to data storage. In a way, this is a data processing step, completed by backend services (for example, pipelines built on Kafka), that maps the client events to a user_id or event_id. This is an expensive process and can get quite complex. The server’s “job” is to clean the data of duplicates and noise, and to apply all necessary constraints or filters. Then the data is sent to the cloud or to vendors.
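To make the server’s “job” more concrete, here is a minimal Python sketch of that cleaning step. It is an illustration under assumptions, not how any particular backend works: the event fields (`event_id`, `name`, `device_id`), the `DEVICE_TO_USER` lookup, and the `process_events` function are all hypothetical names made up for this example.

```python
from typing import Dict, Iterable, List

# Hypothetical lookup maintained by the backend: device_id -> internal user_id.
DEVICE_TO_USER: Dict[str, str] = {
    "dev-a": "user-1",
    "dev-b": "user-1",
    "dev-c": "user-2",
}


def process_events(raw_events: Iterable[dict]) -> List[dict]:
    """Drop noise and duplicates, then attach an internal user_id."""
    seen_event_ids = set()
    cleaned = []
    for event in raw_events:
        event_id = event.get("event_id")
        # Noise filter (assumed rule): skip events with no id or no name.
        if not event_id or not event.get("name"):
            continue
        # Deduplicate on event_id, e.g. when a client sends the same event twice.
        if event_id in seen_event_ids:
            continue
        seen_event_ids.add(event_id)
        # Map the client device to an internal user (None if the device is unknown).
        user_id = DEVICE_TO_USER.get(event.get("device_id"))
        cleaned.append({**event, "user_id": user_id})
    return cleaned


raw = [
    {"event_id": "e1", "name": "page_view", "device_id": "dev-a"},
    {"event_id": "e1", "name": "page_view", "device_id": "dev-a"},  # duplicate
    {"event_id": "e2", "name": "", "device_id": "dev-b"},           # noise: no name
    {"event_id": "e3", "name": "signup", "device_id": "dev-c"},
]
processed = process_events(raw)
print([(e["event_id"], e["user_id"]) for e in processed])
```

Only the first `e1` and `e3` survive, each tagged with an internal user id. Real pipelines apply many more constraints, which is part of why this stage is expensive.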

From Segment

Important things to know

Both client-side and server-side event streams have their advantages and disadvantages.

Client-side reporting

If your company is using a web analytics tool, I guarantee that you are dealing with client-side events. This is the fastest and easiest way to obtain data and is mostly used by tools like Mixpanel, Amplitude, Google Analytics (GA), Braze, Pendo, or anything that integrates with the frontend code. In this case, the tracking payload is generated on a client device and passed to a web tracking tool via an API or SDK. It’s a straightforward integration and doesn’t require any additional infrastructure support.

That being said, for an analyst, this should tell you the following:

The device ID is the primary identifier attached to each event, and it gets mapped to a user ID. This is not a one-to-one mapping, so you might overreport users depending on how they are mapped to the device ID. For example, one user with multiple devices may be recorded as two or three different users.
The data might be affected by ad blockers. This has become a significant tracking issue lately as user privacy protections have advanced. As I remember it, ad blocker usage five years ago was about 7 times lower than it is now.
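The device-to-user overcounting is easy to see with a toy example. The sketch below uses made-up events in which one user browses from three devices. The field names `device_id` and `user_id` are assumptions for illustration only.

```python
# Made-up events: user u1 uses three devices, user u2 uses one.
events = [
    {"device_id": "phone-1", "user_id": "u1"},
    {"device_id": "laptop-1", "user_id": "u1"},
    {"device_id": "tablet-1", "user_id": "u1"},
    {"device_id": "phone-2", "user_id": "u2"},
]

# Counting distinct devices treats every device as a separate user.
users_by_device = len({e["device_id"] for e in events})

# Counting distinct user ids requires a device-to-user mapping to exist.
users_by_user_id = len({e["user_id"] for e in events})

print(users_by_device, users_by_user_id)  # 4 2
```

A report built on device IDs alone would show four users here, while only two people actually used the product.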

Server-side reporting

To simplify, you are using server-side data when you query a database. After processing, backend services send the data to storage, either structured or unstructured (read here for a refresher on data structures and architecture). So if you query Redshift, Postgres, Snowflake, or MySQL, this is most likely “processed” data.

Things to remember when working with server-side data:

Server-side events go through many checks and processing stages that make them the most reliable and accurate data to use.
While data accuracy is important, it might come at the cost of data freshness. Depending on the data pipelines, you might have a delay of 24 hours or more before the data refreshes.

What data to use for reporting and analysis

For a business, both are needed for tracking different things.

Client-side events are the source for cookies, URL parameters, user agents, referrers, and IP addresses. Therefore, you would use this data for reporting on SEO, traffic, views, credentials, and demographics. If you are a marketing analyst, this is where most of your work will be done. You will use cookie data for ad targeting, UTM parameters to analyze marketing campaigns and promotions, and tags to trace the user journey.
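As a small illustration of working with UTM parameters, the sketch below pulls them out of a landing-page URL using Python’s standard `urllib.parse` module. The URL and the `extract_utm` helper are made up for this example. In practice your tracking tool will usually do this parsing for you.

```python
from urllib.parse import parse_qs, urlparse


def extract_utm(url: str) -> dict:
    """Return only the utm_* query parameters from a URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs returns a list of values per key; keep the first value of each.
    return {key: values[0] for key, values in query.items() if key.startswith("utm_")}


# Hypothetical landing-page URL captured by a client-side tag.
landing_url = (
    "https://example.com/pricing"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=summer_sale&ref=abc"
)
utm = extract_utm(landing_url)
print(utm)
```

Non-UTM parameters such as `ref` are ignored, leaving the source, medium, and campaign to group traffic by.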

Server-side events speak the most clearly about user and event state and actions, so you would use this data for reporting on revenue, activity, and other sensitive metrics.

This all sounds simple and nice in the perfect world until your VP of Marketing requests revenue data for churned users who subscribed via THAT campaign. And your VP of Product wants a full picture of the user lifecycle to see how long it took for them to finish onboarding and get converted. And the Data Science team is on the lookout for any available browser data for paid users to develop personalization.

It’s all doable, but the biggest opportunity for a mistake that often leads to a data apocalypse comes when analysts begin merging client-side and server-side events into one queue.

As I said above, at companies that invest in data processing and management, you can use server-side events for every metric and report on sophisticated funnels and complex conversions. This might not be possible in other places, and merging client-side with server-side events there will produce flawed reports and a misleading picture.

Products like Segment are a common solution for merging server-side data with client-side data, and they make it incredibly easy to introduce an error. Their mission to “empower every team with good data” can easily turn into “set a team up for failure with the wrong data”. Offering easy integration of all data sources into one stream for analytics can lead to inappropriate event merging that overreports or underreports users or actions.

Proper integration with Segment should include an event mapping that connects all the sources to an internal user or event id. Read also: When to track on the client vs. server. I will do a deep dive into Segment implementation and use cases in one of my next newsletters.
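Here is a rough sketch of what such an event mapping could look like, and why skipping it overreports actions. It does not use Segment’s API. The event names, the `DEVICE_TO_USER` and `CANONICAL_NAME` tables, and the `normalize` helper are all hypothetical, and it assumes both sources carry a shared `order_id`.

```python
# Hypothetical mapping tables maintained by the data team.
DEVICE_TO_USER = {"dev-a": "u1"}
CANONICAL_NAME = {
    "Order Completed": "purchase",     # name used by the client-side tag
    "purchase_succeeded": "purchase",  # name used by the backend
}

# The same purchase, reported once by each source.
client_events = [
    {"source": "client", "device_id": "dev-a", "name": "Order Completed", "order_id": "o-100"},
]
server_events = [
    {"source": "server", "user_id": "u1", "name": "purchase_succeeded", "order_id": "o-100"},
]


def normalize(event: dict) -> tuple:
    """Reduce an event from either source to (user_id, canonical name, order_id)."""
    user_id = event.get("user_id") or DEVICE_TO_USER.get(event.get("device_id"))
    name = CANONICAL_NAME.get(event["name"], event["name"])
    return (user_id, name, event.get("order_id"))


all_events = client_events + server_events

# Naive merge: every row counts, so the single purchase is counted twice.
naive_count = len(all_events)

# Mapped merge: identical normalized events collapse into one.
mapped_count = len({normalize(e) for e in all_events})

print(naive_count, mapped_count)  # 2 1
```

Without the mapping, the merged stream shows two purchases. With it, both records resolve to the same user, event, and order, and are counted once.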

Thanks for reading, everyone. Until next Wednesday!
