The Battle You’ll Always Lose: Documentation - Issue 234
How to create and maintain proper documentation in analytics.
Welcome to my Data Analysis Journal, where I write about data science and analytics.
Analytics today is more fragmented than ever. Each domain - data quality, monitoring, automation, visualization, and modeling - has its own specialized tools. Yet, ironically, two of the most fundamental domains still don’t have a proper solution:
Version control
Documentation
I saw Hashboard’s solution to version control during a recent demo. I liked it very much. I truly think their approach via Git is one of the best solutions for version control in BI. But let’s face it - if your team can adopt Git, the problem is largely solved with or without Hashboard. I haven’t met such a team yet, though.
Deepnote also offers an approach to version control. So does DataGrip. So do Metabase, Tableau, Looker, dbt, and most other data tools. And that’s a problem - these solutions are siloed, working only within each tool’s ecosystem. Meanwhile, analysts rely on a mosaic of tools to get their work done.
Documentation is an even bigger challenge. I have yet to come across a tool or application in analytics that handles documentation well. So today, I want to share my guide on how to create, adopt, and maintain proper documentation in analytics. I’ll also walk you through how I structure old analyses, guides, and roadmap items. It’s in no way perfect, but it has served me well and is enough to keep the ship afloat and maintain some order.
It’s wild out there
At every company I’ve been part of, to make documentation work at all, analytics teams ended up using multiple applications that overlapped to some extent, were insufficient on their own, or required duplicating the same work twice or more.
For example:
We relied on Mixpanel, Amplitude, or Secoda to create and store a dictionary of events. Such tools let you consolidate a list of analytics events related to user activity and attributes.
The downside was always the need to maintain it manually and keep it up to date. And by “manually” I literally mean... manual entries. In 2024. (For a sense of what such an event dictionary holds, see the sketch after these examples.)
We used GitHub or GitLab to store SQL and Python and to document any changes or variations to the code logic. It helped us keep track of changes and versions and provided the best framework for a review process.
The downside was that this technical documentation wasn’t leveraged by business or product analytics teams, who do most of their analysis outside of data lakes and dashboards. There was always a gap between “This is the metric definition we have” and “This is how we need the metric reported.” As a result, we often had to duplicate our SQL in spreadsheets, wikis, etc.
We used Jupyter notebooks and Colab to maintain analyses, forecasting, and modeling. I love both for how easy and intuitive they make it to keep, retrieve, and present your analysis. They “marry” code with results.
The downside was the pain of maintaining versions and drafts and of collaborating on projects with other analysts.
We stored product specs and historical documentation in Confluence, Asana, Notion, and ClickUp. The biggest advantage was that everyone could easily access and contribute to these.
The constant downside was that these tools were never meant to serve as technical documentation.
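To make the event dictionary from the first example concrete, here is a minimal sketch in Python of what those manually maintained entries amount to, kept as a version-controlled file rather than hand-edited in a UI. Everything in it - the `EventDefinition` structure, the event names, and the properties - is a hypothetical illustration, not an API of Mixpanel, Amplitude, or Secoda.

```python
from dataclasses import dataclass, field

# Hypothetical structure for a version-controlled event dictionary.
# The fields mirror what we ended up entering by hand in Mixpanel,
# Amplitude, or Secoda: the event, what it means, and its properties.
@dataclass
class EventDefinition:
    name: str                  # event name as sent by the product
    description: str           # what user action the event captures
    properties: dict[str, str] = field(default_factory=dict)  # property -> type
    owner: str = "analytics"   # team responsible for keeping the entry current

EVENT_DICTIONARY = {
    "signup_completed": EventDefinition(
        name="signup_completed",
        description="User finished the signup flow",
        properties={"plan": "string", "referrer": "string"},
    ),
    "report_exported": EventDefinition(
        name="report_exported",
        description="User exported a dashboard or report",
        properties={"format": "string", "report_id": "string"},
    ),
}

def is_documented(event_name: str) -> bool:
    """Return True if an event seen in raw data has a documented definition."""
    return event_name in EVENT_DICTIONARY

if __name__ == "__main__":
    for event in ("signup_completed", "button_clicked"):
        status = "documented" if is_documented(event) else "MISSING from the dictionary"
        print(f"{event}: {status}")
```

Keeping a file like this next to the SQL and Python in the same repository at least makes every change to the dictionary reviewable in a pull request, even though it doesn’t remove the manual upkeep.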
I am convinced that the challenge of maintaining and adopting documentation in analytics is the hardest one to solve because analytics is a very cross-functional discipline. It requires fitting into technical, business, and product domains at the same time.