Joining Two Tables That Are Not Related By A Foreign Key Relationship - Issue 119
How to connect two tables that are not related by a foreign key relationship in PostgreSQL
This publication was created after many late hours spent trying to figure out a way to merge two datasets into one consolidated data source for a trickly dashboard chart. It was back in 2015 (wow, time flies) for a popular (at the time) dashboarding tool called Chartio.
A few weeks ago I received a cry for help from one of my readers who was stuck with the same problem I did back then. So today I decided to repost this article to share my solution which is not very intuitive for analysts, but is also simple and saves the day. This challenge is common in analytics, and the proposed workaround is a systematic approach in analytics engineering.
This post will focus on SQL joins and will show you a way to connect two tables that are not related by a foreign key relationship.
As you may know, joins are used to combine or merge data from different tables. The way you combine them depends on the type of join you use. In order to join two or more tables, they have to be related by a key; if not, it’s not easy to join them.
Theoretically, you wouldn’t run into this case at all, because the whole point of doing a join is to extract data that has a specific relationship between tables. If there is no relationship, why would you need to join?
The challenge
Here’s one scenario: let’s say you receive payment data via API from third parties, like Stripe, PayPal, ApplePay, GooglePay, etc. Every payment provider has its own list of events and transactions, but there is no common ID or unifier between all of them. You have to merge providers to report on total paying users, new transactions, cancellations, and so on. But the looming question is if there is no common ID or USER_ID or anything unifying between these payment providers, how will you join these datasets?
Here’s another scenario: data analysts often work with different visualization tools that are piping data from a database and a universe of sources, including all flavors of analytic tools, applications, and CSV files. Merging sources using common events or IDs between them is not always a luxury we can have. Depending on the tool, this is usually solvable with a degree of complexity, but despite my obviously stunning knowledge of all things analytics, I couldn’t figure out a way to do that in Chartio.
Keep reading with a 7-day free trial
Subscribe to Data Analysis Journal to keep reading this post and get 7 days of free access to the full post archives.