Skip to main content

Command Palette

Search for a command to run...

Comparing tables with FULL OUTER JOIN

Updated
2 min read
Comparing tables with FULL OUTER JOIN
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

Does your Data Engineering project use a data-diffing tool?

Say you're preparing to deploy a change to a prod table. You've changed the way some metrics are calculated and twisted some filters. How do you find out what's different between two tables? Identify expected vs unexpected differences?

If lacking a specialized tool for data-diffing (like Datafold, Recce ) perhaps the simplest validation you can do when comparing two tables (dev and prod versions for example) leverages the FULL OUTER JOIN (or FULL JOIN in some RDBMS).

Start with the grain. Is there any way you can bring the tables to the same grain?

Once you have aligned them to the same grain, you can now join on the respective keys and COUNT the occurrences you care about - what's missing from A, what's missing from B, totals overall. Depending on attributes, you can use other aggregate functions to assess differences - for example, does the SUM of sales amounts match in prod vs dev?

You could also leverage a hashing function + TO_JSON_ARRAY (check my previous post) to see which rows are different in the two tables.

Found it useful? Check out to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful:

Practical SQL

Part 1 of 50

Short, practical posts on SQL and BigQuery — from core language features to advanced query patterns. A reference for data practitioners at every level.

More from this blog

D

Datawise — SQL, BigQuery & Python for Data Engineers

205 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.