Skip to main content

Command Palette

Search for a command to run...

A simple data validation scenario using FULL OUTER JOIN & ORDER BY

Updated
1 min read
A simple data validation scenario using FULL OUTER JOIN & ORDER BY
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

Data Engineers do a lot of Data Analysis work, too.

For example: we need to understand why is there a difference between two approaches or data in two source systems.

I've previously shown how a FULL OUTER JOIN in SQL with simple validation and how you can ORDER BY an expression.

Let's look at a scenario where we can combine the two.

So we've got two source systems A and B, which in theory should provide the same sales figures, but they don't.

Understanding the difference between the two sources means identifying individual cases where the values are different, starting with the biggest discrepancies.

In the example below, we're going to order the results by the absolute value (ABS) of the difference between the figures in two systems, considering a missing value as 0.

This way, we can start our investigation from the biggest differences, regardless of which system shows 'bigger' values and also take into account missing values between the two.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful:

Practical SQL

Part 1 of 50

Short, practical posts on SQL and BigQuery — from core language features to advanced query patterns. A reference for data practitioners at every level.

More from this blog

D

Datawise — SQL, BigQuery & Python for Data Engineers

205 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.