Data Engineers do a lot of Data Analysis work, too.
For example: we need to understand why is there a difference between two approaches or data in two source systems.
I've previously shown how a FULL OUTER JOIN in SQL with simple validation and how you can ORDER BY an expression.
Let's look at a scenario where we can combine the two.
So we've got two source systems A and B, which in theory should provide the same sales figures, but they don't.
Understanding the difference between the two sources means identifying individual cases where the values are different, starting with the biggest discrepancies.
In the example below, we're going to order the results by the absolute value (ABS) of the difference between the figures in two systems, considering a missing value as 0.
This way, we can start our investigation from the biggest differences, regardless of which system shows 'bigger' values and also take into account missing values between the two.
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.