Skip to main content

Command Palette

Search for a command to run...

Why you should use UNION DISTINCT sparingly

Updated
1 min read
Why you should use UNION DISTINCT sparingly

Let's help BigQuery do less unneeded work!

If you're UNIONING two sources known to have distinct values (and they don't have duplicates), go for UNION ALL instead of UNION DISTINCT (UNION for some other sql dialects) to avoid redundant de-duplication.

In the example below, I've unioned two Google Trends tables - one that is only for US terms and another one for the rest of the world. Since one table only contains US and the other everything except the US, we know the union of the two tables to be distinct from the start, thus not needing the UNION DISTINCT.

There's no difference indeed for on-demand pricing (same amount of data scanned), but quite a difference for capacity pricing users ( 1/2 of slot usage).

So use UNION DISTINCT (and any other DISTINCT) sparingly and when you actually need it.

Found it useful? Check out to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful:

Practical SQL

Part 1 of 50

Short, practical posts on SQL and BigQuery — from core language features to advanced query patterns. A reference for data practitioners at every level.

More from this blog

D

Datawise: A blog on SQL, BigQuery, Analytics, Python Tips and Tricks

204 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.