Skip to main content

Command Palette

Search for a command to run...

Computing a hash aggregation in BigQuery

Updated
1 min read
Computing a hash aggregation in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

So I've seen Snowflake has an HASH_AGG function. When would we need it?

Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space.

While BigQuery does not have it yet, we can still simulate it using the tools at hand.

Here's how we can do it:
- TO_JSON_STRING to create a STRUCT from each entire row (or create a STRUCT containing only the columns you care about)
- STRING_AGG to aggregate all the json strings into a single value per group (or the entire table)
- FARM_FINGERPRINT, a hashing function that will product the same output given only the exact same input, check my comment for more info

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

More from this blog

D

Datawise: A blog on SQL, BigQuery, Analytics, Python Tips and Tricks

204 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.