Computing a hash aggregation in BigQuery

Photo by Tomas Sobek on Unsplash

Computing a hash aggregation in BigQuery

So I've seen Snowflake has an HASH_AGG function. When would we need it?

Every time we'd like to work out if ANY value in a group (or the entire table) has changed in any way, even a single extra blank space.

While BigQuery does not have it yet, we can still simulate it using the tools at hand.

Here's how we can do it:
- TO_JSON_STRING to create a STRUCT from each entire row (or create a STRUCT containing only the columns you care about)
- STRING_AGG to aggregate all the json strings into a single value per group (or the entire table)
- FARM_FINGERPRINT, a hashing function that will product the same output given only the exact same input, check my comment for more info

No alt text provided for this image

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.