Skip to main content

Command Palette

Search for a command to run...

Using RANGE_BUCKET in BigQuery

Updated
1 min read
Using RANGE_BUCKET in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

I've recently had to perform a validation of differences between 2 data sources, so I've figured it would be interesting to see the distribution of absolute differences between the two (how big are they + how often they happen).

I've used RANGE_BUCKET function in BigQuery SQL. So what does it do?

It takes a value and an array. The value is the one you want to find a bucket for, whereas the array are the buckets intervals you want to group the values in.

Something like value = 1, bucket bounds [0, 5, 10, 100, 1000]

This will get you the index of the next larger value in the array, in other words, the index of your group.

A couple of special cases:
- if your value is smaller than the first bound, it gets assigned to bucket 0
- if it's NULL, the bucket is also NULL

One can of course do the same with a CASE WHEN statement, but this way looks pretty concise. I've combined it with a COUNT / GROUP BY to assess how big where the differences in my analyzed dataset.

Check out a representative example below.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

More from this blog

D

Datawise: A blog on SQL, BigQuery, Analytics, Python Tips and Tricks

204 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.