Skip to main content

Command Palette

Search for a command to run...

Compacting date intervals in BigQuery

Updated
2 min read
Compacting date intervals in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

Here's a practical BigQuery SQL exercise that highlights some important concepts as well is an interesting algorithm imho. I've pair programmed this with LLMs, if that's a thing 😎

Problem statement: compacting a SCD-2 table, essentially finding intervals that can be safely merged, turning two adjacent intervals with the same data into a single, bigger interval.

This particular input data guarantees these intervals cannot overlap (at the same grain), but there can be gaps. We're also talking about [left-inclusive, right-exclusive) intervals.

Here's a breakdown of how it all works:

1️⃣ we're starting by computing a hash of all the column of interest, excluding the grain (in my example: flag_a, flag_b)
2️⃣ then we use LAG() over grain window to detect whether the current row starts right after the previous one and if the hashes (so the 'payload' of the two rows) match
3️⃣ we mark the start of a new "segment" when either:
- attributes have changed (flags differ), so hash being different
- intervals are not adjacent (there are gaps)

4️⃣ use a cumulative SUM() over grain window to group rows into segment IDs

5️⃣ collapse each segment using MIN(valid_from) and MAX(valid_to)

We can now see that in our example that several intervals were merged into bigger ones.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful:

More from this blog

D

Datawise — SQL, BigQuery & Python for Data Engineers

205 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.