Skip to main content

Command Palette

Search for a command to run...

Aggregating Multiple SCD-2 Attribute Timelines in BigQuery

Updated
1 min read
Aggregating Multiple SCD-2 Attribute Timelines in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

Here’s another practical BigQuery SQL exercise 💡

Say you have an input SCD2-style table with [valid_from, valid_to) and key-value attributes. Now you want to determine which attributes were valid at the same time for a given grain (e.g. id).

To do this, we:

1️⃣ Build an anchor table of all change points (start and end dates), grouped by id.

2️⃣ Generate date ranges using LEAD() over the change points, so we know the next boundary.

3️⃣ Join back to the original table to find which rows were active within each [valid_from, valid_to) segment.

4️⃣ Aggregate the key-value pairs as ARRAY<STRUCT<key, value>> to preserve temporal context.

We can now see that, for example, for the period between [2023-01-05, 2023-01-08), for id = 2, B was true and A was false.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful:

More from this blog

D

Datawise: A blog on SQL, BigQuery, Analytics, Python Tips and Tricks

204 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.