Combining STRUCTs with Window Functions in BigQuery

Photo by Alain Pham on Unsplash

Combining STRUCTs with Window Functions in BigQuery

How often do you use STRUCTs in BigQuery? I do a lot, and here's an interesting use case.

First, what is a STRUCT?

So, if you're not familiar with them, a STRUCT is a data type used to represent an object, allowing us to group related fields within one data cell.

Think of it as the ability to store complex 'things' inside a BigQuery row, in addition to the 'primitive' types like INT64 or STRING. So you can have a STRUCT 'person' that has attributes like name, age and salary. This can of course be REPEATED to obtain an ARRAY of person STRUCTs.

Now, STRUCTs are useful for many things, one of which I'm going to present today.

Here is a trick I use when working with window functions like LEAD and LAG that involve STRUCTs in BigQuery.

Problem statement

Did you ever have to retrieve the historical attribute (previous or next) of an entity in a temporal table (SCD-type 2)?

Here's how an example could look.

SELECT 
    id, 
    value_int, 
    value_text, 
    valid_from, 
    valid_to,
    LEAD(value_int) OVER (PARTITION BY id ORDER BY valid_from) AS next_value_int,
    LEAD(value_text) OVER (PARTITION BY id ORDER BY valid_from) AS next_value_text,
    LAG(value_int) OVER (PARTITION BY id ORDER BY valid_from) AS prev_value_int,
    LAG(value_text) OVER (PARTITION BY id ORDER BY valid_from) AS prev_value_text

FROM `learning.input_data` 

ORDER BY id, valid_from

Okay, but what if you have a dozen attributes? Instead of writing tens of LEAD or LAG functions, leverage STRUCT and look up an entire STRUCT of attributes.

The solution

Here's an adapted example that uses STRUCTs.

WITH input_data AS (
    SELECT 
        id, 
        value_int, 
        value_text, 
        STRUCT(value_int, value_text) AS value, 
        valid_from, 
        valid_to
    FROM `learning.input_data` )
 SELECT 
     id, 
     value_int, 
     value_text, 
     valid_from, 
     valid_to,
     LAG(value) OVER (PARTITION BY id ORDER BY valid_from) AS prev_value,
     LEAD(value) OVER (PARTITION BY id ORDER BY valid_from) AS next_value
 FROM input_data

ORDER BY id, valid_from

This way, you can use LEAD or LAG only once, regardless of how many attributes you need to look up.

There are of course other interesting use cases for STRUCTs, which we will explore in upcoming posts. Stay tuned!

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!