UNNESTING ARRAYS in BigQuery

Here's perhaps my favorite feature in BigQuery and another one I discovered when switching from SQL Server. It's one of its most powerful features - the support for ARRAYS.

Although a bit intimidating when seeing them for the first time, they allow for efficient storing and modelling one-to-many relationships, and have deep performance implications.

In other database systems that don't support arrays, you'd have to store them in a separate table or resort to workarounds like storing in list-like strings or JSON.

Working with them is quite straightforward once you get the hang of it. Probably the most common operation to do with them is to UNNEST them.

The ARRAY is basically a set of rows inside one of the columns. In the example below each of the members have a list of activities they're signed up for, together with the date they enrolled.

While we only have 4 members (and consequentially 4 rows) , each member can have 0, 1 or multiple activities they're subscribed to.

If we want to perform operations (say find out what was the earliest enrollment date, or count the distinct activities that the members are enrolled in), we'd need to unpack these rows by UNNESTing the ARRAY where activities are stored it.

This will bring us from 1 (table) row per member to 1 row per each activity a member is enrolled in.

Pay attention here to how we're joining the UNNEST - this will determine if we keep or not the members that don't have any activities.

LEFT JOIN = keep them
CROSS JOIN / , (also a cross join) / INNER JOIN = exclude them

Do remember to give the UNNESTed items a proper logical name i.e. if you're UNNESTING activities, call it activity for better readability.

No alt text provided for this image

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!