Pay attention to this when UNNESTING in BigQuery

Here's a common confusion that I encounter while working with ARRAYs in BigQuery. This can lead to wrong queries in some situations. Consider the following two snippets:

FROM table,
UNNEST(repeated_column) AS single_item

versus


FROM table
LEFT JOIN UNNEST(repeated_column) AS single_item

They are very different.

In the first, the comma ',' essentially behaves like a CROSS JOIN, pairing each row of the table with every element of the array in its repeated_column.

In the second, we're using a LEFT JOIN, ensuring retention of all rows in the table, even those without a value in the repeated_column.

How is this important, you'll ask. Consider the following case.

Imagine you'd need to compute the number of unique banks a person is a customer of in the example below.

No alt text provided for this image

A person can be a customer of one, multiple or no bank at all. This becomes important if a customer has no accounts, using UNNEST joined with the comma (CROSS JOIN) would exclude these entries.

SELECT 
  person_id, 
  COUNT(DISTINCT bank_account.bank_name) AS count_distinct_banks 
FROM input_data,
UNNEST(bank_accounts) AS bank_account 

GROUP BY person_id

If we want to keep these entries, we'd need to use LEFT JOIN.

SELECT 
person_id, 
COUNT(DISTINCT bank_account.bank_name) AS count_distinct_banks 
FROM input_data

LEFT JOIN UNNEST(bank_accounts) AS bank_account 
GROUP BY person_id

In this case, the first approach would miss out on the people without a bank account, while the second would include them. Quite a big difference in query results if you ask me.

Have fun writing good SQL!

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!