Using ARRAY_AGG in BigQuery

Almost everybody knows the usual standard SQL aggregation functions like SUM, MAX or AVG. In this short post, we're going to look at ARRAY_AGG: another useful aggregation function.

So, what does it do?

ARRAY_AGG allows us to aggregate multiple rows into a single array, based on a particular grouping. It's quite useful when modeling one-to-many relationships, like customers and orders.

For example, let's analyze the following input table.

Let's say we'd like to aggregate the order data into an ARRAY of STRUCTs, grouped by customer_id. Let's also order the resulting array decreasingly by the order_total .

The code to do that would look as follows:


SELECT 

  customer_id, 
  ARRAY_AGG( STRUCT(order_id, order_total ) ORDER BY order_total DESC) AS order_details

FROM input_data

GROUP BY customer_id

Here's how the processed data looks like:

We can now see that instead of the 6 initial rows, we have 2 rows - 1 per customer_id and an array of STRUCTS with order details.

Note that ARRAY_AGG can be combined with:
- DISTINCT, to eliminate duplicates in the resulting ARRAY
- STRUCT, to create an ARRAY of STRUCTS
- ORDER BY, to order the ARRAY in a particular way
- LIMIT, to keep only first n entries (based on the ordering)

Thanks for reading!

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.