Why you should care about partition pruning in BigQuery

When it comes to performance improvements and cost savings, handling only as much data as we need is very important. And partitioning is a cornerstone here.

Now, when working with tables that are partitioned, BigQuery tries to exclude the partitions it does not need (akin to pruning a tree) based on filters (WHERE clause) and JOINS, thus saving you time and processing power (and money).

But it's not always that simple. If you perform operations on the partitioned field (say the date field in a date-partitioned table), Big Q might not be able to prune the table accordingly. So you'll end up processing the entire massive table, even though you were only after one single day.

There's an example below with this happening when converting the date to a different timezone, but I've seen it happen with other operations. Pruning would not work in MERGE statement sourced from two UNIONed partitioned tables.

Check the number of rows read from the table in the examples below.

No alt text provided for this image