Riding on the back of recent news that BigQuery table partition limit has just increased from 4k to 10k partitions, I wanted to talk a bit about the PARTITIONS
view in INFORMATION_SCHEMA
.
I've previously posted about information schema views, but this particular view allows us to get information about partitions in our partitioned tables.
Here's what we can find there:
- total rows in that partition
- logical & billable bytes for that partition
- storage tier (ACTIVE if modified in the last 90 days, LONG_TERM otherwise which is 50% cheaper)
- last modified time
Now, let's focus this last modified time
as it is quite useful when building incremental SQL pipelines. Looking at this field could tell you if data in one of your many partitions was changed since your last run and needs to be reprocessed.
Such a feature should help you in cases when you don't have a reliable watermark column to determine what changed since your last run.
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.