Have you ever worked with sharded tables in BigQuery?
I've encountered them in a project long time ago and haven't seen them much around since.
Does the name not ring a bell?
Well, think of it as pseudo-partitioning, a way to store data split between different tables, each having a different suffix in the name dataset.table_name_{your_suffix}
.
We'll be getting different tables, but they can be queried as one by using a wildcard *, retrieving data from all the tables matching the wildcard, like having an invisible UNION ALL behind the scenes.
In practice, I've seen these suffixes most of the time being dates like YYYYMMDD. So, in this case, BigQuery docs discourage this usage of sharding, citing the overhead in terms of storing a separate schema and metadata + permission checks as compared to just using a date-partitioned table.
So you're just better off to use a partitioned table in this case.
They even offer a quick way to convert a group of date-sharded tables to a regular date-partitioned table.
Have you ever encountered any interesting use cases for sharding?
Found it useful? Check out to my Analytics newsletter at notjustsql.com.