Skip to main content

Command Palette

Search for a command to run...

Sharded tables in BigQuery

Updated
1 min read
Sharded tables in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

Have you ever worked with sharded tables in BigQuery?

I've encountered them in a project long time ago and haven't seen them much around since.

Does the name not ring a bell?

Well, think of it as pseudo-partitioning, a way to store data split between different tables, each having a different suffix in the name dataset.table_name_{your_suffix}.

We'll be getting different tables, but they can be queried as one by using a wildcard *, retrieving data from all the tables matching the wildcard, like having an invisible UNION ALL behind the scenes.

In practice, I've seen these suffixes most of the time being dates like YYYYMMDD. So, in this case, BigQuery docs discourage this usage of sharding, citing the overhead in terms of storing a separate schema and metadata + permission checks as compared to just using a date-partitioned table.

So you're just better off to use a partitioned table in this case.

They even offer a quick way to convert a group of date-sharded tables to a regular date-partitioned table.

Have you ever encountered any interesting use cases for sharding?

Found it useful? Check out to my Analytics newsletter at notjustsql.com.

Practical SQL

Part 1 of 50

Short, practical posts on SQL and BigQuery — from core language features to advanced query patterns. A reference for data practitioners at every level.

More from this blog

D

Datawise — SQL, BigQuery & Python for Data Engineers

206 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.