
BigQuery BigLake Tables Explained: What They Are and When to Use Them

Query your data lake with warehouse-grade security and performance — without moving a single file.


Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

If you've worked with BigQuery external tables before, you know the basic idea: a thin wrapper around data that resides somewhere else, but queryable from BigQuery. Sources include Cloud Storage, Google Sheets, or Google Drive.

Today I'd like to talk about a special variety of external table: the BigLake table. It's built to bridge data lakes and data warehouses.

BigQuery External Tables: Limitations and Pain Points

With a regular external table, users need access both to the BigQuery table and to the underlying external data source.

If the data is in Cloud Storage, a user needs BigQuery permissions, but also permissions on the bucket and objects. This might be fine for a quick exercise, but the pain points are real:

  • you manage permissions in multiple places, for different types of resources

  • bucket-level access can be too broad

  • it's harder to apply table-like governance on files
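
To make that concrete: with a classic external table over Cloud Storage, every consumer needs a grant on the BigQuery side and a separate grant on the bucket. A rough sketch, assuming a hypothetical analyst alice@example.com, a hypothetical classic external table learning.orders_external, and the demo bucket used later in this post:

# BigQuery side: let Alice query the external table
bq add-iam-policy-binding \
  --member='user:alice@example.com' \
  --role='roles/bigquery.dataViewer' \
  'your-gcp-project:learning.orders_external'

# Storage side: Alice also needs to read the underlying objects
gcloud storage buckets add-iam-policy-binding gs://datawise-biglake-hive-demo-bucket \
  --member='user:alice@example.com' \
  --role='roles/storage.objectViewer'

Two grants, two tools, two places where access can drift out of sync.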

How BigLake Tables Work: Access Delegation Explained

BigLake tables use access delegation: a BigQuery Connection, backed by a Google-managed service account, reads the files on behalf of users. Users don't need direct access to the underlying buckets.

This enables table-level security on external data, including:

  • row-level security

  • column-level security

  • dynamic data masking (for Cloud Storage BigLake tables)
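
For instance, row-level security can be declared straight on a BigLake table with a standard row access policy. A minimal sketch, assuming the orders table created later in this post and a hypothetical sales-de group:

-- Only members of the group see German orders; everyone else sees no rows
CREATE ROW ACCESS POLICY de_sales_only
ON `learning.orders_biglake`
GRANT TO ('group:sales-de@example.com')
FILTER USING (country = 'DE');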

BigLake also enables — powered by BQ Omni — reading data from Amazon S3 and Azure Blob Storage.

Metadata Caching in BigLake Tables: Faster Queries, Better Plans

When querying external data, BigQuery needs to inspect the files first: what files exist, how they are partitioned, and what metadata they contain. With classic external tables, every query triggers a listing operation against the underlying storage.

If you have a small number of files, this is barely noticeable. If you have thousands or millions of files, especially Hive-partitioned data, it becomes painful.

BigLake tables unlock metadata caching. With it enabled, BigQuery skips the listing on every query and prunes files and partitions faster — avoiding reading unneeded files altogether.

For Parquet BigLake tables, metadata caching also collects table statistics, which helps the optimizer produce better query plans.

The cache has a staleness window you control, and you choose between automatic or manual refreshes (the manual option runs a stored procedure, useful if you want to make it event-driven).
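
In MANUAL mode the refresh is just a system procedure call, so it can be triggered from anything that can run a query: a Cloud Function reacting to new files, an Airflow task, and so on. A sketch, assuming the table created in the next section:

-- Refresh the metadata cache for a specific BigLake table on demand
CALL BQ.REFRESH_EXTERNAL_METADATA_CACHE('your-gcp-project.learning.orders_biglake');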

Creating a BigLake Table: Step-by-Step SQL Example

Say we have some sales data in Google Cloud Storage, Hive-partitioned by order_date and then country.
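
On disk that layout looks roughly like this (the dates, countries, and file names are illustrative):

gs://datawise-biglake-hive-demo-bucket/orders/order_date=2024-05-01/country=DE/orders_000.csv
gs://datawise-biglake-hive-demo-bucket/orders/order_date=2024-05-01/country=FR/orders_000.csv
gs://datawise-biglake-hive-demo-bucket/orders/order_date=2024-05-02/country=DE/orders_000.csv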

In order to create a BigLake table we need to do the following:

Create a BQ Connection
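
This can be done from the console or with the bq CLI. A sketch using the project, location, and connection name that appear in the DDL further down (swap in your own values):

# Create a Cloud Resource connection for BigLake
bq mk --connection \
  --project_id=your-gcp-project \
  --location=eu \
  --connection_type=CLOUD_RESOURCE \
  demo-biglake-connection

# The connection is backed by a Google-managed service account;
# bq show prints its email (serviceAccountId)
bq show --connection your-gcp-project.eu.demo-biglake-connection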

Now, we need to grant this service account access to the GCS bucket
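
With gcloud that is a single bucket-level IAM binding; roles/storage.objectViewer is enough for read-only querying. SERVICE_ACCOUNT_EMAIL below is a placeholder for the value printed by bq show above:

# Grant the connection's service account read access to the demo bucket
gcloud storage buckets add-iam-policy-binding gs://datawise-biglake-hive-demo-bucket \
  --member='serviceAccount:SERVICE_ACCOUNT_EMAIL' \
  --role='roles/storage.objectViewer'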

We can now create the BigLake table:

CREATE EXTERNAL TABLE `learning.orders_biglake`
(
  order_id STRING,
  customer_id INT64,
  channel STRING,
  amount NUMERIC,
  discount_amount NUMERIC,
  created_at TIMESTAMP
)
WITH PARTITION COLUMNS
(
  order_date DATE,
  country STRING
)
WITH CONNECTION `projects/your-gcp-project/locations/eu/connections/demo-biglake-connection`
OPTIONS (
  format = 'CSV',
  skip_leading_rows = 1,
  field_delimiter = ',',
  hive_partition_uri_prefix = 'gs://datawise-biglake-hive-demo-bucket/orders',
  uris = ['gs://datawise-biglake-hive-demo-bucket/orders/*'],
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC'
);

A few things to note:

  • max_staleness — how old the cache can be before BigQuery goes back to storage. Minimum is 15 minutes

  • metadata_cache_mode — AUTOMATIC refreshes on its own; MANUAL lets you trigger it via stored procedure, useful for event-driven pipelines

  • consider adding require_hive_partition_filter = true to force callers to filter on a partition key and avoid full file scans

The table is now created and can be queried like any other BigQuery table.
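
For example, a quick aggregation that also takes advantage of partition pruning might look like this (the dates and countries are illustrative):

-- Filtering on both partition columns keeps the scan to a handful of files
SELECT
  country,
  COUNT(*) AS orders,
  SUM(amount) AS revenue
FROM `learning.orders_biglake`
WHERE order_date BETWEEN DATE '2024-05-01' AND DATE '2024-05-07'
  AND country IN ('DE', 'FR')
GROUP BY country;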

Before you go

A couple of things worth knowing:

  • an existing classic external table can be upgraded to a BigLake table without recreating it from scratch (see the sketch after this list)

  • metadata cache refreshes incur processing costs — worth keeping in mind if you're dealing with a large number of files
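
For the first point, one way to do the upgrade is to regenerate the table definition with the connection attached and apply it with bq update. A rough sketch, assuming a hypothetical classic external table learning.orders_external over the same files:

# Build a new external table definition that references the connection
bq mkdef \
  --source_format=CSV \
  --connection_id=projects/your-gcp-project/locations/eu/connections/demo-biglake-connection \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix=gs://datawise-biglake-hive-demo-bucket/orders \
  "gs://datawise-biglake-hive-demo-bucket/orders/*" > /tmp/orders_def.json

# Apply the definition to the existing external table in place
bq update --external_table_definition=/tmp/orders_def.json learning.orders_external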