# BigQuery BigLake Tables Explained: What They Are and When to Use Them

If you've worked with BigQuery [external tables](https://cloud.google.com/bigquery/docs/external-tables) before, you know the basic idea: a thin wrapper around data that resides somewhere else, but queryable from BigQuery. Sources include Cloud Storage, [Google Sheets](https://datawise.dev/importing-google-sheets-into-bigquery), or Google Drive.

Today I'd like to talk about a special variety of external table: the **BigLake table**. It's built to bridge data lakes and data warehouses.

### BigQuery External Tables: Limitations and Pain Points

With a regular external table, users need access both to the BigQuery table *and* to the underlying external data source.

If the data is in Cloud Storage, a user needs BigQuery permissions, but also permissions on the bucket and objects. This might be fine for a quick exercise, but the pain points are real:

*   you manage permissions in multiple places, for different types of resources
    
*   bucket-level access can be too broad
    
*   it's harder to apply table-like governance on files
    

### How BigLake Tables Work: Access Delegation Explained

BigLake tables use a BigQuery Connection that accesses the files on behalf of the users. Users don't need direct access to the underlying buckets.

This enables table-level security on external data, including:

*   row-level security
    
*   column-level security
    
*   dynamic data masking (for Cloud Storage BigLake tables)
    

BigLake also enables — powered by BQ Omni — reading data from Amazon S3 and Azure Blob Storage.

### Metadata Caching in BigLake Tables: Faster Queries, Better Plans

When querying external data, BigQuery needs to inspect the files first: what files exist, how they are partitioned, and what metadata they contain. With classic external tables, every query triggers a listing operation against the underlying storage.

If you have a small number of files, this is barely noticeable. If you have thousands or millions of files, especially Hive-partitioned data, it becomes painful.

BigLake tables unlock metadata caching. With it enabled, BigQuery skips the listing on every query and prunes files and partitions faster — avoiding reading unneeded files altogether.

For Parquet BigLake tables, metadata caching also collects table statistics, which helps the optimizer produce better query plans.

The cache has a staleness window you control, and you choose between automatic or manual refreshes (the manual option runs a stored procedure, useful if you want to make it event-driven).

### Creating a BigLake Table: Step-by-Step SQL Example

Say we have some sales data in Google Cloud Storage as follows

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/60486a93-b5b8-44ba-a579-8304f73dcd42.png align="center")

It is Hive partitioned Date -> Country.

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/d755dd31-88e4-48e8-a5f9-ac9c667d1970.png align="center")

In order to create a BigLake table we need to do the following:

Create a BQ Connection

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/33f54513-24a6-499a-89e7-3ee758ace597.png align="center")

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/af156ccc-042a-4608-97f5-e1cf619d0039.png align="center")

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/c4db9544-0deb-4897-b43e-70949a7ed853.png align="center")

Now, we need to grant this service account access to the GCS bucket

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/559ecabd-1711-47ca-8700-bc7de5376cfb.png align="center")

We can now create the BigLake table:

```sql
CREATE EXTERNAL TABLE `learning.orders_biglake`
(
  order_id STRING,
  customer_id INT64,
  channel STRING,
  amount NUMERIC,
  discount_amount NUMERIC,
  created_at TIMESTAMP
)
WITH PARTITION COLUMNS
(
  order_date DATE,
  country STRING
)
WITH CONNECTION `projects/your-gcp-project/locations/eu/connections/demo-biglake-connection`
OPTIONS (
  format = 'CSV',
  skip_leading_rows = 1,
  field_delimiter = ',',
  hive_partition_uri_prefix = 'gs://datawise-biglake-hive-demo-bucket/orders',
  uris = ['gs://datawise-biglake-hive-demo-bucket/orders/*'],
  max_staleness = INTERVAL 1 DAY,
  metadata_cache_mode = 'AUTOMATIC'
);
```

A few things to note:

*   max\_staleness — how old the cache can be before BigQuery goes back to storage. Minimum is 15 minutes
    
*   metadata\_cache\_mode — AUTOMATIC refreshes on its own; MANUAL lets you trigger it via stored procedure, useful for event-driven pipelines
    
*   consider adding require\_partition\_filter to force callers to filter on a partition key and avoid full file scans
    

The table is now created and can be queried like any other BigQuery table.

![](https://cdn.hashnode.com/uploads/covers/641c1535429c76261884ecba/ed0dcfb0-6de1-4996-ac7f-a6793414b042.png align="center")

### **Before you go**

A couple of things worth knowing:

*   an existing classic external table can be upgraded to a BigLake table without recreating it from scratch
    
*   metadata cache refreshes incur processing costs — worth keeping in mind if you're dealing with a large number of files
