Scheduled queries in BigQuery

ยท

3 min read

What are Scheduled Queries?

Scheduled Queries in BigQuery allow users to run SQL tasks on a predefined, automated schedule.

Instead of manually initiating a query each day or week, BigQuery can do it for you. This is perfect for light orchestration of repetitive data transformations, updates, and regular reporting tasks.

Creating a Scheduled Query manually

A scheduled query can be created manually using the Schedule button in the Query window.

A dialog box is presented where we are prompted to provide the following information:

  • a name for the scheduled query

  • a scheduling option or the option to run it manually (on demand)

  • start date and end date for which this query would run

  • optionally, a destination dataset and table for the query result (including the possibility of adding a partitioning field)

  • region selection

  • encryption options

  • principal (user or service account) to run the query under

  • write disposition, in order words what to do with the contents already in the table: keep (and append new data) or overwrite (and replace with the new data)

  • notification options

Creating a scheduled query using Terraform

A schedule can also be created using Terraform, as per the following example.

This option provides a subset of options from the manual method - at the time of writing, for example, the write disposition cannot be set up here.

variable "envConfig" {

    type = map(object({
        project_id = string
        service_account_name = string
    }))
}

variable "config" {
    type = map(object({
        name = string
        query = string
        destination_table_name_template = string
    }))
}
resource "google_bigquery_data_transfer_config" "query_config" {
    for_each = var.config
    display_name = each.value["name"]
    location = "europe"
    data_source_id = "scheduled_query"
    schedule = "every saturday 05:00"
    destination_dataset_id = "learning"
    params = {
        destination_table_name_template = each.value["destination_table_name_template"]
        write_disposition = "WRITE_APPEND"
        query = each.value["query"]
    }
    service_account_name = var.envConfig["dev"].service_account_name
    project = var.envConfig["dev"].project_id
}

Creating a Scheduled Query using the API

It's also possible to create a scheduled query using one of the BigQuery APIs or the bq CLI command - check the GCP documentation here.

Viewing scheduled queries

Viewing the scheduled query can be done by Accessing the 'Scheduled queries' option in the BigQuery subgroup.

It will display a list of queries, their schedule, region, destination (if any) and next run time.

Click on a particular query would present a Run history and its output. There is also an option for scheduling a backfill (which we can use for a Manual Run).

Clicking on the Configuration tab would display the configurations used to create the query.

In conclusion, BigQuery Scheduled queries can be a useful tool in your toolset, as a quick and easy way to do light orchestration of SQL tasks.

Thanks for reading!


๐Ÿš€ Delving into BigQuery, Google Cloud and Analytics? Stay connected with me for valuable insights, handy tips, and strategies to effortlessly traverse the vast universe of cloud computing and data analytics!

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!

ย