Skip to main content

Command Palette

Search for a command to run...

Calculating the Median in BigQuery

Updated
1 min read
Calculating the Median in BigQuery
C

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

One of these days, I had to handle missing value imputation and stumbled upon the need to calculate a median in BigQuery SQL. Since there is no built-in function for this, I looked for what other people used as workarounds—see sources in comments.

First, we have PERCENTILE_CONT and PERCENTILE_DISC (from continuous and discrete, respectively). The difference between them lies in whether interpolation is used:
- PERCENTILE_CONT uses linear interpolation. In the case of an even number of values, it returns their average.
- PERCENTILE_DISC selects the closest value without any interpolation.
Both of these are window functions, so if you want to simulate grouping, you need to ensure that a single value is kept per group. Also, note the option of IGNORE | RESPECT NULLS (ignore is the default).

Additionally, we can use the approximate aggregation function APPROX_QUANTILES, which allows grouping. This function splits the values into quantiles, from which we can select the 50th percentile to retrieve the median. Check out the comments for a quick into intro approximate aggregate functions.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

More from this blog

D

Datawise: A blog on SQL, BigQuery, Analytics, Python Tips and Tricks

204 posts

Data Engineer with a passion for transforming complex data landscapes into insightful stories. Here on my blog, I share insights, challenges, and the ever-evolving dance of technology and business.

BigQuery Median: PERCENTILE_CONT vs APPROX_QUANTILES