How do you compute the MODE (most frequent value) in BigQuery?
For the other measures of central tendency like MEAN and MEDIAN, there are straightforward ways to compute results - functions AVG and PERCENTILE_CONT/PERCENTILE_DIST respectively, but there's no dedicated function for MODE.
By the way, if you have a huge dataset and can bear some lack of precision, take a look at APPROX_TOP_COUNT.
Say we have the following input data:
Now here's how we can compute them otherwise:
- filter out NULLS (if we want to ignore them) or do nothing if we want to keep them
- compute value counts for our desired grain
- take the most frequent one per our grain using QUALIFY + RANK
Here's how the output would look with NULLS excluded.
And with them included:
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.