Comparing ranking functions in BigQuery

One of the most common sightings in SQL code is using ranking functions. It's simple but we must surely get it right. I use ROW_NUMBER very often for de-duplication, also used DENSE_RANK a couple of times - but I've never used RANK. So how are they different?

- ROW_NUMBER - sequential row number, regardless of equal values. If no order is provided, the results might be different every time you run it (aka non-deterministic): 1,2,3,4,5

- DENSE_RANK - also a sequential row number, but takes into account peer rows (with equal values). next rank is the immediate following: 1,2,2,3 (no gaps)

- RANK - also a sequential row number, takes into account rows with equal values, but next rank is incremented: 1,2,2,4 (with gaps)

See below for an example of how it all works.

SELECT 
  store_code, 
  country, 
  sales_usd, 
  ROW_NUMBER() OVER country_sales AS row_no,
  DENSE_RANK() OVER country_sales AS dense_rnk,
  RANK() OVER country_sales       AS rnk

FROM input_data

WINDOW country_sales AS (PARTITION BY country ORDER BY sales_usd DESC)

Thanks for reading!

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.