One of the most common sightings in SQL code is using ranking functions. It's simple but we must surely get it right. I use ROW_NUMBER
very often for de-duplication, also used DENSE_RANK
a couple of times - but I've never used RANK
. So how are they different?
- ROW_NUMBER - sequential row number, regardless of equal values. If no order is provided, the results might be different every time you run it (aka non-deterministic): 1,2,3,4,5
- DENSE_RANK - also a sequential row number, but takes into account peer rows (with equal values). next rank is the immediate following: 1,2,2,3 (no gaps)
- RANK - also a sequential row number, takes into account rows with equal values, but next rank is incremented: 1,2,2,4 (with gaps)
See below for an example of how it all works.
SELECT
store_code,
country,
sales_usd,
ROW_NUMBER() OVER country_sales AS row_no,
DENSE_RANK() OVER country_sales AS dense_rnk,
RANK() OVER country_sales AS rnk
FROM input_data
WINDOW country_sales AS (PARTITION BY country ORDER BY sales_usd DESC)
Thanks for reading!
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.