Here's how GROUP BY works differently across SQL dialects

ยท

2 min read

It turns out I've been writing longer queries than I should.๐Ÿ˜ Here's an interesting gotcha about GROUP BY between SQL dialects that I've just learned.

So I've started with SQL Server back in the day, where according to docs:

a GROUP BY column expression cannot be "a column alias that is defined in the SELECT list".

Naturally, if I processed a column during aggregation, I would have the same expression in GROUP BY (minus the alias of course).

SELECT 
CASE WHEN country in ('US','USA','US of A') THEN 'USA' ELSE country END AS country, SUM (sales_amount) AS total_sales
FROM sales
GROUP BY CASE WHEN country in ('US','USA','US of A') THEN 'USA' ELSE country END

Where it matters: if the alias is the same as the original column name, you would be grouping not by the 'transformed' column, but by the original one, yielding things you might not expect ๐Ÿ˜ A newly-assigned alias cannot be grouped by for the same reason.

Well, things are different with BigQuery for instance. Docs mention:

GROUP BY clauses may also refer to aliases. If a query contains aliases in the SELECT clause, those aliases override names in the corresponding FROM clause.

So in BQ, you can reference the alias you've assigned in SELECT (overriding the one from FROM if matching) and you can reference a newly aliased column. No need to copy the unwieldy CASE WHEN ... to the GROUP BY in this case.

SELECT 
 country AS cntry,
 SUM(amount) AS total_amount
FROM input_data
GROUP BY cntry

Lesson learned (for now).

No alt text provided for this image

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Did you find this article valuable?

Support Datawise by becoming a sponsor. Any amount is appreciated!

ย