# De-duplicating with ROW_NUMBER vs ARRAY_AGG


What function do you use to explicitly de-duplicate in BigQuery?  
I normally use ROW\_NUMBER(), but I've recently encountered a really interesting [blog post](https://cloud.google.com/blog/topics/developers-practitioners/bigquery-admin-reference-guide-query-optimization) suggesting ARRAY\_AGG might be more performant for the task.

The explanation given is that the `ORDER BY` is allowed to drop everything except the top record on each GROUP BY, making ARRAY\_AGG more efficient.

Sure enough, I did give it a try on some sample data.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1701523417203/d85c84a8-7229-4fc5-a522-3392e47ea44a.png align="center")

In the below example, we'd like to pick the latest date available per id. The `ROW_NUMBER` example is pretty straightforward - we partition by id and order by `ds_date` decreasingly, then use the `QUALIFY` clause to keep only the record we want.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1701523015713/950e8e52-6af0-4581-9515-eb0c05427db5.png align="center")

The `ARRAY_AGG` example, while looking a bit more intimidating, does the same thing.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1701523032907/895ddea1-3efa-48ab-8c1a-95b21c37f1b5.png align="center")

It turns out that the recommendation holds - slot time for the ARRAY\_AGG version was only 40% of the ROW\_NUMBER. Another day, another lesson learned.

*Found it useful? Subscribe to my Analytics newsletter at* [*notjustsql.com*](https://www.notjustsql.com)*.*

---

*Enjoyed this? Here are some related articles you might find useful:*

- [Beware of ROW_NUMBER without ORDER BY](https://datawise.dev/beware-of-rownumber-without-order-by)
- [Tidying up WINDOW functions in BigQuery with named windows](https://datawise.dev/tidying-up-window-functions-in-bigquery-with-named-windows)
- [Using RANGE in Window Functions in BigQuery](https://datawise.dev/using-range-in-window-functions-in-bigquery)
- [Computing a cumulative sum in BigQuery](https://datawise.dev/computing-a-cumulative-sum-in-bigquery)

