Filling up missing values with LAST_VALUE

Window functions are powerful. But they can also help us fill in missing data in BigQuery.

Let's say you have a sensor that records temperature and humidity. Unfortunately, it is quite unreliable, so sometimes it might not send one or both readings. You'd like to retain the last known reading for a measurement.

Here's how we can solve it:

- Leverage the LAST_VALUE window function.
- Specify the IGNORE NULLS clause
- Partition by sensor_id so only we consider data from the same sensor
- Order the window by the timestamp column
- Define a ROWS condition to consider rows from the beginning of time up to and including the current row - this way, if we do have a current reading for this timestamp, we keep it.

Here's how it would look in SQL:

  SELECT


  sensor_id,
  temperature,
  humidity_percentage,
  at_timestamp,

  LAST_VALUE(temperature IGNORE NULLS) OVER (PARTITION BY sensor_id ORDER BY at_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tr_temperature,

  LAST_VALUE(humidity_percentage IGNORE NULLS) OVER (PARTITION BY sensor_id ORDER BY at_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS tr_humidity,

FROM input_data

ORDER BY at_timestamp

Previously, I've written another blog post solving a similar problem by leveraging NTH_VALUE .

P.S. This would not work if you're trying to fill in a STRUCT for example. IGNORE NULLS does not regard STRUCT(NULL AS a, NULL AS b) the same as NULL, so you might need to unpack the STRUCT.

Happy querying!

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!