Cleaning up STRINGS in BigQuery

Data is collected and processed in a number of ways, and it should come as no surprise that it's not always perfect.

Perhaps the most important thing you need to do before analyzing data is have a look at how it's presented and check for irregularities.

Before any sound analysis a great deal of attention needs to be paid to cleaning the data.

Take string columns for instance. In hashtag#BigQuery, as with other engines, there is a wealth of functions helping you to process strings, including:

- TRIM/RTRIM/LTRIM for getting rid of the whitespace
- REPLACE to replace a substring with another one
- UPPER/LOWER/NORMALIZE etc to control casing
- SUBSTR/SUBSTRING to cut strings and so on.

The main goal here is to bring everything to a common denominator, being able to tell which observations belong together and which data can be considered "missing".

No alt text provided for this image

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.