Search Indexes in BigQuery

ยท

2 min read

Here's something that might be interesting if you analyze large volumes of STRING or JSON data in BigQuery.

Let's look at SEARCH indexes and what you need to know to get started.

So, what are they used for?

You have a big table (>10 GB) with STRING or JSON columns which you perform text analysis on.

Search indexes can help retrieving data more efficiently from unstructured/semi-structured data - columns of type STRING, JSON, ARRAY of STRING, STRUCTS with STRING or JSON columns.

It can help optimize the usage of SEARCH funciton as well as other operators you use with string fields like 'STARTS_WITH', 'IN', '=' or 'LIKE'.

How to create one?

CREATE SEARCH INDEX term_search_index ON dataset.table_name(ALL COLUMNS/1 or more columns);

Based on what columns you've indexed, you can leverage the search index to search the entire table (columns with the compatible datatypes) or just a subset of columns of interest.

How to know if an search index is used?

Check the 'Job Information' of your BigQuery. This will tell if you if an index was used, and if not, what was the reason.

No alt text provided for this image

Further reading

Check out text analyzer options to see what different use case you can cover better. Maybe a future post about this ๐Ÿ˜

Words of caution

- works best when you have a lot of distinct values (high query selectivity)
- if you've indexed all the columns any new compatible (STRING, JSON) column in that table will be indexed as well

Found it useful? Check out to my Analytics newsletter at notjustsql.com.

Did you find this article valuable?

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!

ย