# Converting JSON to BigQuery ARRAY and STRUCT


Earlier in 2022 BigQuery introduced native support for the [JSON datatype](https://cloud.google.com/bigquery/docs/reference/standard-sql/json-data). Previously, one would have had to store the JSON data in a string column. This new development opens the door to a lot of interesting use cases, given the widespread adoption and flexibility that this format allows.

Now, what are the trade-offs one would need to consider when choosing between storing the data using the JSON datatype versus the ARRAY and STRUCT data types commonly in BigQuery? I’ve recently come across [a great blog post](https://www.letmesqlthatforyou.com/2020/05/json-vs-structs-vs-columns-in-bigquery.html) comparing these approaches.

My takeaway is that if you’d be willing to give up a little bit on the flexibility of JSON and are not afraid of working with nested data, this might be an interesting choice. There is value to be found here in storage and querying cost savings as well as in easiness of accessing and exploring the data stored. Again, this might vary greatly based on one’s use case, data shape and size.

In this short article, we’re going to do a practical exercise of converting a JSON-containing string / JSON data type into classical BigQuery structures: ARRAYS and STRUCTS.

We’re going to transform this:

%[https://gist.github.com/cnstlungu/b116901eaf9ef92edf2b55148689834a] 

into this:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1679669944503/3b4bebb9-bee7-469f-955b-556c3cb6757d.png align="center")

For that, we’ll need to:

* extract the data using JSON\_VALUE and JSON\_EXTRACT ARRAY (with UNNEST)
    
* create a STRUCT for each JSON object and an ARRAY for each JSON array, and do so from outside to the inside
    
* nest the data, imitating the source and aliasing appropriately
    
* cast the attributes to the appropriate BigQuery datatype
    

%[https://gist.github.com/cnstlungu/cab85d88614398bc6ec30517e5147a18] 

Notice how above, when extracting an array member (***ability***) from an attribute inside a struct that is inside another array (***members***), we’re using the unnested member as the input to the JSON\_VALUE function

```sql
FROM UNNEST(JSON_EXTRACT_ARRAY(jsondata, "$.members")) AS member

FROM UNNEST(JSON_EXTRACT_ARRAY(member, '$.abilities')) AS ability
```

If we were to save our results to a table, the schema would look as follows:

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1679563208404/8fa06db1-76ce-4bca-b6f9-ff67af90d7f5.png align="left")

Under the right conditions — the absence of a schema drift in the source, volumes big enough to be worth the hassle, multiple nested attributes with arrays, and users trained to interact with nested data — this structure would be more efficient for storage and querying while also allowing for more discoverability of the data. One would not need to study the JSON schema anymore to understand the shape of the data, a simple look at the above schema would suffice.

Thanks for reading!

*Found it useful? Subscribe to my Analytics newsletter at* [*notjustsql.com*](https://www.notjustsql.com)*.*

---

*Enjoyed this? Here are some related articles you might find useful:*

- [Using ARRAY_AGG in BigQuery](https://datawise.dev/using-array-agg-in-bigquery)
- [UNNESTING ARRAYS in BigQuery](https://datawise.dev/unnesting-arrays-in-bigquery)
- [Leveraging ARRAYS in BigQuery for query performance](https://datawise.dev/leveraging-arrays-in-bigquery-for-query-performance)
- [Accessing ARRAY elements in BigQuery](https://datawise.dev/accessing-array-elements-in-bigquery)

