# Table Sampling in BigQuery

It should be no surprise that understanding your data is very important when working with it. The initial step in my approach to a new dataset always involves examining the data closely. Although the `Schema Tab` reveals the data schema, to grasp the contents, identify missing data, spot potential issues, and review the cardinality of values, taking a look at the data is unavoidable.

![Schema tab for a table in BigQuery](https://cdn.hashnode.com/res/hashnode/image/upload/v1698321245537/830569d7-1cea-48d3-80ef-11be6b18da90.png align="center")

Then comes the issue of cost.

In BigQuery, when working with big tables, you should know that a `LIMIT` clause does not reduce the amount of data processed (and thus the cost), but merely truncates the output.

There are a couple of ways around that.

### Preview

The simplest is the `PREVIEW` button, which allows us to see a subset of rows from this particular table. This does not incur any charges.

![Preview Tab](https://cdn.hashnode.com/res/hashnode/image/upload/v1698321409850/d6f2e392-c563-44c4-9959-06c38f34c4a4.png align="center")

### Partitioning

Another way would be leveraging partitions in a partitioned table. By selecting one particular partition in such a table we will achieve partition elimination - BigQuery will ignore all other partitions (dates) and process only the one we are providing, achieving a cost saving for us.

![Reading one partition from a partitioned table](https://cdn.hashnode.com/res/hashnode/image/upload/v1698321631027/606e3bf0-da8b-4b9e-bc85-6ee4edccf6c4.png align="center")

### Sampling

A relatively new way would be using [Table Sampling](https://cloud.google.com/bigquery/docs/table-sampling). While this is still in Pre-GA (so not fit for production yet), it is still handy when doing exploratory work with data.

How does it work? Using the TABLESAMPLE SYSTEM command, you provide a percentage of rows that you'd like sampled and returned to you.

The SQL command would look like as follows:

```sql
SELECT * 
FROM `learning.data_source` 
TABLESAMPLE SYSTEM (1 PERCENT)
```

Using this, from my table of ~400k rows the query returned 4k rows.

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1698322093613/b548bfe7-7b73-4836-bbb2-3233627ec0b2.png align="center")

![](https://cdn.hashnode.com/res/hashnode/image/upload/v1698322059672/e012dab5-8bd5-443a-9e91-0310d3a78304.png align="center")

To summarize, employing these three strategies — Preview, Partition Filtering, and Sampling — together, significantly saves time, computational resources, and consequently, money during the exploratory stage, especially when handling big tables.

*Found it useful? Subscribe to my Analytics newsletter at* [*notjustsql.com*](https://www.notjustsql.com)*.*
