Skip to main content

Command Palette

Search for a command to run...

Table grain quick validation with SQL

Updated
1 min read
Table grain quick validation with SQL

I was doing some exploratory data analysis on a number of tables I didn’t have much information about and, unfortunately, didn’t know their grain.

I needed a quick way to validate my assumptions about the table grain, identify contradicting observations (rows), and check for duplicates at the same time.

Therefore I decided to use a combination of TO_JSON_STRING and FARM_FINGERPRINT. The first creates a JSON representation of the entire row (given a table alias), while the second converts the resulting string into a INT64 hash.

By comparing the total number of rows in a group against the distinct count of these fingerprints, we can determine whether the proposed grain is correct and whether there are duplicates in the data.

This was a quick exercise but use this with care. Depending on your SQL implementation, data volumes and context, results may vary.

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.


Enjoyed this? Here are some related articles you might find useful: