Why it matters how the data is collected
I find that working with data daily significantly influences how you perceive the world around you. You begin to notice patterns and details you might otherwise overlook.
Take retail, for example.
Where I live, it's common for pharmacies, gas stations, and supermarket chains to offer loyalty cards. Some of these programs are worthwhile, while others are not. Personally, I'm wary of them for privacy reasons.
To an outside observer like me, their goals seem obvious: boosting customer loyalty and collecting customer data. But why is this data collected? Spending habits, products frequently bought together, how demand shifts over time: for retailers, this information is a gold mine.
But what's even more interesting is the data collection process. In some places, clerks scan their own loyalty cards for customers who don't have one, accumulating points for themselves. That single card becomes a huge outlier, racking up tens or even hundreds of transactions a day. Filter that one out.
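To make "filter that one out" concrete, here is a minimal sketch, assuming each transaction is recorded as a hypothetical `(card_id, transaction_date)` pair and that any card exceeding an assumed cutoff of 20 transactions on a single day is suspicious. The function names and the threshold are illustrative only; a real cutoff would need tuning against what one household plausibly buys in a day.

```python
from collections import Counter
from datetime import date

# Hypothetical record shape: (card_id, transaction_date).
Transaction = tuple[str, date]


def flag_suspicious_cards(transactions: list[Transaction], max_daily: int = 20) -> set[str]:
    """Return card IDs with more than max_daily transactions on at least one day.

    max_daily is an assumed threshold, not a known industry value.
    """
    # Counting identical (card_id, date) tuples gives transactions per card per day.
    daily_counts = Counter(transactions)
    return {card for (card, _day), n in daily_counts.items() if n > max_daily}


def remove_cards(transactions: list[Transaction], flagged: set[str]) -> list[Transaction]:
    """Drop every transaction made with a flagged card."""
    return [t for t in transactions if t[0] not in flagged]


if __name__ == "__main__":
    d = date(2024, 5, 1)
    # Two ordinary cards and one card scanned for many different customers.
    log = [("A", d)] * 3 + [("CLERK", d)] * 120 + [("B", d)]
    flagged = flag_suspicious_cards(log)
    print(flagged)  # {'CLERK'}
    print(len(remove_cards(log, flagged)))  # 4
```

In SQL, the same idea is a `GROUP BY` on card ID and transaction date with `HAVING COUNT(*)` above the chosen threshold.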
Some people even have multiple loyalty cards for the same store, especially when no form is required to get one: they just hand you a new card. How do you accurately identify a unique buyer in this case?
In other cases, an entire family might use the same loyalty card. Just imagine the mix of shopping patterns behind that one card.
The data teams working with such information must have fascinating day-to-day tasks, involving elaborate data cleaning and processing setups.
Whenever I see this kind of data being collected in real life, I'm already picturing someone somewhere analyzing it with SQL.
So, whether you're a Data Analyst, Data Engineer, or Data Scientist, it's worth considering how the data was collected the next time you work on a data task.
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.