I've previously mentioned how pet projects are good for exploring new technologies. It's not every day that you can work on a greenfield project with just the stack you want.
So a while ago I decided to spin off a previous proof of concept of mine (a portable data stack with Airflow) and build one just like it, but better. I was also curious to try out Dagster, a new orchestrator, and DuckDB, an in-process OLAP DBMS.
Scenario:
Imagine a company selling postcards of European cities:
- Their main system? A #Postgres OLTP for direct sales & customer data.
- They collaborate with resellers, obtaining indirect sales data via #JSON & #CSV.
- The need? A Data Warehouse to fuel their analytical insights and provide dashboards.
Objective: Craft a completely portable system with every component containerized. The aim? Minimalism yet realistic functionality.
The Build:
- #Python scripts churn out sample data.
- #dbt Core for model building.
- #Dagster for orchestration (bonus: used #Polars backend).
- #DuckDB as our OLAP database for the Data Warehouse.
- #Superset for visualization, aiding the data analyst.
- #Docker and Docker Compose for containerization.
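To give a feel for the first item in the list, here is a minimal sketch of what a sample-data script for the reseller feeds could look like. It is not the code from the repository: the city list, the field names (`sale_id`, `city`, `quantity`, `sold_on`) and the helper names are all made up for illustration, and it uses only the Python standard library.

```python
import csv
import io
import json
import random
from datetime import date, timedelta

# Hypothetical catalogue; the real project may use different cities.
CITIES = ["Lisbon", "Prague", "Vienna", "Tallinn"]


def make_sales(n, seed=42):
    """Return n fake reseller sale records (all fields are illustrative)."""
    rng = random.Random(seed)  # seeded so every run yields the same data
    start = date(2023, 1, 1)
    return [
        {
            "sale_id": i,
            "city": rng.choice(CITIES),
            "quantity": rng.randint(1, 20),
            "sold_on": (start + timedelta(days=rng.randrange(365))).isoformat(),
        }
        for i in range(1, n + 1)
    ]


def to_json(rows):
    """Serialize records the way a JSON-based reseller feed might look."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the same records as CSV with a header row (rows must be non-empty)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sales = make_sales(5)
    print(to_json(sales))
    print(to_csv(sales))
```

Seeding the generator keeps the output reproducible, which is handy when the same fake data has to flow through the whole stack on every rebuild of the containers.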
Takeaways:
- DuckDB: An OLAP gem! Think of it as SQLite's OLAP counterpart: versatile, user-friendly, and a powerhouse for these applications.
- Dagster: A joy to navigate. Stellar documentation, impressive dbt integration, and the concept of the software-defined asset? A game-changer.
- Superset + DuckDB: Craft SQL queries, visualize, repeat. So smooth!
Real-world Utility:
- A playground to explore these technologies.
- A striking demo.
- A starting point for someone still doing (only) Excel analytics. It might need some more love, but I'll refer to it as a starting point in the future.
Eager to dive in? See the GitHub repository.
Your insights and feedback are golden, so do share!
Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.