A portable data stack with Dagster, Docker, DuckDB, dbt and Superset

Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified
I've previously mentioned how pet projects are good for exploring new technologies. It's not every day that you can work on a greenfield project with exactly the stack you want.
So a while ago I decided to spin off a proof of concept I had built earlier (a portable data stack with Airflow) and create a similar one, but better. I was also curious to try out Dagster, a newer orchestrator, and DuckDB, an in-process OLAP DBMS.
Scenario:
Imagine a company selling postcards of European cities:
- Their main system? A #Postgres OLTP for direct sales & customer data.
- They collaborate with resellers, obtaining indirect sales data via #JSON & #CSV.
- The need? A Data Warehouse to fuel their analytical insights and provide dashboards.
Objective: Craft a completely portable system with every component containerized. The aim? Minimalism yet realistic functionality.
The Build:
- #Python scripts churn out sample data.
- #dbt Core for building the data models.
- #Dagster for orchestration (bonus: I used a #Polars backend).
- #DuckDB as our OLAP database for the Data Warehouse.
- #Superset for visualization, aiding the data analyst.
- #Docker and Docker Compose for containerization
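To make the first item of the build a bit more concrete, here is a minimal sketch of what a sample-data script for the reseller feeds could look like. It is not the project's actual code: the city list, reseller names, column names and the `generate_sales`, `to_csv` and `to_json` helpers are all hypothetical, and it sticks to the Python standard library.

```python
import csv
import io
import json
import random
from datetime import date, timedelta

# Hypothetical reference data and schema; the real project's may differ.
CITIES = ["Amsterdam", "Berlin", "Lisbon", "Paris", "Prague", "Vienna"]
RESELLERS = ["reseller_a", "reseller_b"]
FIELDS = ["transaction_id", "reseller", "city", "quantity", "unit_price", "sold_on"]


def generate_sales(n: int, seed: int = 42) -> list[dict]:
    """Return n fake indirect-sales rows (postcards of European cities)."""
    rng = random.Random(seed)  # seeded, so every run yields the same data
    start = date(2023, 1, 1)
    return [
        {
            "transaction_id": i + 1,
            "reseller": rng.choice(RESELLERS),
            "city": rng.choice(CITIES),
            "quantity": rng.randint(1, 20),
            "unit_price": round(rng.uniform(1.0, 5.0), 2),
            "sold_on": (start + timedelta(days=rng.randint(0, 364))).isoformat(),
        }
        for i in range(n)
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV text with a header line (one reseller's feed format)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array (another reseller's feed format)."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    # Preview a few rows in both formats instead of writing files.
    sample = generate_sales(3)
    print(to_csv(sample))
    print(to_json(sample))
```

A fixed seed keeps the generated data reproducible between runs, which makes it easier to compare dbt model output and dashboards after a change.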
Takeaways:
🌟 DuckDB: An OLAP gem! Think of it as SQLite’s OLAP counterpart: versatile, user-friendly, and a powerhouse for this kind of analytical workload.
📘 Dagster: A joy to navigate. Stellar documentation, impressive dbt integration, and the concept of the software-defined asset? A game-changer.
📊 Superset + DuckDB: Craft SQL queries, visualize, repeat. So smooth!
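For anyone new to Dagster, the software-defined asset idea boils down to this: each asset is a function that produces a piece of data, and its upstream dependencies are read from the function's parameter names, so the framework can work out the order in which to build everything. The toy model below mimics only that idea with the standard library (`graphlib`, Python 3.9+). It is not Dagster's API: `toy_asset`, `materialize_all` and the three sample assets are made up for illustration. In real Dagster you'd use the `@asset` decorator from the `dagster` package, which also takes care of storage and lineage.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable

# Toy registry: asset name -> function. Hypothetical, NOT Dagster's API.
ASSETS: dict[str, Callable[..., Any]] = {}


def toy_asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as an asset named after the function."""
    ASSETS[fn.__name__] = fn
    return fn


def materialize_all() -> dict[str, Any]:
    """Compute every registered asset, upstream assets first.

    Dependencies are inferred from parameter names, mirroring Dagster's convention.
    """
    graph = {name: set(inspect.signature(fn).parameters) for name, fn in ASSETS.items()}
    missing = {dep for deps in graph.values() for dep in deps} - ASSETS.keys()
    if missing:
        raise KeyError(f"unknown upstream assets: {sorted(missing)}")
    results: dict[str, Any] = {}
    for name in TopologicalSorter(graph).static_order():
        # Every dependency is already computed because static_order is topological.
        kwargs = {dep: results[dep] for dep in graph[name]}
        results[name] = ASSETS[name](**kwargs)
    return results


@toy_asset
def direct_sales() -> list[dict]:
    # Stand-in for rows extracted from the Postgres OLTP system.
    return [{"city": "Paris", "qty": 3}, {"city": "Berlin", "qty": 2}]


@toy_asset
def reseller_sales() -> list[dict]:
    # Stand-in for rows parsed from the resellers' JSON/CSV feeds.
    return [{"city": "Paris", "qty": 5}]


@toy_asset
def sales_by_city(direct_sales: list[dict], reseller_sales: list[dict]) -> dict[str, int]:
    # Downstream asset: depends on both upstream assets via its parameter names.
    totals: dict[str, int] = {}
    for row in direct_sales + reseller_sales:
        totals[row["city"]] = totals.get(row["city"], 0) + row["qty"]
    return totals


if __name__ == "__main__":
    print(materialize_all()["sales_by_city"])  # {'Paris': 8, 'Berlin': 2}
```

The nice part of this declarative style is that you never write the execution order by hand: adding a new asset that takes `sales_by_city` as a parameter is enough to slot it into the graph.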
Real-world Utility:
- A playground to explore these technologies.
- A striking demo.
- A starting point for teams still doing their analytics only in Excel. It needs some more love, but it's a base I can point people to in the future.
Eager to dive in? See the GitHub repository.
Your insights and feedback are golden, so do share! ✨

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.





