A portable data stack with Dagster, Docker, DuckDB, dbt and Superset

I've previously mentioned how pet projects are good for exploring new technologies. It's not every day that you can work on a greenfield project with just the stack you want.

So a while ago I decided to spin off an earlier proof of concept of mine (a portable data stack with Airflow) and build something similar, but better. I was also curious to try out two tools that were new to me: Dagster, an orchestrator, and DuckDB, an in-process OLAP DBMS.

Scenario:
Imagine a company selling postcards of European cities:
- Their main system? A #Postgres OLTP for direct sales & customer data.
- They collaborate with resellers, obtaining indirect sales data via #JSON & #CSV.
- The need? A Data Warehouse to fuel their analytical insights and provide dashboards.

Objective: Craft a completely portable system with every component containerized. The aim? Minimalism yet realistic functionality.

The Build:
- #Python scripts churn out sample data.
- #dbt Core for model building.
- #Dagster for orchestration (bonus: I used a #Polars backend).
- #DuckDB as our OLAP database for the Data Warehouse.
- #Superset for visualization, aiding the data analyst.
- #Docker and Docker Compose for containerization.
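
To make the first bullet more concrete, here is a minimal sketch of what a sample-data script could look like, using only the Python standard library. The city list, the field names and the function names are assumptions made for illustration; they are not taken from the repository.

```python
import csv
import json
import random
from datetime import date, timedelta
from pathlib import Path

# Hypothetical city list and field names, chosen only for illustration.
CITIES = ["Lisbon", "Prague", "Vienna", "Bucharest"]


def make_sales(n_rows: int, seed: int = 42) -> list[dict]:
    """Return n_rows fake reseller sales records, reproducible via the seed."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    rows = []
    for i in range(n_rows):
        sale_day = start + timedelta(days=rng.randint(0, 364))
        rows.append(
            {
                "transaction_id": i + 1,
                "city": rng.choice(CITIES),
                "quantity": rng.randint(1, 20),
                "sale_date": sale_day.isoformat(),
            }
        )
    return rows


def write_csv(rows: list[dict], path: Path) -> None:
    """Write the records as a CSV file with a header row (rows must be non-empty)."""
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


def write_json(rows: list[dict], path: Path) -> None:
    """Write the records as a single JSON array."""
    path.write_text(json.dumps(rows, indent=2), encoding="utf-8")
```

Calling `write_csv(make_sales(100), Path("reseller_a.csv"))` and `write_json(make_sales(100, seed=7), Path("reseller_b.json"))` would produce one file per (made-up) reseller, which is roughly the shape of input the warehouse has to ingest.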

Takeaways:
🌟 DuckDB: An OLAP gem! Think of it as SQLite's OLAP counterpart: versatile, user-friendly, and a powerhouse for these applications.
📘 Dagster: A joy to navigate. Stellar documentation, impressive dbt integration, and the concept of the software-defined asset? A game-changer.
📊 Superset + DuckDB: Craft SQL queries, visualize, repeat. So smooth!

Real-world Utility:
- A playground to explore these technologies.
- A striking demo.
- A starting point for someone still doing (only) Excel analytics. It might need some more love, but I'll refer to it as a starting point in the future.

Eager to dive in? See the GitHub repository.

Your insights and feedback are golden. Do share! ✨

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Support Constantin Lungu by becoming a sponsor. Any amount is appreciated!