A portable data stack with Dagster, Docker, DuckDB, dbt and Superset


Senior Data Engineer • Contractor / Freelancer • GCP & AWS Certified

I've previously mentioned how pet projects are good for exploring new technologies. It's not every day that you get to work on a greenfield project with exactly the stack you want.

So a while ago I decided to spin off a previous proof of concept of mine (a portable data stack with Airflow) and build something similar, but better. I was also curious to try out Dagster, a newer orchestrator, and DuckDB, an in-process OLAP DBMS.

Scenario:
Imagine a company selling postcards of European cities:
- Their main system? A #Postgres OLTP for direct sales & customer data.
- They collaborate with resellers, obtaining indirect sales data via #JSON & #CSV.
- The need? A Data Warehouse to fuel their analytical insights and provide dashboards.
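The reseller feeds in this scenario are easy to mock in Python, in the spirit of the project's sample-data scripts. The sketch below is minimal and is not the project's actual generator. The city list, reseller IDs, unit price and column names are all hypothetical. It builds one batch of fake indirect sales and serializes it both as CSV and as JSON, because resellers deliver in both formats.

```python
import csv
import io
import json
import random
from datetime import date, timedelta

# Hypothetical catalogue and reseller IDs (not taken from the original project).
CITIES = ["Lisbon", "Paris", "Prague", "Rome", "Vienna"]
RESELLERS = ["reseller_a", "reseller_b"]


def make_indirect_sales(n: int, seed: int = 42) -> list[dict]:
    """Generate n fake reseller transactions; a fixed seed makes runs reproducible."""
    rng = random.Random(seed)
    start = date(2023, 1, 1)
    rows = []
    for i in range(n):
        rows.append({
            "transaction_id": i + 1,
            "reseller_id": rng.choice(RESELLERS),
            "city": rng.choice(CITIES),
            "quantity": rng.randint(1, 20),
            "unit_price_eur": 2.5,  # hypothetical flat price per postcard
            "sale_date": (start + timedelta(days=rng.randint(0, 364))).isoformat(),
        })
    return rows


def to_csv(rows: list[dict]) -> str:
    """Serialize rows the way a CSV-delivering reseller might."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize rows the way a JSON-delivering reseller might."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    sales = make_indirect_sales(5)
    print(to_csv(sales))
    print(to_json(sales))
```

In a containerized setup, a script like this would typically write files into a shared volume that the orchestrator then picks up. The volume layout is an assumption and is not described in the post.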

Objective: Craft a completely portable system with every component containerized. The aim? Minimalism yet realistic functionality.

The Build:
- #Python scripts churn out sample data.
- #dbt Core for model building.
- #Dagster for orchestration (bonus: using a #Polars backend).
- #DuckDB as our OLAP database for the Data Warehouse.
- #Superset for visualization, aiding the data analyst.
- #Docker and Docker Compose for containerization.
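Dagster's central idea is the software-defined asset. Each table or artifact declares which upstream assets it depends on, and the orchestrator derives a valid run order from those declarations. The toy sketch below does not use Dagster. It uses the standard library's `graphlib` to illustrate the dependency idea only. The asset names are hypothetical and are not taken from the project's repository. In Dagster itself, each node would be a function decorated with `@asset`, and the dbt models would become assets through its dbt integration.

```python
from graphlib import TopologicalSorter

# Hypothetical asset graph for the postcard pipeline: each asset maps to the
# set of upstream assets it depends on. Names are illustrative only.
ASSETS: dict[str, set[str]] = {
    "raw_direct_sales": set(),   # extracted from the Postgres OLTP
    "raw_reseller_csv": set(),   # reseller files delivered as CSV
    "raw_reseller_json": set(),  # reseller files delivered as JSON
    "stg_sales": {"raw_direct_sales", "raw_reseller_csv", "raw_reseller_json"},
    "fct_sales": {"stg_sales"},  # e.g. a dbt model materialized in DuckDB
    "superset_dashboard": {"fct_sales"},
}


def materialization_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every asset comes after all of its upstream assets."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for name in materialization_order(ASSETS):
        print("materialize", name)
```

Declaring dependencies instead of hand-ordering tasks is what lets an asset-based orchestrator rebuild only what is downstream of a change. This sketch shows just the ordering half of that idea.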

Takeaways:
🌟 DuckDB: An OLAP gem! Think of it as SQLite’s OLAP counterpart: an in-process database that is versatile, user-friendly, and a powerhouse for analytical workloads like this one.
📘 Dagster: A joy to navigate. Stellar documentation, impressive dbt integration, and the concept of the software-defined asset? A game-changer.
📊 Superset + DuckDB: Craft SQL queries, visualize, repeat. So smooth!
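The "craft SQL queries, visualize, repeat" loop usually starts with an aggregate over the warehouse tables. The sketch below combines direct and reseller sales into revenue per city. To keep it runnable without extra packages, it uses the standard library's `sqlite3` as an in-process stand-in for DuckDB. That is a deliberate substitution. With DuckDB you would open the database via `duckdb.connect()` instead, and the query itself is plain SQL that DuckDB also accepts. The table names, column names and sample rows are hypothetical.

```python
import sqlite3

# Hypothetical warehouse tables; sqlite3 stands in for DuckDB here.
REVENUE_BY_CITY_SQL = """
    SELECT city, SUM(quantity * unit_price_eur) AS revenue_eur
    FROM (
        SELECT city, quantity, unit_price_eur FROM direct_sales
        UNION ALL
        SELECT city, quantity, unit_price_eur FROM reseller_sales
    ) AS all_sales
    GROUP BY city
    ORDER BY revenue_eur DESC, city
"""


def revenue_by_city(conn) -> list[tuple[str, float]]:
    """Run the aggregate over direct + indirect sales on any DB-API-style connection."""
    return conn.execute(REVENUE_BY_CITY_SQL).fetchall()


def demo() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")  # in-process database, no server needed
    for table in ("direct_sales", "reseller_sales"):
        conn.execute(
            f"CREATE TABLE {table} (city TEXT, quantity INTEGER, unit_price_eur REAL)"
        )
    # Made-up sample rows for illustration only.
    conn.executemany("INSERT INTO direct_sales VALUES (?, ?, ?)",
                     [("Paris", 3, 2.5), ("Rome", 1, 2.5)])
    conn.executemany("INSERT INTO reseller_sales VALUES (?, ?, ?)",
                     [("Paris", 10, 2.0), ("Prague", 4, 2.0)])
    rows = revenue_by_city(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())  # [('Paris', 27.5), ('Prague', 8.0), ('Rome', 2.5)]
```

In the stack described here, an analyst would typically run a query like this in Superset's SQL Lab against the DuckDB warehouse and then turn the result into a chart.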

Real-world Utility:
- A playground to explore these technologies.
- A striking demo.
- A starting point for anyone whose analytics still live only in Excel. It might need some more love, but I'll point people to it as a starting point in the future.

Eager to dive in? See the GitHub repository.

Your insights and feedback are golden, so do share! ✨

Found it useful? Subscribe to my Analytics newsletter at notjustsql.com.

Datawise — SQL, BigQuery & Python for Data Engineers
