This content originally appeared on DEV Community and was authored by ashraf el houfi
How I built an end-to-end data pipeline from scratch using modern data engineering tools
Introduction
From Student to Data Engineer: My First Pipeline
As a student diving into the world of data engineering, I embarked on building my first complete ELT (Extract, Load, Transform) pipeline. This project taught me the fundamentals of modern data architecture and gave me hands-on experience with industry-standard tools.
What you'll learn from this article:
- How to design and implement an ELT pipeline from scratch
- Docker containerization for data services
- Data transformation with dbt (data build tool)
- Workflow orchestration with Apache Airflow
- Real-world best practices and lessons learned
Tech Stack:
🐳 Docker & Docker Compose
🐘 PostgreSQL (Source & Destination)
🔧 dbt (Data Build Tool)
✈️ Apache Airflow
🐍 Python
Section 1: Architecture Overview
The Architecture
My pipeline follows the modern ELT pattern:
- Extract & Load: A Python script extracts data from the source PostgreSQL database and loads it into the destination
- Transform: dbt handles data transformations, testing, and documentation
- Orchestrate: Airflow manages the entire workflow
Why ELT over ETL?
- Scalability: Transform after loading leverages destination database power
- Flexibility: Raw data available for ad-hoc analysis
- Modern Approach: Aligns with cloud data warehouse patterns
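To make the "transform after loading" point concrete, here is a minimal sketch of what the T step looks like when it runs inside the destination database. It is not the project's actual code: it assumes psycopg2 is installed, and the connection settings and table names are illustrative placeholders.

```python
# The "T" in ELT is just SQL executed by the destination engine after the raw
# data has landed -- exactly the job dbt automates later in this article.
# Connection details and table names are illustrative placeholders, and a
# raw_films table is assumed to have been loaded already.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5434, dbname="destination_db",
                        user="postgres", password="secret")
conn.autocommit = True
with conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS film_rating AS
        SELECT film_id,
               title,
               CASE WHEN user_rating >= 4.5 THEN 'Excellent' ELSE 'Good' END AS rating_category
        FROM raw_films
    """)
conn.close()
```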
Section 2: Implementation Deep Dive
Building the Pipeline Step by Step
Step 1: Containerized Database Setup
# docker-compose.yaml excerpt
services:
  source_postgres:
    image: postgres:15
    ports:
      - "5433:5432"
    environment:
      POSTGRES_DB: source_db
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: secret
    volumes:
      - ./source_db_init/init.sql:/docker-entrypoint-initdb.d/init.sql
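With the containers up (docker compose up -d), I found it useful to sanity-check connectivity before writing any pipeline code. Here is a small smoke test, a sketch assuming psycopg2 is installed on the host, using the credentials and host port from the excerpt above; the destination service can be checked the same way.

```python
# Quick smoke test: can we reach the source database exposed on host port 5433?
# Credentials match the docker-compose excerpt above; psycopg2 is assumed installed.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=5433,              # host port mapped to the container's 5432
    dbname="source_db",
    user="postgres",
    password="secret",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()
```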
Step 2: The ELT Script
# elt_script.py - The heart of data movement
import subprocess
import sys


def wait_for_postgres(host, max_retries=5, delay_seconds=5):
    # Connection logic here
    pass


def extract_and_load():
    # pg_dump for extraction
    dump_command = [
        'pg_dump',
        '-h', 'source_postgres',
        '-U', 'postgres',
        '-d', 'source_db',
        '--clean',
        '--if-exists',
    ]
    # psql for loading
    load_command = [
        'psql',
        '-h', 'destination_postgres',
        '-U', 'postgres',
        '-d', 'destination_db',
    ]
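The snippet above leaves out the part that actually runs these commands. Below is one way the pieces can be wired together, a sketch rather than the exact script from my repo: it assumes the PostgreSQL client tools (pg_isready, pg_dump, psql) are available inside the ELT container, passes the password through the PGPASSWORD environment variable so the commands run non-interactively, and uses an illustrative dump file name.

```python
import os
import subprocess
import sys
import time


def wait_for_postgres(host, max_retries=5, delay_seconds=5):
    # Poll with pg_isready until the server accepts connections.
    for _ in range(max_retries):
        if subprocess.run(['pg_isready', '-h', host, '-U', 'postgres']).returncode == 0:
            return
        time.sleep(delay_seconds)
    sys.exit(f"Could not reach {host} after {max_retries} attempts")


def run_elt():
    wait_for_postgres('source_postgres')
    wait_for_postgres('destination_postgres')

    # Password supplied via PGPASSWORD (assumed to match the compose credentials).
    env = dict(os.environ, PGPASSWORD='secret')

    # Extract: dump the source database to a file (file name is illustrative).
    subprocess.run(['pg_dump', '-h', 'source_postgres', '-U', 'postgres',
                    '-d', 'source_db', '--clean', '--if-exists',
                    '-f', 'data_dump.sql'], env=env, check=True)

    # Load: replay the dump against the destination database.
    subprocess.run(['psql', '-h', 'destination_postgres', '-U', 'postgres',
                    '-d', 'destination_db', '-f', 'data_dump.sql'], env=env, check=True)


if __name__ == '__main__':
    run_elt()
```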
Section 3: Data Transformation with dbt
Smart Transformations with dbt
Custom Macro for Rating Classification
-- macros/classify_ratings.sql
{% macro classify_ratings(rating_column) %}
    CASE
        WHEN {{ rating_column }} >= 4.5 THEN 'Excellent'
        WHEN {{ rating_column }} >= 4.0 THEN 'Good'
        WHEN {{ rating_column }} >= 3.0 THEN 'Average'
        WHEN {{ rating_column }} >= 2.0 THEN 'Poor'
        ELSE 'Very Poor'
    END
{% endmacro %}
Transformation Models
-- models/film_classification.sql
SELECT
    film_id,
    title,
    user_rating,
    {{ classify_ratings('user_rating') }} as rating_category,
    release_date
FROM {{ ref('films') }}
Why dbt?
- Version Control: SQL transformations in Git
- Testing: Built-in data quality tests
- Documentation: Auto-generated lineage
- Modularity: Reusable macros and models
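In my pipeline these commands are ultimately triggered from Airflow, but you can exercise the same run-then-test loop from a tiny wrapper script. A minimal sketch, assuming dbt is installed and using illustrative --project-dir/--profiles-dir paths:

```python
# Build the dbt models, then run dbt's data quality tests; a non-zero exit
# code from either command stops the pipeline (check=True raises).
# The directory paths below are illustrative assumptions.
import subprocess

DBT_ARGS = ['--project-dir', '/opt/dbt', '--profiles-dir', '/root/.dbt']

subprocess.run(['dbt', 'run', *DBT_ARGS], check=True)
subprocess.run(['dbt', 'test', *DBT_ARGS], check=True)
```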
Section 4: Orchestration with Airflow
Orchestrating with Apache Airflow
The DAG Structure
# airflow/dags/elt_pipeline.py
from datetime import timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

dag = DAG(
    'elt_pipeline',
    default_args=default_args,
    description='Extract, Load, and Transform pipeline using dbt',
    schedule=timedelta(hours=1),
    catchup=False,
    tags=['elt', 'postgres', 'dbt'],
)

# Task dependencies
elt_task >> dbt_task >> quality_check
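The dependency line assumes default_args and three task objects that the excerpt doesn't show. Below is a sketch of what they could look like; it continues the same DAG file (so it reuses the dag object above), and the task ids, bash commands, and paths are illustrative assumptions rather than the repo's exact values.

```python
from datetime import datetime, timedelta
from airflow.operators.bash import BashOperator

# Illustrative defaults; tune retries and start_date to taste.
default_args = {
    'owner': 'airflow',
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'start_date': datetime(2025, 1, 1),
}

# One BashOperator per pipeline stage; the commands and paths are placeholders.
elt_task = BashOperator(task_id='extract_and_load', bash_command='python /opt/elt/elt_script.py', dag=dag)
dbt_task = BashOperator(task_id='dbt_run', bash_command='dbt run --project-dir /opt/dbt', dag=dag)
quality_check = BashOperator(task_id='dbt_test', bash_command='dbt test --project-dir /opt/dbt', dag=dag)
```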
Pipeline Visualization
Section 5: Results & Monitoring
Results & What I Learned
Pipeline Performance
- Data Volume: 20 films, 20 actors, 39 categories, 14 users
- Execution Time: ~30 seconds end-to-end
- Success Rate: 100% after debugging
- Tests Passed: 20/20 dbt data quality tests
Key Metrics Dashboard
Sample Output
-- Transformed data example
SELECT film_id, title, rating_category, actors
FROM film_rating
LIMIT 3;
| film_id | title | rating_category | actors |
|---------|------------|-----------------|------------------|
| 1 | Inception | Excellent | Leonardo DiCaprio|
| 2 | Shawshank | Excellent | Tim Robbins |
| 3 | Godfather | Excellent | Marlon Brando |
Section 6: Lessons Learned
Student Insights & Lessons Learned
What Went Well
- Containerization: Docker made development environment consistent
- Version Control: Everything in Git from day one
- Incremental Development: Built piece by piece, tested at each step
- Documentation: Commented code saved me hours of debugging
Challenges & Solutions
- PostgreSQL Version Mismatch
  - Problem: pg_dump version didn't match the server
  - Solution: Standardized on PostgreSQL 15 images
- Airflow 3.0.3 Breaking Changes
  - Problem: schedule_interval parameter deprecated
  - Solution: Updated to the schedule parameter (see the snippet below)
- dbt Schema Validation
  - Problem: Column names in tests didn't match the actual model
  - Solution: Regular testing and validation
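For anyone upgrading Airflow and hitting the same error, the fix really is a one-line parameter rename. A minimal sketch (the DAG id here is just a placeholder):

```python
from datetime import timedelta
from airflow import DAG

# Old Airflow 2.x style -- schedule_interval is deprecated in Airflow 3:
#   dag = DAG('elt_pipeline', schedule_interval=timedelta(hours=1))

# Airflow 3.x style -- same behaviour, new parameter name:
dag = DAG('elt_pipeline', schedule=timedelta(hours=1), catchup=False)
```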
If I Started Over
- Use Infrastructure as Code (Terraform) for cloud deployment
- Implement CI/CD pipeline with GitHub Actions
- Add data lineage tracking
- Include more comprehensive logging
Section 7: Next Steps & Future Improvements
Short-term Improvements
- Add data quality alerts
- Implement incremental loading
- Create Slack notifications for failures
- Add more sophisticated dbt tests
Long-term Goals
- Deploy to AWS/GCP with managed services
- Implement streaming with Kafka
- Add ML pipeline integration
- Scale to handle GB+ datasets
For Fellow Students
If you're starting your data engineering journey:
- Start Small: Begin with simple transformations
- Practice Regularly: Build something every week
- Join Communities: dbt Slack, Airflow forums
- Document Everything: Your future self will thank you
- Share Your Work: Teaching others reinforces learning
Conclusion
Building this ELT pipeline taught me that data engineering is equal parts technical skill and problem-solving mindset. Every error message was a learning opportunity, and every successful run was a small victory.
Key Takeaways:
- Modern data tools are powerful but require careful integration
- Container orchestration simplifies complex deployments
- Data quality testing is non-negotiable
- Good documentation saves more time than you think
Resources That Helped Me
- dbt Learn
- Airflow Documentation
- Docker Compose Guide
- Data Engineering communities on Discord/Slack
Want to try this yourself?
Check out the GitHub repository at https://github.com/el-houfi-achraf/elt-pipeline for the full source code and setup instructions.