NBA Statistics Pipeline 🏀
🚀 Introduction
This project is an NBA Statistics Pipeline that fetches NBA team statistics from the SportsData API and stores them in AWS DynamoDB. The project also implements structured logging using AWS CloudWatch, enabling efficient monitoring and debugging.
This project was built to demonstrate my proficiency in AWS services, Python, API integrations, and Infrastructure as Code (IaC).
🛠 Tech Stack
- Python (Data processing, API requests, logging)
- AWS DynamoDB (NoSQL database for storing NBA stats)
- AWS CloudWatch (Logging & monitoring)
- Boto3 (AWS SDK for Python)
- Docker (Containerization)
- EC2 Instance (Compute environment for development)
🎯 Features
- Fetches real-time NBA statistics from the SportsData API
- Stores team stats in AWS DynamoDB
- Structured logging with AWS CloudWatch
- Error handling and logging with JSON-structured logs (see the sketch after this list)
- Uses environment variables for sensitive credentials
- Implements batch writing for efficiency
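To make the structured-logging feature concrete, here's a minimal sketch of a JSON log formatter built only on Python's standard library. The class name and log messages are my own illustrations rather than the project's exact code; on the EC2 host, lines emitted this way are what end up in CloudWatch (for example via the CloudWatch agent or a dedicated log handler).

import json
import logging
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that renders each log record as one JSON object."""

    def format(self, record):
        payload = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Keep stack traces machine-readable instead of multi-line text.
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

logger = logging.getLogger("nba_stats_pipeline")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("Fetched 30 team records from the SportsData API")

Emitting one JSON object per line keeps CloudWatch Logs Insights queries simple, since fields like level and message can be filtered directly.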
📸 Snapshots
- API Response Sample
- DynamoDB Table Data
- CloudWatch Logs (Structured logs for monitoring)
- Terminal Output (Successful execution of the pipeline)
🏗 Project Architecture
nba-stats-pipeline
├── src
│   ├── __init__.py
│   ├── nba_stats.py
│   └── lambdafunction.py
├── requirements.txt   # Dependencies
├── .env               # Environment variables
├── Dockerfile         # Containerization setup (if applicable)
└── README.md          # Project documentation
🚀 Step-by-Step Guide to Building the NBA Stats Pipeline
1️⃣ Launch an EC2 Instance and SSH Into It
ssh -i "nba-stats-pipeline.pem" ubuntu@ec2-18-212-173-76.compute-1.amazonaws.com
2️⃣ Clone the Repository
git clone https://github.com/onlyfave/nba-stats-pipeline.git
cd nba-stats-pipeline
3️⃣ Install Python 3
Python 3 is required to run the project.
sudo apt update
sudo apt install python3
4️⃣ Install pip
On most systems, pip comes pre-installed with Python 3. To verify, run:
pip3 --version
If you don't have pip installed, use the following command:
sudo apt install python3-pip
5️⃣ Install Dependencies
pip install -r requirements.txt
6️⃣ Set Up Environment Variables
Create a .env file with the following content:
SPORTDATA_API_KEY=your_api_key
DYNAMODB_TABLE_NAME=nba-player-stats
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
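Here's a minimal sketch of how these variables might be loaded at startup, assuming python-dotenv. The variable names match the file above, but the loading code itself is illustrative, not the project's exact implementation:

import os

from dotenv import load_dotenv  # assumes python-dotenv is among the dependencies

# Read key=value pairs from .env into the process environment.
load_dotenv()

SPORTDATA_API_KEY = os.environ["SPORTDATA_API_KEY"]
DYNAMODB_TABLE_NAME = os.getenv("DYNAMODB_TABLE_NAME", "nba-player-stats")
AWS_REGION = os.getenv("AWS_REGION", "us-east-1")

# AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are picked up from the
# environment automatically by boto3, so they never need to be hard-coded.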
7️⃣ cd Into the Folder Containing the Pipeline
cd src
8️⃣ Run the Pipeline
python3 nba_stats.py
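Under the hood, the fetch step is a single authenticated GET request. The sketch below shows roughly how that call could look; the endpoint path, season value, and header name are assumptions about the SportsData API rather than details taken from the project code.

import os

import requests

def fetch_team_stats(season="2025"):
    """Fetch season-level team stats (hypothetical SportsData endpoint)."""
    url = f"https://api.sportsdata.io/v3/nba/scores/json/TeamSeasonStats/{season}"
    headers = {"Ocp-Apim-Subscription-Key": os.environ["SPORTDATA_API_KEY"]}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()  # surface HTTP errors so they can be logged
    return response.json()

if __name__ == "__main__":
    teams = fetch_team_stats()
    print(f"Fetched stats for {len(teams)} teams")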
📊 Sample Data Format
[
  {
    "TeamID": 1,
    "TeamName": "Los Angeles Lakers",
    "Wins": 25,
    "Losses": 15,
    "PointsPerGameFor": 112.5,
    "PointsPerGameAgainst": 108.3
  }
]
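Records in this shape can be written to DynamoDB in a single batch. Below is a sketch assuming a table keyed on TeamID; the helper names are mine, not the project's. Note that boto3 does not accept Python floats, so numeric values are converted to Decimal before writing.

import os
from decimal import Decimal

import boto3

def to_dynamo_item(record):
    """Convert float values to Decimal, the numeric type DynamoDB expects."""
    return {key: Decimal(str(value)) if isinstance(value, float) else value
            for key, value in record.items()}

def store_team_stats(records):
    """Batch-write team records into the DynamoDB table."""
    table = boto3.resource("dynamodb").Table(os.environ["DYNAMODB_TABLE_NAME"])
    # batch_writer buffers put_item calls and flushes them as BatchWriteItem requests.
    with table.batch_writer() as batch:
        for record in records:
            batch.put_item(Item=to_dynamo_item(record))

store_team_stats([
    {"TeamID": 1, "TeamName": "Los Angeles Lakers", "Wins": 25, "Losses": 15,
     "PointsPerGameFor": 112.5, "PointsPerGameAgainst": 108.3},
])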
🏗 Deployment (Optional: Dockerized Version)
To run this project inside a Docker container:
docker build -t nba-stats-pipeline .
docker run --env-file .env nba-stats-pipeline
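If you're writing the Dockerfile from scratch, a minimal version for this layout could look like the following; the base image, paths, and entry point are assumptions based on the project tree above, not the repository's actual Dockerfile.

# Hypothetical Dockerfile for the pipeline
FROM python:3.11-slim
WORKDIR /app

# Install dependencies first so this layer is cached independently of source changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ ./src/

# Credentials are supplied at runtime with: docker run --env-file .env ...
CMD ["python3", "src/nba_stats.py"]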
🔥 Key Takeaways
- AWS Expertise: Used DynamoDB & CloudWatch for data storage & monitoring
- DevOps Skills: Managed credentials, logging, and error handling efficiently
- Cloud-Native Thinking: Designed a cloud-based ETL pipeline
📌 Next Steps
- Implement AWS Lambda functions for automated execution (see the handler sketch after this list)
- Deploy using AWS ECS or Kubernetes
- Integrate with Grafana for real-time data visualization
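For the Lambda next step, a thin handler can wrap the same fetch-and-store logic. This is a sketch of what src/lambdafunction.py might grow into; the functions it imports are the hypothetical helpers sketched earlier, not confirmed contents of the module.

import json
import logging

# Hypothetical imports: reusing the fetch/store helpers sketched above.
from nba_stats import fetch_team_stats, store_team_stats

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    """Entry point for a scheduled invocation (e.g. an EventBridge rule)."""
    teams = fetch_team_stats()
    store_team_stats(teams)
    logger.info(json.dumps({"message": "pipeline run complete",
                            "teams_written": len(teams)}))
    return {"statusCode": 200, "body": f"Stored {len(teams)} team records"}

Wired to an EventBridge schedule, this would remove the need to SSH into EC2 and run the script by hand.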
📢 Connect With Me