This content originally appeared on DEV Community and was authored by Utkarsh Rastogi
Hey everyone! Today I'm diving deep into ECS services and auto-scaling. After setting up the load balancer on Day 4, it's time to deploy my FastAPI application with intelligent scaling that responds to real traffic patterns.
What We're Building Today
- ECS Service that keeps containers running and healthy
- Smart Auto-Scaling that maintains optimal performance (1-5 containers)
- FastAPI Application with multiple endpoints for testing
- CloudWatch Monitoring with email alerts
- Load Testing Endpoints to validate scaling behavior
The Complete ECS Service with Auto-Scaling
Note: We already created the ECS cluster in our previous setup, so we'll focus on the service configuration.
Here's the full ECS service configuration with intelligent auto-scaling:
# infra/ecs/ecs_service.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creating ECS Service for Learning Purpose'

Parameters:
  TeamNameValue:
    Type: String
    Description: TeamName Tag Value
    Default: "awslearner"
  EnvironmentValue:
    Type: String
    Description: Environment Tag Value
    Default: "dev"
  ServiceName:
    Type: String
    Description: Name of the ECS service
    Default: "learner-svc"
  ExecutionRoleName:
    Type: String
    Description: Name of the task execution role
    Default: "learner-ecs-role"
  TaskRoleName:
    Type: String
    Description: Name of the task role
    Default: "learner-ecs-task-exc-role"
  ImageARN:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Image ARN
    Default: "/learner/imagearn/value"
  ECSCluster:
    Type: String
    Description: ECS Cluster Name
    Default: "learner-cluster"
  PublicSubnetIds:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Comma-separated list of public subnet IDs
    Default: "/learner/public/subnetids"
  SecurityGroup:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Security Group
    Default: "/learner/public/sgid"
  TargetGroupArn:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Target Group ARN
    Default: "/learner/target/value"
  AlertEmail:
    Type: String
    Description: Email address for alerts
    Default: <Provide Email Address>
Resources:
  # Storage for your container logs
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${ServiceName}-logs
      RetentionInDays: 14
      Tags:
        - Key: Name
          Value: !Sub /ecs/${ServiceName}-logs
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue
  # Blueprint that tells ECS how to run your containers
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${ServiceName}-task
      Cpu: 256
      Memory: 512
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${ExecutionRoleName}
      TaskRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${TaskRoleName}
      ContainerDefinitions:
        - Name: !Sub ${ServiceName}-container
          Image: !Ref ImageARN
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref ECSLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-task
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue
  # Service that keeps your containers running and healthy
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      ServiceName: !Sub ${ServiceName}-service
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      PropagateTags: SERVICE
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets: !Split
            - ","
            - !Ref PublicSubnetIds
          SecurityGroups:
            - !Ref SecurityGroup
      LoadBalancers:
        - ContainerName: !Sub ${ServiceName}-container
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroupArn
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-service
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue
  # Defines scaling limits for your containers (1-5 tasks)
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ServiceName}-service"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1
      MaxCapacity: 5

  # Target tracking scaling policy - maintains CPU around 40%
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${EnvironmentValue}-${ServiceName}-target-tracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 40.0 # Target 40% average CPU utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleOutCooldown: 120 # After a scale-out, wait 2 minutes before scaling out again
        ScaleInCooldown: 300 # After a scale-in, wait 5 minutes before scaling in again
        DisableScaleIn: false
  # Email notification system for alerts
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${EnvironmentValue}-${ServiceName}-alerts
      DisplayName: !Sub "${ServiceName} ECS Alerts"
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentValue}-${ServiceName}-alerts
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Connects your email to the alert system
  AlertSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: email
      TopicArn: !Ref AlertTopic
      Endpoint: !Ref AlertEmail
  # Alert when average CPU stays above 70% for two consecutive minutes.
  # This alarm checks CPU only; read it together with MaxCapacityAlarm
  # to confirm the service is already at 5 tasks.
  CriticalCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-CriticalCPU-AtMaxCapacity
      AlarmDescription: !Sub "CRITICAL: ${ServiceName} CPU >70% at max capacity (5 tasks)"
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster

  # Alert when you reach maximum number of containers.
  # RunningTaskCount is a Container Insights metric (namespace ECS/ContainerInsights),
  # so Container Insights must be enabled on the cluster for this alarm to receive data.
  MaxCapacityAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-MaxCapacity-Reached
      AlarmDescription: !Sub "WARNING: ${ServiceName} reached maximum capacity (5 tasks)"
      MetricName: RunningTaskCount
      Namespace: ECS/ContainerInsights
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 5
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster
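The two alarms at the end of the template are driven by `Period`, `EvaluationPeriods`, `Threshold`, and `ComparisonOperator`. As a rough mental model (a simplified sketch, not CloudWatch's actual implementation: it ignores missing-data handling and "M out of N" settings), the alarm fires when every one of the most recent evaluation periods breaches the threshold. The function name `alarm_state` and the sample datapoints are made up for illustration.

```python
from typing import Sequence


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, inclusive: bool = False) -> str:
    """Return 'ALARM', 'OK', or 'INSUFFICIENT_DATA' for the latest periods.

    Simplified model: all of the last `evaluation_periods` datapoints must
    breach the threshold. `inclusive=True` mimics GreaterThanOrEqualToThreshold,
    the default mimics GreaterThanThreshold.
    """
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = datapoints[-evaluation_periods:]
    if inclusive:
        breaching = all(value >= threshold for value in recent)
    else:
        breaching = all(value > threshold for value in recent)
    return "ALARM" if breaching else "OK"


# CriticalCPUAlarm: two consecutive 60-second averages above 70%
cpu_averages = [35.0, 82.5, 91.0]  # hypothetical one-minute CPU averages
print(alarm_state(cpu_averages, threshold=70, evaluation_periods=2))

# MaxCapacityAlarm: one 60-second maximum at or above 5 tasks
task_counts = [3, 4, 5]  # hypothetical running task counts
print(alarm_state(task_counts, threshold=5, evaluation_periods=1, inclusive=True))
```

With these sample values both calls report `ALARM`; a single CPU spike followed by a quiet minute would not, because `EvaluationPeriods: 2` needs two breaching datapoints in a row.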
FastAPI Application with Testing Endpoints
Here's my FastAPI application with endpoints designed to test different scenarios:
# source/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import logging
import time
import hashlib
import boto3
import threading
import multiprocessing

# Emit INFO-level messages; without this, logger.info() calls are
# filtered out by the root logger's default WARNING level
logging.basicConfig(level=logging.INFO)

app = FastAPI()
logger = logging.getLogger("ecs_service")
active_requests = 0

# Initialize ECS client for monitoring
try:
    ecs_client = boto3.client('ecs')
except Exception as e:
    logger.warning(f"Could not initialize ECS client: {e}")
    ecs_client = None


class SubmitData(BaseModel):
    name: str = "User"


@app.get("/")
def home():
    """Simple welcome message"""
    return "Hello from ECS Fargate Service!"


@app.get("/api/health")
def health():
    """Health check endpoint for load balancer"""
    return {"status": "healthy"}


@app.post("/api/submit")
def api_submit(data: SubmitData):
    """Accepts user data and returns personalized message"""
    logger.info(f"Data received: {data.model_dump()}")
    return {"message": f"Happy learning, {data.name}!", "data": data.model_dump()}


@app.get("/api/load")
def generate_load():
    """Generates heavy CPU load for 60 seconds to test auto-scaling"""
    def cpu_intensive_task():
        start_time = time.time()
        while time.time() - start_time < 60:
            for _ in range(10000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()
                sum(range(1000))

    # Use multiple threads to keep the CPU busy. CPython's GIL means these
    # threads largely share one core, which is still enough to saturate a
    # 0.25 vCPU (256 CPU units) Fargate task.
    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 2):
        thread = threading.Thread(target=cpu_intensive_task)
        thread.start()
        threads.append(thread)

    # Block until every worker has finished its 60-second run
    for thread in threads:
        thread.join()

    logger.info("CPU load generation completed")
    return {"status": "load_generated", "duration": "60s", "threads": cpu_count * 2}


@app.get("/api/quickload")
def quick_load():
    """Generates 10-second CPU burst for quick scaling tests"""
    def burst_task():
        start_time = time.time()
        while time.time() - start_time < 10:
            for _ in range(50000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()

    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 3):
        thread = threading.Thread(target=burst_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    return {"status": "burst_completed", "duration": "10s"}


@app.get("/api/scalinginfo")
def get_scaling_info():
    """Returns current ECS service scaling status"""
    if not ecs_client:
        return {"error": "ECS client not available"}

    try:
        response = ecs_client.describe_services(
            cluster='learner-cluster',
            services=['learner-svc-service']
        )
        if response['services']:
            service = response['services'][0]
            return {
                "cluster": "learner-cluster",
                "service": "learner-svc-service",
                "desired_count": service['desiredCount'],
                "running_count": service['runningCount'],
                "pending_count": service['pendingCount'],
                "status": service['status'],
                "active_requests": active_requests
            }
        else:
            return {"error": "Service not found"}
    except Exception as e:
        logger.error(f"Error getting scaling info: {str(e)}")
        return {"error": "Unable to fetch scaling information"}


@app.get("/api/error")
def trigger_error():
    """Triggers 500 error for testing error handling"""
    logger.error("Intentional error triggered")
    raise HTTPException(status_code=500, detail="Internal server error")


@app.get("/api/notfound")
def not_found():
    """Triggers 404 error for testing not found responses"""
    logger.warning("Resource not found")
    raise HTTPException(status_code=404, detail="Resource not found")
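
The `/api/scalinginfo` endpoint above hard-codes the cluster and service names, so it only works with the template's default parameter values. One possible way around that, sketched here under assumptions, is to read the names from environment variables and fall back to the defaults. The variable names `ECS_CLUSTER_NAME` and `ECS_SERVICE_NAME` and the helper `ecs_target` are hypothetical and not part of the original app; they would have to be set through the container definition's `Environment` property in the task definition.

```python
import os
from typing import Mapping, Tuple


def ecs_target(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (cluster, service), preferring environment variables.

    ECS_CLUSTER_NAME and ECS_SERVICE_NAME are hypothetical variable names;
    the fallbacks match the defaults used elsewhere in this post.
    """
    cluster = env.get("ECS_CLUSTER_NAME", "learner-cluster")
    service = env.get("ECS_SERVICE_NAME", "learner-svc-service")
    return cluster, service


cluster_name, service_name = ecs_target()
print(f"Would describe service {service_name!r} in cluster {cluster_name!r}")
```

Inside `get_scaling_info()`, the returned pair could then replace the literal strings passed to `describe_services`.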
Requirements File
# source/requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
boto3==1.34.0
pydantic==2.5.0
Deployment Commands
Important: Before deploying the ECS service, we need to build and push our container image using the CodeBuild project we set up on Day 4.
Deploy in this order:
# 1. First, build and push your container image
# This uses the CodeBuild project from Day 4
aws codebuild start-build --project-name learner-project
# Wait for the build to complete (check in AWS Console or CLI)
# This typically takes 2-3 minutes
# 2. Deploy ECS service with auto-scaling
aws cloudformation deploy \
  --template-file infra/ecs/ecs_service.yaml \
  --stack-name AWSLearner-ECS-Stack \
  --capabilities CAPABILITY_NAMED_IAM
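
If you would rather script the "wait for the build to complete" step than watch the console, a small poller is one option. The sketch below assumes boto3's CodeBuild client (`start_build` and `batch_get_builds`); the helper names `wait_for_build` and `build_and_wait`, the 15-second interval, and the poll limit are illustrative choices, not values from the original setup.

```python
import time
from typing import Callable


def wait_for_build(client, build_id: str, poll_seconds: float = 15.0,
                   max_polls: int = 40,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll CodeBuild until the build leaves IN_PROGRESS; return its final status."""
    for _ in range(max_polls):
        response = client.batch_get_builds(ids=[build_id])
        status = response["builds"][0]["buildStatus"]
        if status != "IN_PROGRESS":
            return status  # e.g. SUCCEEDED, FAILED, STOPPED
        sleep(poll_seconds)
    raise TimeoutError(f"Build {build_id} still in progress after {max_polls} polls")


def build_and_wait(project_name: str = "learner-project") -> str:
    """Start the CodeBuild project and block until it finishes (needs AWS credentials)."""
    import boto3  # imported here so the poller above stays usable without boto3

    codebuild = boto3.client("codebuild")
    build_id = codebuild.start_build(projectName=project_name)["build"]["id"]
    return wait_for_build(codebuild, build_id)
```

Calling `build_and_wait()` and deploying the stack only when it returns `SUCCEEDED` avoids creating the service before the image exists.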
Auto-Scaling Scenarios
1. Check Current Status
curl http://your-alb-url/api/scalinginfo
Response:
{
  "cluster": "learner-cluster",
  "service": "learner-svc-service",
  "desired_count": 1,
  "running_count": 1,
  "pending_count": 0,
  "status": "ACTIVE",
  "active_requests": 0
}
2. Trigger Heavy Load
curl http://your-alb-url/api/load
This creates 60 seconds of intense CPU load. Watch CloudWatch metrics to see:
- CPU utilization spike to 80-90%
- Auto-scaling triggers once average CPU stays above the 40% target for a few consecutive minutes
- New containers start (desired_count increases)
- Load distributes across containers
- CPU drops back to target 40%
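
To follow that sequence from a terminal instead of the CloudWatch console, you could poll `/api/scalinginfo` and print the counts whenever they change. This is a minimal sketch using only the standard library; `watch_scaling` and `fetch_scaling_info` are hypothetical helper names, `http://your-alb-url` is the same placeholder used in the curl commands, and the 15-second interval is an arbitrary choice.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List, Tuple


def fetch_scaling_info(base_url: str) -> Dict:
    """GET <base_url>/api/scalinginfo and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/scalinginfo", timeout=10) as resp:
        return json.load(resp)


def watch_scaling(fetch: Callable[[], Dict], polls: int, interval: float = 15.0,
                  sleep: Callable[[float], None] = time.sleep) -> List[Tuple]:
    """Collect (desired, running, pending) snapshots, printing each change."""
    history: List[Tuple] = []
    for i in range(polls):
        info = fetch()
        snapshot = (info.get("desired_count"),
                    info.get("running_count"),
                    info.get("pending_count"))
        if not history or snapshot != history[-1]:
            print(f"poll {i}: desired={snapshot[0]} running={snapshot[1]} pending={snapshot[2]}")
        history.append(snapshot)
        if i < polls - 1:
            sleep(interval)
    return history


# Example (needs a reachable service):
# watch_scaling(lambda: fetch_scaling_info("http://your-alb-url"), polls=40)
```

Because requests go through the load balancer, any healthy task can answer; all of them report the same service-level counts.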
3. Quick Burst Test
curl http://your-alb-url/api/quickload
Perfect for testing rapid scaling response with a 10-second burst.
Postman Testing
- GET /api/health - Health check status
- POST /api/submit - Data submission with JSON body
- GET /api/scalinginfo - Current container status
- GET /api/load - Load generation response
- GET /api/quickload - Quick burst response
- GET /api/error - Error handling test
- GET /api/notfound - 404 error test
Note: To see auto-scaling in action, you'll need to hit the load endpoints multiple times or use multiple browser tabs/terminals simultaneously to generate enough traffic that triggers the CPU threshold.
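
Instead of juggling browser tabs, the same effect can be approximated with a short script that fires several requests at once. The sketch below uses a thread pool from the standard library; `fire_concurrent` and `http_status` are hypothetical helper names, and the URL is the placeholder from the curl examples.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def http_status(url: str, timeout: float = 90.0) -> int:
    """Return the HTTP status code for a GET request, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def fire_concurrent(url: str, requests: int,
                    fetch: Callable[[str], int] = http_status) -> List[int]:
    """Send `requests` GET calls in parallel and return their status codes."""
    with ThreadPoolExecutor(max_workers=requests) as pool:
        return list(pool.map(fetch, [url] * requests))


# Example (needs a reachable service): four overlapping 60-second load requests
# print(fire_concurrent("http://your-alb-url/api/load", requests=4))
```

Repeating the call for several minutes keeps CPU above the target long enough for the scaling policy to react, since a single 60-second request is usually too short on its own.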
What Each Endpoint Does
Endpoint | Purpose | Response Time |
---|---|---|
/api/health | Load balancer health check | Instant |
/api/submit | Data processing test | Instant |
/api/load | Sustained CPU load (60s) | 60 seconds |
/api/quickload | CPU burst (10s) | 10 seconds |
/api/scalinginfo | Current container status | Instant |
/api/error | Error handling test | Instant |
/api/notfound | 404 error test | Instant |
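
The fast GET endpoints in this table can be checked in one pass with a small smoke test. The sketch below compares each endpoint's status code against what the FastAPI code returns (200 for normal responses, 500 and 404 for the two error endpoints). `smoke_test`, `get_status`, and `EXPECTED_STATUS` are hypothetical names; `/api/submit` is left out because it needs a POST body, and the two load endpoints are skipped because they take 10 to 60 seconds.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List, Tuple

# Expected status codes, taken from the endpoint definitions in source/app.py
EXPECTED_STATUS: Dict[str, int] = {
    "/": 200,
    "/api/health": 200,
    "/api/scalinginfo": 200,
    "/api/error": 500,
    "/api/notfound": 404,
}


def get_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def smoke_test(base_url: str,
               fetch: Callable[[str], int] = get_status) -> List[Tuple[str, int, int]]:
    """Return a list of (path, expected, actual) for every mismatching endpoint."""
    failures = []
    for path, expected in EXPECTED_STATUS.items():
        actual = fetch(base_url.rstrip("/") + path)
        if actual != expected:
            failures.append((path, expected, actual))
    return failures


# Example (needs a reachable service):
# print(smoke_test("http://your-alb-url") or "all endpoints behave as expected")
```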
Auto-Scaling Behavior
How it works:
- Target: Maintains 40% CPU utilization
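- Scale Out: Adds containers when average CPU stays above 40% (2-minute cooldown between scale-outs)
- Scale In: Removes containers when average CPU stays below the target for a sustained period (5-minute cooldown between scale-ins)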
- Limits: 1-5 containers
- Alerts: Email notifications at critical thresholds
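
To get a feel for how many tasks a given CPU reading translates into, a proportional rule is a useful approximation: scale the current task count by the ratio of observed CPU to target CPU, round up, and clamp to the 1-5 limits. AWS does not publish the exact target tracking algorithm, so treat this as an estimate rather than the real calculation; `estimate_desired_count` is a hypothetical helper.

```python
import math


def estimate_desired_count(current_tasks: int, avg_cpu: float,
                           target_cpu: float = 40.0,
                           min_tasks: int = 1, max_tasks: int = 5) -> int:
    """Approximate the task count needed to bring average CPU back to target.

    Proportional estimate only; the actual policy also applies cooldowns
    and scales in more conservatively than it scales out.
    """
    raw = math.ceil(current_tasks * avg_cpu / target_cpu)
    return max(min_tasks, min(max_tasks, raw))


print(estimate_desired_count(current_tasks=1, avg_cpu=85.0))  # 3
print(estimate_desired_count(current_tasks=4, avg_cpu=90.0))  # 5 (capped at MaxCapacity)
print(estimate_desired_count(current_tasks=2, avg_cpu=10.0))  # 1
```

One task at 85% CPU carries roughly 2.1 tasks' worth of load at the 40% target, which rounds up to 3. Four tasks at 90% would want 9, but `MaxCapacity` caps the service at 5, which is the situation the alarms are meant to flag.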
Scaling Timeline:
- 0-2 minutes: High CPU detected, evaluation period
- 2-4 minutes: New container launching
- 4-6 minutes: Container healthy, receiving traffic
- 6+ minutes: Load distributed, CPU normalizes
Key Learnings
- Target tracking scaling needs less tuning than step scaling: you set one target value, and Application Auto Scaling creates and manages the CloudWatch alarms for you
- Cooldown periods prevent rapid scaling that could cause instability
- Email alerts provide peace of mind without constant monitoring
- Load testing endpoints are essential for validating your setup
- ECS Fargate eliminates server management completely
What's Next in This Series?
In this comprehensive series, we've learned how to deploy a complete containerized application from VPC to ECS service using Fargate. We covered:
- VPC Setup with multi-AZ networking
- Security Groups and IAM roles
- ECR Repository for container images
- CodeBuild Pipeline for CI/CD
- Application Load Balancer for traffic distribution
- ECS Service with intelligent auto-scaling
The auto-scaling system feels really robust now. I can throw traffic at it, watch it scale intelligently, and get notified if anything needs attention. Perfect foundation for a production workload!
Complete Day 5 Learning Summary
What we accomplished in Day 5:
Infrastructure Built
- ECS Service with Fargate launch type (256 CPU, 512 MB RAM)
- Task Definition with proper IAM roles and logging
- Auto-Scaling Target (1-5 containers) with target tracking policy
- CloudWatch Alarms for critical CPU and max capacity alerts
- SNS Email Notifications for real-time monitoring
Application Features
- 8 FastAPI Endpoints (the root route plus 7 under /api) for comprehensive testing
- Load Testing Capabilities (60s sustained + 10s burst)
- Real-time Monitoring with ECS service status
- Error Handling and health checks
- Structured Logging to CloudWatch
Auto-Scaling Intelligence
- Target Tracking: Maintains 40% CPU utilization
- Smart Cooldowns: 2min scale-out, 5min scale-in
- Proportional Scaling: Responds to load intensity
- Email Alerts: Critical thresholds and capacity warnings
Key Takeaway: We now have a fully automated, scalable containerized application that can handle real-world traffic patterns while maintaining cost efficiency and operational visibility.
💻 About Me
Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories ☕

Utkarsh Rastogi | Sciencx (2025-08-23T13:13:56+00:00) AWS VPC to ECS – Day 5: ECS Service with Smart Auto-Scaling. Retrieved from https://www.scien.cx/2025/08/23/aws-vpc-to-ecs-day-5-ecs-service-with-smart-auto-scaling/