AWS VPC to ECS – Day 5: ECS Service with Smart Auto-Scaling

This content originally appeared on DEV Community and was authored by Utkarsh Rastogi

Hey everyone! Today I'm diving deep into ECS services and auto-scaling. After setting up the load balancer on Day 4, it's time to deploy my FastAPI application with intelligent scaling that responds to real traffic patterns.

What We're Building Today

  • ECS Service that keeps containers running and healthy
  • Smart Auto-Scaling that keeps average CPU near 40% by running between 1 and 5 containers
  • FastAPI Application with multiple endpoints for testing
  • CloudWatch Monitoring with email alerts
  • Load Testing Endpoints to validate scaling behavior

The Complete ECS Service with Auto-Scaling

Note: We already created the ECS cluster in our previous setup, so we'll focus on the service configuration.

Here's the full ECS service configuration with intelligent auto-scaling:

# infra/ecs/ecs_service.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Creating ECS Service for Learning Purpose'

Parameters:
  TeamNameValue:
    Type: String
    Description: TeamName Tag Value
    Default: "awslearner"
  EnvironmentValue:
    Type: String
    Description: Environment Tag Value
    Default: "dev"
  ServiceName:
    Type: String
    Description: Name of the ECS service
    Default: "learner-svc"
  ExecutionRoleName:
    Type: String
    Description: Name of the service execution role
    Default: "learner-ecs-role"
  TaskRoleName:
    Type: String
    Description: Name of the task role (used by the application code inside the container)
    Default: "learner-ecs-task-exc-role"
  ImageARN:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Container image URI (read from SSM Parameter Store)
    Default: "/learner/imagearn/value"
  ECSCluster:
    Type: String
    Description: ECS Cluster Name
    Default: "learner-cluster"
  PublicSubnetIds:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Comma-separated public subnet IDs (read from SSM Parameter Store)
    Default: "/learner/public/subnetids"
  SecurityGroup:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Security Group
    Default: "/learner/public/sgid"
  TargetGroupArn:
    Type: AWS::SSM::Parameter::Value<String>
    Description: Target Group ARN
    Default: "/learner/target/value"
  AlertEmail:
    Type: String
    Description: Email address for alerts
    Default: "<Provide Email Address>"   # Replace with a real address, or override AlertEmail at deploy time

Resources:
  # Storage for your container logs
  ECSLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: !Sub /ecs/${ServiceName}-logs
      RetentionInDays: 14
      Tags:
        - Key: Name
          Value: !Sub /ecs/${ServiceName}-logs
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Blueprint that tells ECS how to run your containers
  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Sub ${ServiceName}-task
      Cpu: 256                   # 0.25 vCPU
      Memory: 512                # MiB
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${ExecutionRoleName}
      TaskRoleArn: !Sub arn:aws:iam::${AWS::AccountId}:role/${TaskRoleName}
      ContainerDefinitions:
        - Name: !Sub ${ServiceName}-container
          Image: !Ref ImageARN
          PortMappings:
            - ContainerPort: 80
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref ECSLogGroup
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: ecs
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-task
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Service that keeps your containers running and healthy
  ECSService:
    Type: AWS::ECS::Service
    Properties:
      Cluster: !Ref ECSCluster
      ServiceName: !Sub ${ServiceName}-service
      TaskDefinition: !Ref TaskDefinition
      LaunchType: FARGATE
      DesiredCount: 1
      PropagateTags: SERVICE
      NetworkConfiguration:
        AwsvpcConfiguration:
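          AssignPublicIp: ENABLED   # Without a NAT gateway or VPC endpoints, tasks in public subnets need a public IP to pull from ECR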
          Subnets: !Split 
            - ","
            - !Ref PublicSubnetIds
          SecurityGroups:
            - !Ref SecurityGroup
      LoadBalancers:
        - ContainerName: !Sub ${ServiceName}-container 
          ContainerPort: 80
          TargetGroupArn: !Ref TargetGroupArn
      Tags:
        - Key: Name
          Value: !Sub ${ServiceName}-service
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Defines scaling limits for your containers (1-5 tasks)
  AutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      ServiceNamespace: ecs
      ResourceId: !Sub "service/${ECSCluster}/${ServiceName}-service"
      ScalableDimension: ecs:service:DesiredCount
      MinCapacity: 1          
      MaxCapacity: 5

  # Target tracking scaling policy - maintains CPU around 40%
  AutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: !Sub ${EnvironmentValue}-${ServiceName}-target-tracking
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref AutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 40.0                    # Target 40% CPU utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleOutCooldown: 120               # Wait 2 minutes after a scale-out before scaling out again
        ScaleInCooldown: 300                # Wait 5 minutes after a scale-in before scaling in again
        DisableScaleIn: false

  # Email notification system for alerts
  AlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub ${EnvironmentValue}-${ServiceName}-alerts
      DisplayName: !Sub "${ServiceName} ECS Alerts"
      Tags:
        - Key: Name
          Value: !Sub ${EnvironmentValue}-${ServiceName}-alerts
        - Key: TeamName
          Value: !Ref TeamNameValue
        - Key: Environment
          Value: !Ref EnvironmentValue

  # Connects your email to the alert system
  AlertSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Protocol: email
      TopicArn: !Ref AlertTopic
      Endpoint: !Ref AlertEmail

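  # Alert when average CPU stays above 70% for 2 minutes
  # (with a 40% tracking target, this usually means scaling has already hit its maximum)
  CriticalCPUAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-CriticalCPU-AtMaxCapacity
      AlarmDescription: !Sub "CRITICAL: ${ServiceName} average CPU >70% for 2 consecutive minutes (likely at max capacity of 5 tasks)"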
      MetricName: CPUUtilization
      Namespace: AWS/ECS
      Statistic: Average
      Period: 60
      EvaluationPeriods: 2
      Threshold: 70
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster

  # Alert when you reach maximum number of containers
  # RunningTaskCount is a Container Insights metric, so Container Insights must be enabled on the cluster
  MaxCapacityAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub ${EnvironmentValue}-${ServiceName}-MaxCapacity-Reached
      AlarmDescription: !Sub "WARNING: ${ServiceName} reached maximum capacity (5 tasks)"
      MetricName: RunningTaskCount
      Namespace: ECS/ContainerInsights
      Statistic: Maximum
      Period: 60
      EvaluationPeriods: 1
      Threshold: 5
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - !Ref AlertTopic
      Dimensions:
        - Name: ServiceName
          Value: !Sub ${ServiceName}-service
        - Name: ClusterName
          Value: !Ref ECSCluster
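To reason about when the two CloudWatch alarms above will fire, here is a simplified model of alarm evaluation in Python. It is a sketch, not CloudWatch's implementation: it assumes an alarm goes to ALARM only when each of the latest EvaluationPeriods datapoints breaches the threshold (the behavior when DatapointsToAlarm is not set) and it ignores CloudWatch's configurable missing-data handling. The function name alarm_state is hypothetical.

```python
import operator
from typing import Callable, Dict, Sequence

# Comparison operators used by the two alarms in the template, mapped to Python functions.
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    "GreaterThanThreshold": operator.gt,
    "GreaterThanOrEqualToThreshold": operator.ge,
}


def alarm_state(datapoints: Sequence[float], threshold: float,
                evaluation_periods: int, comparison: str) -> str:
    """Simplified alarm check: ALARM only if each of the latest
    `evaluation_periods` datapoints breaches the threshold."""
    if len(datapoints) < evaluation_periods:
        return "INSUFFICIENT_DATA"  # simplification; real CloudWatch treats missing data configurably
    breaches = COMPARISONS[comparison]
    recent = datapoints[-evaluation_periods:]
    return "ALARM" if all(breaches(value, threshold) for value in recent) else "OK"


# CriticalCPUAlarm: average CPU > 70 for 2 consecutive 60-second periods
print(alarm_state([45.0, 72.0, 75.0], 70, 2, "GreaterThanThreshold"))  # ALARM
# MaxCapacityAlarm: max RunningTaskCount >= 5 in the latest 60-second period
print(alarm_state([4, 5], 5, 1, "GreaterThanOrEqualToThreshold"))      # ALARM
```

For example, CriticalCPUAlarm stays OK for a single one-minute spike to 75% (alarm_state([45.0, 75.0], 70, 2, "GreaterThanThreshold")) but alarms once two consecutive minutes are above 70%.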

FastAPI Application with Testing Endpoints

Here's my FastAPI application with endpoints designed to test different scenarios:

# source/app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import logging
import time
import hashlib
import boto3
import threading
import multiprocessing

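app = FastAPI()
# Without a handler at INFO level, logger.info(...) calls are silently dropped;
# basicConfig sends them to stderr, which the awslogs driver ships to CloudWatch
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("ecs_service")
active_requests = 0  # Reported by /api/scalinginfo, but nothing updates it, so it always reads 0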

# Initialize ECS client for monitoring (the task role needs the ecs:DescribeServices permission)
try:
    ecs_client = boto3.client('ecs')
except Exception as e:
    logger.warning(f"Could not initialize ECS client: {e}")
    ecs_client = None

class SubmitData(BaseModel):
    name: str = "User"

@app.get("/")
def home():
    """Simple welcome message"""
    return "Hello from ECS Fargate Service!"

@app.get("/api/health")
def health():
    """Health check endpoint for load balancer"""
    return {"status": "healthy"}

@app.post("/api/submit")
def api_submit(data: SubmitData):
    """Accepts user data and returns personalized message"""
    logger.info(f"Data received: {data.model_dump()}")
    return {"message": f"Happy learning, {data.name}!", "data": data.model_dump()}

@app.get("/api/load")
def generate_load():
    """Generates heavy CPU load for 60 seconds to test auto-scaling"""
    def cpu_intensive_task():
        start_time = time.time()
        while time.time() - start_time < 60:
            for _ in range(10000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()
                sum(range(1000))

    # Threads share Python's GIL, so this mostly keeps one core busy;
    # that should still be enough to saturate a 0.25 vCPU Fargate task
    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 2):
        thread = threading.Thread(target=cpu_intensive_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    logger.info("CPU load generation completed")
    return {"status": "load_generated", "duration": "60s", "threads": cpu_count * 2}

@app.get("/api/quickload")
def quick_load():
    """Generates 10-second CPU burst for quick scaling tests"""
    def burst_task():
        start_time = time.time()
        while time.time() - start_time < 10:
            for _ in range(50000):
                hashlib.sha256(str(time.time()).encode()).hexdigest()

    cpu_count = multiprocessing.cpu_count()
    threads = []
    for _ in range(cpu_count * 3):
        thread = threading.Thread(target=burst_task)
        thread.start()
        threads.append(thread)

    for thread in threads:
        thread.join()

    return {"status": "burst_completed", "duration": "10s"}

@app.get("/api/scalinginfo")
def get_scaling_info():
    """Returns current ECS service scaling status"""
    if not ecs_client:
        return {"error": "ECS client not available"}

    try:
        response = ecs_client.describe_services(
            cluster='learner-cluster',
            services=['learner-svc-service']
        )

        if response['services']:
            service = response['services'][0]
            return {
                "cluster": "learner-cluster",
                "service": "learner-svc-service",
                "desired_count": service['desiredCount'],
                "running_count": service['runningCount'],
                "pending_count": service['pendingCount'],
                "status": service['status'],
                "active_requests": active_requests
            }
        else:
            return {"error": "Service not found"}

    except Exception as e:
        logger.error(f"Error getting scaling info: {str(e)}")
        return {"error": "Unable to fetch scaling information"}

@app.get("/api/error")
def trigger_error():
    """Triggers 500 error for testing error handling"""
    logger.error("Intentional error triggered")
    raise HTTPException(status_code=500, detail="Internal server error")

@app.get("/api/notfound")
def not_found():
    """Triggers 404 error for testing not found responses"""
    logger.warning("Resource not found")
    raise HTTPException(status_code=404, detail="Resource not found")
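One gap in the app: active_requests is returned by /api/scalinginfo, but nothing ever changes it. FastAPI runs plain def endpoints in a thread pool, so a shared counter needs a lock. A minimal sketch, assuming you want to count in-flight requests; RequestCounter and track are hypothetical names, not part of FastAPI.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


class RequestCounter:
    """Thread-safe count of requests currently being handled (hypothetical helper)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = 0

    @property
    def active(self) -> int:
        with self._lock:
            return self._active

    @contextmanager
    def track(self) -> Iterator[None]:
        # Increment on entry and always decrement on exit, even if the handler raises.
        with self._lock:
            self._active += 1
        try:
            yield
        finally:
            with self._lock:
                self._active -= 1
```

In app.py you could create request_counter = RequestCounter(), wrap the bodies of generate_load and quick_load in `with request_counter.track():`, and return request_counter.active instead of active_requests. An HTTP middleware registered with `@app.middleware("http")` is an alternative if you want to count every route.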

Requirements File

# source/requirements.txt
fastapi==0.104.1
uvicorn==0.24.0
boto3==1.34.0
pydantic==2.5.0

Deployment Commands

Important: Before deploying the ECS service, we need to build and push our container image using the CodeBuild project we set up in Day 4.

Deploy in this order:

# 1. First, build and push your container image
# This uses the CodeBuild project from Day 4
aws codebuild start-build --project-name learner-project

# Wait for the build to complete (check in AWS Console or CLI)
# This typically takes 2-3 minutes

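# 2. Deploy ECS service with auto-scaling
# (replace you@example.com with the address that should receive alerts)
aws cloudformation deploy \
  --template-file infra/ecs/ecs_service.yaml \
  --stack-name AWSLearner-ECS-Stack \
  --parameter-overrides AlertEmail=you@example.com \
  --capabilities CAPABILITY_NAMED_IAM

# 3. Confirm the SNS subscription: AWS emails a confirmation link to AlertEmail,
# and no alerts are delivered until you click it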
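If you'd rather script the "wait for the build" step than watch the console, a small polling helper does the job. This is a sketch under assumptions: wait_for_build is a hypothetical helper, the status strings are CodeBuild's build statuses, and the status function is injected so the helper can be tested without AWS. The commented lines at the end show one way to wire it to boto3's CodeBuild client.

```python
import time
from typing import Callable

# CodeBuild build statuses that mean the build is finished (IN_PROGRESS is the only non-terminal one).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "FAULT", "TIMED_OUT", "STOPPED"}


def wait_for_build(get_status: Callable[[], str], timeout_s: float = 600,
                   interval_s: float = 15,
                   sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_status() until the build finishes; return the final status."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"build still {status} after {timeout_s}s")
        sleep(interval_s)


# Hypothetical wiring with boto3 (needs AWS credentials; not executed here):
# import boto3
# codebuild = boto3.client("codebuild")
# build_id = codebuild.start_build(projectName="learner-project")["build"]["id"]
# status = wait_for_build(
#     lambda: codebuild.batch_get_builds(ids=[build_id])["builds"][0]["buildStatus"])
# if status != "SUCCEEDED":
#     raise SystemExit(f"build ended with {status}; fix it before deploying the stack")
```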

Auto-Scaling Scenarios

1. Check Current Status

curl http://your-alb-url/api/scalinginfo

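Response (on a fresh deployment):

{
  "cluster": "learner-cluster",
  "service": "learner-svc-service",
  "desired_count": 1,
  "running_count": 1,
  "pending_count": 0,
  "status": "ACTIVE",
  "active_requests": 0
}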

2. Trigger Heavy Load

curl http://your-alb-url/api/load

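This keeps the task's CPU busy for roughly 60 seconds. Because the response also takes about 60 seconds, the ALB (default idle timeout: 60 seconds) may return a 504 even though the load keeps running on the task. Watch CloudWatch metrics to see:

  • CPU utilization spike well above the 40% target
  • Auto-scaling trigger once average CPU has stayed above 40% for a few consecutive 1-minute datapoints (one 60-second request is often not enough)
  • New containers start (desired_count increases)
  • Load distributes across containers
  • CPU falls back toward or below the 40% target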
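Why can one /api/load call fall short? CloudWatch publishes ECS CPUUtilization as 1-minute datapoints, and the scale-out alarm that target tracking creates needs several consecutive breaching datapoints. The sketch below assumes that number is 3 (check the scale-out alarm Application Auto Scaling creates for the policy in the CloudWatch console) and that CPU sits at 100% during load with a 5% idle baseline; both are illustrative assumptions, not measurements. The helper names are hypothetical.

```python
from typing import List, Sequence


def minute_averages(per_second_cpu: Sequence[float]) -> List[float]:
    """Average per-second CPU samples into 1-minute datapoints (drops a partial last minute)."""
    full_minutes = len(per_second_cpu) // 60
    return [sum(per_second_cpu[m * 60:(m + 1) * 60]) / 60 for m in range(full_minutes)]


def scale_out_would_trigger(per_second_cpu: Sequence[float], target: float = 40.0,
                            datapoints_to_alarm: int = 3) -> bool:
    """True if `datapoints_to_alarm` consecutive 1-minute averages exceed the target.
    The default of 3 is an assumption about the target tracking scale-out alarm."""
    streak = 0
    for avg in minute_averages(per_second_cpu):
        streak = streak + 1 if avg > target else 0
        if streak >= datapoints_to_alarm:
            return True
    return False


IDLE, BUSY = 5.0, 100.0  # illustrative CPU percentages, not measurements

one_request = [IDLE] * 30 + [BUSY] * 60 + [IDLE] * 210   # one /api/load call, not minute-aligned
sustained = [BUSY] * 240 + [IDLE] * 60                    # load kept up for 4 minutes

print(minute_averages(one_request))          # [52.5, 52.5, 5.0, 5.0, 5.0]
print(scale_out_would_trigger(one_request))  # False
print(scale_out_would_trigger(sustained))    # True
```

A single 60-second request lands in one or two 1-minute windows, so the breaching streak never reaches 3; keeping CPU busy for about 4 minutes does.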

3. Quick Burst Test

curl http://your-alb-url/api/quickload

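Handy for a quick check that CPU metrics and logs react. On its own, a 10-second burst adds at most about 17 percentage points to a 1-minute CPU average (10 of 60 seconds at full load), so it will not trigger scaling unless you repeat it continuously.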

Postman Testing

  • GET /api/health - Health check status
  • POST /api/submit - Data submission with JSON body
  • GET /api/scalinginfo - Current container status
  • GET /api/load - Load generation response
  • GET /api/quickload - Quick burst response
  • GET /api/error - Error handling test
  • GET /api/notfound - 404 error test

Note: To see auto-scaling in action, average CPU has to stay above the 40% target for several consecutive minutes, so hit the load endpoints repeatedly or run several browser tabs or terminals at the same time.
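Instead of juggling tabs, a small standard-library script can keep several /api/load requests in flight at once. A sketch, assuming the your-alb-url placeholder used above; drive_load and fetch_status are hypothetical helpers, and the fetch function is injectable so the logic can be tested offline. Failures are collected rather than raised because a request that takes about 60 seconds may hit the ALB's idle timeout and come back as a 504 while the task keeps burning CPU.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Union


def fetch_status(url: str, timeout_s: float = 90) -> int:
    """GET the URL and return the HTTP status code (urlopen raises on 4xx/5xx)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.status


def drive_load(base_url: str, path: str = "/api/load", concurrency: int = 4,
               rounds: int = 3,
               fetch: Callable[[str], int] = fetch_status) -> List[Union[int, str]]:
    """Send `concurrency` parallel requests per round, for `rounds` rounds.
    Returns status codes, or the repr of the exception for failed requests."""
    url = base_url.rstrip("/") + path
    results: List[Union[int, str]] = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for _ in range(rounds):
            futures = [pool.submit(fetch, url) for _ in range(concurrency)]
            for future in futures:
                try:
                    results.append(future.result())
                except Exception as exc:  # e.g. HTTPError 504 from the ALB, or a socket timeout
                    results.append(repr(exc))
    return results
```

For example, drive_load("http://your-alb-url", concurrency=4, rounds=5) runs five back-to-back rounds of four parallel requests, which should keep CPU above target for several minutes while you watch /api/scalinginfo in another terminal.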

What Each Endpoint Does

| Endpoint | Purpose | Response Time |
|---|---|---|
| /api/health | Load balancer health check | Instant |
| /api/submit | Data processing test | Instant |
| /api/load | Sustained CPU load (60s) | 60 seconds |
| /api/quickload | CPU burst (10s) | 10 seconds |
| /api/scalinginfo | Current container status | Instant |
| /api/error | Error handling test | Instant |
| /api/notfound | 404 error test | Instant |

Auto-Scaling Behavior

How it works:

  • Target: Maintains 40% CPU utilization
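  • Scale Out: Adds containers when average CPU stays above 40% (at least 2 minutes between scale-out activities)
  • Scale In: Removes containers more conservatively, after CPU has stayed below 40% for a longer stretch (at least 5 minutes between scale-in activities)
  • Limits: 1-5 containers
  • Alerts: Email when average CPU exceeds 70% for 2 minutes, or when 5 tasks are running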
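Target tracking adjusts capacity roughly in proportion to how far the metric is from the target. The function below is a back-of-the-envelope model of that idea, not Application Auto Scaling's exact algorithm (the real service also applies cooldowns and scales in more conservatively); estimate_desired_count is a hypothetical name, and its defaults mirror this template (40% target, 1-5 tasks).

```python
import math


def estimate_desired_count(current_tasks: int, current_cpu: float, target_cpu: float = 40.0,
                           min_tasks: int = 1, max_tasks: int = 5) -> int:
    """Back-of-the-envelope target tracking: scale the task count by metric/target,
    round up, then clamp to the scalable target's MinCapacity/MaxCapacity."""
    if current_tasks < 1:
        raise ValueError("current_tasks must be at least 1")
    needed = math.ceil(current_tasks * current_cpu / target_cpu)
    return max(min_tasks, min(max_tasks, needed))


print(estimate_desired_count(1, 90.0))  # 3  (1 x 90/40 = 2.25, rounded up)
print(estimate_desired_count(3, 90.0))  # 5  (would be 7, capped by MaxCapacity)
print(estimate_desired_count(4, 10.0))  # 1  (4 x 10/40 = 1.0)
```

With one task at 90% CPU the estimate is 3 tasks; at 3 tasks and 90% it would ask for 7, which MaxCapacity caps at 5, and that is exactly the situation the two email alarms are there to flag.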

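Scaling Timeline (approximate; real timing depends on metric delay, image pull time, and health-check settings):

  1. 0-2 minutes: High CPU detected while CloudWatch collects 1-minute datapoints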
  2. 2-4 minutes: New container launching
  3. 4-6 minutes: Container healthy, receiving traffic
  4. 6+ minutes: Load distributed, CPU normalizes

Key Learnings

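  1. Target tracking is easier to get right than hand-tuned threshold (step) scaling: you pick a target value, and Application Auto Scaling creates and adjusts the CloudWatch alarms for you
  2. Cooldown periods prevent rapid back-and-forth scaling that could cause instability
  3. Email alerts provide peace of mind without constant monitoring
  4. Load testing endpoints are essential for validating your setup
  5. ECS Fargate removes the need to provision, patch, and scale the EC2 instances that run your containers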

What's Next in This Series?

In this comprehensive series, we've learned how to deploy a complete containerized application from VPC to ECS service using Fargate. We covered:

  • VPC Setup with multi-AZ networking
  • Security Groups and IAM roles
  • ECR Repository for container images
  • CodeBuild Pipeline for CI/CD
  • Application Load Balancer for traffic distribution
  • ECS Service with intelligent auto-scaling

The auto-scaling system feels really robust now. I can throw traffic at it, watch it scale intelligently, and get notified if anything needs attention. Perfect foundation for a production workload!

Complete Day 5 Learning Summary

What we accomplished in Day 5:

Infrastructure Built

  • ECS Service with Fargate launch type (256 CPU units = 0.25 vCPU, 512 MiB memory)
  • Task Definition with proper IAM roles and logging
  • Auto-Scaling Target (1-5 containers) with target tracking policy
  • CloudWatch Alarms for critical CPU and max capacity alerts
  • SNS Email Notifications for real-time monitoring

Application Features

  • 8 FastAPI Endpoints (the / welcome route plus 7 /api routes) for comprehensive testing
  • Load Testing Capabilities (60s sustained + 10s burst)
  • Real-time Monitoring with ECS service status
  • Error Handling and health checks
  • Application Logging to CloudWatch Logs via the awslogs driver

Auto-Scaling Intelligence

  • Target Tracking: Maintains 40% CPU utilization
  • Smart Cooldowns: 2min scale-out, 5min scale-in
  • Proportional Scaling: Responds to load intensity
  • Email Alerts: Critical thresholds and capacity warnings

Key Takeaway: We now have a fully automated, scalable containerized application that can handle real-world traffic patterns while maintaining cost efficiency and operational visibility.

💻 About Me

Hi! I'm Utkarsh, a Cloud Specialist & AWS Community Builder who loves turning complex AWS topics into fun chai-time stories.



APA

Utkarsh Rastogi | Sciencx (2025-08-23T13:13:56+00:00) AWS VPC to ECS – Day 5: ECS Service with Smart Auto-Scaling. Retrieved from https://www.scien.cx/2025/08/23/aws-vpc-to-ecs-day-5-ecs-service-with-smart-auto-scaling/
