Elastic Container Service (ECS)

The ECS Cluster

Here, I explain how to use ECS through the example of deploying a Socket.IO server with a Redis backend. The server is a Node.js application that uses the Socket.IO library to provide real-time communication between the client and the server. The Redis backend is used as a distributed cache.

ECS Cluster Basics

An ECS cluster is a logical grouping of EC2 instances, also known in this context as ECS instances or container instances. These instances are spread across several availability zones. Each EC2 instance in an ECS cluster runs an ECS container agent, which communicates with ECS to report instance information and manages the containers running on its instance.


ECS can run a Task (which runs until it stops or exits) or a Service (a long-lived task that runs all the time). If a task in a service stops, ECS launches a replacement automatically. A service can be configured with a load balancer and a target group, which makes it a perfect match for running web/REST applications.

We will be running services with an Application Load Balancer (ALB), so ECS will also take care of registering the service, the container instance and the assigned ephemeral port with the ALB (which acts as server-side service discovery).

Keep in mind that a container cannot span several EC2 instances: for instance, if you choose t2.micro but your container needs more memory than one instance offers, the container will not be spread over several t2.micro instances. Always assume that something might go wrong while the cluster is running, so you need fallback instance(s): here, 2 instances will be provisioned.
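To make the "a container cannot span several instances" point concrete, here is a minimal sketch of first-fit placement. It is not how the ECS scheduler is implemented; the instance names and memory figures are illustrative assumptions.

```python
from typing import Dict, List, Optional


def place_tasks(
    task_memory_mib: List[int], free_memory_mib: Dict[str, int]
) -> Dict[int, Optional[str]]:
    """First-fit placement: each task must fit entirely on ONE instance."""
    remaining = dict(free_memory_mib)  # copy so the caller's dict is untouched
    placement: Dict[int, Optional[str]] = {}
    for index, needed in enumerate(task_memory_mib):
        placement[index] = None  # None means "no single instance can host it"
        for instance_id, free in remaining.items():
            if free >= needed:
                remaining[instance_id] = free - needed
                placement[index] = instance_id
                break
    return placement


# Two hypothetical instances with 900 MiB free each (1800 MiB in total).
instances = {"instance-a": 900, "instance-b": 900}
result = place_tasks([512, 512, 1200], instances)
print(result)
```

The third task needs 1200 MiB, which is less than the 1800 MiB the cluster has in total, yet it stays unplaced because no single instance can hold it.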

So we create the task definition, and then the running service based on this task definition.

Auto Scaling is used to scale our capacity providers up and down automatically. The Application Load Balancer, on the other hand, is used to distribute the incoming traffic across multiple targets.

In summary:

  • An ECS cluster is just a logical grouping of EC2 instances (also known here as ECS instances or container instances)
  • These instances are spread across several availability zones
  • Each EC2 instance runs one ECS container agent
  • This agent communicates with ECS to provide instance information
  • This agent also manages the containers running on its instance

Scaling with the ECS Cluster

How are we going to scale the application? Containers enable consistency, and we can run the same images in dev, test and prod. AWS will help you with your high availability challenges and expansion around the world thanks to the regions available.

Also, when you start a business or a new product, it’s very hard to estimate the load. With AWS you don’t have to worry about that: just use on-demand instances when the load increases, then adapt your infrastructure to your real needs and optimize your costs with reserved instances later down the line.

We use ECS on EC2 instances here and avoid Fargate, as it is too expensive for long-running socket servers. We also know we will have a minimum consistent throughput, so a serverless architecture makes less sense here.

To facilitate ECS Cluster setup, we select the ECS-Optimized AMI by default. This AMI is specifically designed and optimized for ECS, providing integration with other AWS services. Additionally, we can include an SSH key pair to help us in dev for testing and debugging of the underlying EC2 instances.

Once the cluster is created, we can observe the running instances in ECS, though no tasks are running at this stage.

ECS Infrastructure

Amazon ECS can manage the scaling of Amazon EC2 instances that are registered to your cluster. This is referred to as Amazon ECS cluster auto scaling. This is done by using an Amazon ECS Auto Scaling group capacity provider with managed scaling turned on. When you use an Auto Scaling group capacity provider with managed scaling turned on, Amazon ECS creates two custom CloudWatch metrics and a target tracking scaling policy that attaches to your Auto Scaling group. Amazon ECS then manages the scale-in and scale-out actions of the Auto Scaling group based on the load your tasks put on your cluster.
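As a rough illustration of what managed scaling tracks: to my understanding of the AWS documentation, the main metric is CapacityProviderReservation, the ratio of instances needed to instances running, expressed as a percentage. The sketch below computes it; the function name is hypothetical and the two edge cases follow AWS's published description, so check them against the current documentation before relying on them.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Percentage of the running capacity that the tasks actually need."""
    if instances_needed == 0 and instances_running == 0:
        return 100.0  # nothing needed, nothing running: treated as on target
    if instances_running == 0:
        return 200.0  # tasks are waiting but no instance exists: scale out
    return instances_needed / instances_running * 100.0


# Above the target capacity (assumed to be 100 here) the ASG should grow,
# below it the ASG can shrink.
print(capacity_provider_reservation(3, 2))  # needs 3 instances, has 2
print(capacity_provider_reservation(1, 2))  # needs 1 instance, has 2
```

With a target capacity of 100, a value of 150 tells the target tracking policy to add instances and a value of 50 tells it that half of the instances could be removed.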

To ensure successful registration of the EC2 instances with our ECS cluster, we include the following user data in the launch configuration:

#!/bin/bash
echo ECS_CLUSTER=socket-ecs-cluster >> /etc/ecs/ecs.config
echo ECS_BACKEND_HOST= >> /etc/ecs/ecs.config
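If the launch settings are created through the EC2 API rather than the console, the user data generally has to be base64-encoded first. Below is a minimal sketch of rendering and encoding such a script; both helper names are hypothetical, and only the ECS_CLUSTER line from the snippet above is rendered.

```python
import base64


def render_user_data(cluster_name: str) -> str:
    """Build a shell script that registers the instance with the given cluster."""
    lines = [
        "#!/bin/bash",
        f"echo ECS_CLUSTER={cluster_name} >> /etc/ecs/ecs.config",
    ]
    return "\n".join(lines) + "\n"


def encode_user_data(script: str) -> str:
    """Base64-encode the script into an ASCII string."""
    return base64.b64encode(script.encode("utf-8")).decode("ascii")


script = render_user_data("socket-ecs-cluster")
encoded = encode_user_data(script)
print(script)
print(encoded)
```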

The Task Definition

Next, we create a task definition - it is just a simple metadata description of our Docker image and its resource requirements (CPU and memory). We also define the environment variables and the container port on which our server is listening.

Once our Docker image is available on our ECR registry, we can create a task definition using this image. Simply give a name to the task definition, specify the target image, tune the CPU and memory reservation, add the environment variables & the container port on which your image is listening.

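ECS will control the ephemeral ports; note that when you don’t specify the host port (hostPort) or set it to 0, the container will automatically receive a port in the allowed ephemeral port range. This allows running several containers on the same ECS instance without conflict. This dynamic port mapping applies to the bridge network mode; with the awsvpc network mode, each task gets its own network interface and the host port must be omitted or equal to the container port.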
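The host port rules are easy to get wrong, so here is a small sketch of a check that can be run against a task definition before registering it. The function name is hypothetical and the check only covers the awsvpc rule (hostPort omitted or equal to containerPort); it is not a full validator.

```python
from typing import Any, Dict, List


def port_mapping_problems(task_definition: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with hostPort values in awsvpc mode."""
    problems: List[str] = []
    # On Linux EC2 instances the network mode defaults to "bridge".
    network_mode = task_definition.get("networkMode", "bridge")
    for container in task_definition.get("containerDefinitions", []):
        for mapping in container.get("portMappings", []):
            container_port = mapping["containerPort"]
            host_port = mapping.get("hostPort")
            if network_mode == "awsvpc" and host_port not in (None, container_port):
                problems.append(
                    f"{container['name']}: hostPort {host_port} must be omitted "
                    f"or equal to containerPort {container_port} in awsvpc mode"
                )
    return problems


# Illustrative, trimmed-down task definition.
example = {
    "networkMode": "awsvpc",
    "containerDefinitions": [
        {
            "name": "socket-container-service",
            "portMappings": [{"containerPort": 3000, "hostPort": 0}],
        }
    ],
}
print(port_mapping_problems(example))
```

Setting hostPort to 3000 (or removing it) in the example makes the list of problems empty.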

A simple task definition may look like:

  {
    "family": "Socket",
    "containerDefinitions": [
      {
        "name": "socket-container-service",
        "image": "",
        "cpu": 512,
        "memoryReservation": 512,
        "portMappings": [
          {
            "containerPort": 3000,
            "hostPort": 3000,
            "protocol": "tcp"
          }
        ],
        "essential": true,
        "environment": [
          {
            "name": "REDIS_ENDPOINT",
            "value": "redis://"
          }
        ],
        "mountPoints": [],
        "volumesFrom": [],
        "dockerLabels": {},
        "logConfiguration": {
          "logDriver": "awslogs",
          "options": {
            "awslogs-group": "production-service-Socket",
            "awslogs-region": "eu-west-2",
            "awslogs-stream-prefix": "Socket"
          }
        }
      }
    ],
    "taskRoleArn": "arn:aws:iam::xxx",
    "executionRoleArn": "arn:aws:iam::xxx",
    "networkMode": "awsvpc",
    "requiresCompatibilities": ["EC2"]
  }

Configuring Application Load Balancer and Target Group

Now we have an empty cluster with provisioned ECS instances and a task definition describing a Docker image to run. Before creating the running service based on this task definition, we will set up an Application Load Balancer (ALB) and a Target Group in order to balance the load across the various service instances.

The load balancer is placed into the public subnets, so that traffic from the internet can reach the load balancer directly via the internet gateway. We also add an HTTPS listener with an SSL certificate to the load balancer so we can serve it from our own domain.

The target group is used for keeping track of all the tasks, and what IP addresses / port numbers they have. You can query it yourself, to use the addresses yourself, but most often this target group is just connected to an application load balancer, or network load balancer, so it can automatically distribute traffic across all the targets.

We create rules to forward HTTP traffic to the service's target group, and enable auto scaling for this service by creating scaling policies for it.

One more step for this application to work properly is to configure the load balancer to use sticky sessions. This is necessary because Socket.IO makes one request to set a connection ID, and a subsequent upgrade request to establish the long-lived WebSocket connection. These two requests must go to the same backend process, but by default the load balancer will send the two requests to random processes, so the connection will be unstable. We can fix this by setting the stickiness.enabled flag on the target group for the service.
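The effect of stickiness can be shown with a toy model. This is a deliberately simplified sketch: the `Balancer` class is hypothetical, it uses round-robin instead of random selection so the output is deterministic, and the cookie simply names the backend, whereas the real ALB cookie is opaque.

```python
import itertools
from typing import List, Optional, Tuple


class Balancer:
    """Toy load balancer: round-robin by default, cookie-pinned when sticky."""

    def __init__(self, backends: List[str], sticky: bool) -> None:
        self._cycle = itertools.cycle(backends)
        self._sticky = sticky

    def route(self, cookie: Optional[str]) -> Tuple[str, Optional[str]]:
        """Return (backend, cookie to send back to the client)."""
        if self._sticky and cookie is not None:
            return cookie, cookie  # the cookie names the backend chosen earlier
        backend = next(self._cycle)
        return backend, (backend if self._sticky else None)


def handshake_then_upgrade(balancer: Balancer) -> Tuple[str, str]:
    """Simulate the two requests made when a client connects."""
    first_backend, cookie = balancer.route(None)  # request 1: obtain a connection ID
    second_backend, _ = balancer.route(cookie)    # request 2: upgrade to WebSocket
    return first_backend, second_backend


backends = ["task-1", "task-2"]
print(handshake_then_upgrade(Balancer(backends, sticky=False)))
print(handshake_then_upgrade(Balancer(backends, sticky=True)))
```

Without stickiness the upgrade request lands on a task that never issued the connection ID; with stickiness both requests reach the same task.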

If we want a secure domain for our ALB endpoint, we go to Route 53 and create a CNAME record for the ALB. We then go to the ALB, create a listener for port 443 and attach the SSL certificate.

We also need to tell the health checks how to verify that an instance is still up. By default, the Socket.IO server exposes a client bundle at /. We can use this endpoint for the HTTP health checks, which perform GET requests to the configured path.
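Health checks do not flip a target's state on a single response; they wait for a number of consecutive results. The sketch below models that behaviour. The function name is hypothetical, and the thresholds (5 successes, 2 failures) and the single success code 200 are assumptions to be replaced by whatever is configured on the target group.

```python
from typing import Iterable


def track_health(
    status_codes: Iterable[int],
    healthy_threshold: int = 5,
    unhealthy_threshold: int = 2,
    initially_healthy: bool = False,
) -> bool:
    """Replay health check responses and return the final health state."""
    healthy = initially_healthy
    streak = 0  # consecutive results that contradict the current state
    for code in status_codes:
        passed = code == 200
        if passed == healthy:
            streak = 0  # the result agrees with the current state
            continue
        streak += 1
        needed = healthy_threshold if passed else unhealthy_threshold
        if streak >= needed:
            healthy = passed
            streak = 0
    return healthy


print(track_health([200] * 5))                # enough successes to become healthy
print(track_health([200] * 5 + [500]))        # one failure is tolerated
print(track_health([200] * 5 + [500, 500]))   # two failures in a row are not
```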


When creating the ALB, the AWS Console offers to register targets. Skip this step, as ECS will do it for you automatically.

Once set up, we can see all our container instances in the target group:

Target Group

We forward traffic to this target group using the ALB listener:

ALB Listener

Notes on CloudFront:

If you want to put CloudFront in front of the load balancer, you need to enable one more setting for this to work properly. By default CloudFront does not forward cookies from the client to the backend, but ALB stickiness relies on a cookie, so you need to go to the settings and change the “Forward Cookies” setting to “All”.

This will allow the traffic from clients to be properly forwarded through CloudFront, to the Application Load Balancer, and finally to multiple running copies of the server on the backend.

The ECS Service

The service is a resource which allows you to run multiple copies of a type of task, and gather up their logs and metrics, as well as monitor the number of running tasks and replace any that have crashed. Create an ECS service and attach these tasks to the target group/load balancer previously created.
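The "replace any that have crashed" behaviour boils down to a reconciliation loop. Here is a minimal sketch of one pass of such a loop; the function, the task IDs and the status strings are illustrative, and the sketch only starts replacements (it does not stop surplus tasks).

```python
from typing import Dict


def reconcile(tasks: Dict[str, str], desired_count: int) -> Dict[str, str]:
    """Drop stopped tasks and start replacements until desired_count are running."""
    running = {task_id: status for task_id, status in tasks.items() if status == "RUNNING"}
    counter = 0
    while len(running) < desired_count:
        counter += 1
        new_id = f"replacement-{counter}"
        if new_id not in tasks:  # never reuse an ID that already exists
            running[new_id] = "RUNNING"
    return running


current = {"task-a": "RUNNING", "task-b": "STOPPED", "task-c": "RUNNING"}
print(reconcile(current, desired_count=3))
```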

Since we selected an SSH key pair during the ECS cluster creation, you should be able to connect to the ECS instances and run docker ps to list the running containers.

We can also configure Task Placement, which lets you customize how tasks are placed on instances within your cluster. Different placement strategies are available to optimize for availability and efficiency. We use the AZ balanced spread template, which spreads tasks across availability zones and, within each availability zone, across instances.
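To show what "spread across availability zones, then across instances" means in practice, here is a small sketch that picks the instance for each new task. It illustrates the idea only and is not the ECS scheduler; the instance names and the zone assignment are assumptions.

```python
from collections import Counter
from typing import Dict, List


def pick_instance(instance_az: Dict[str, str], task_instances: List[str]) -> str:
    """Pick the least-loaded AZ first, then the least-loaded instance in it."""
    tasks_per_instance = Counter(task_instances)
    tasks_per_az = Counter(instance_az[instance] for instance in task_instances)
    return min(
        sorted(instance_az),  # sorted so ties are resolved deterministically
        key=lambda instance: (
            tasks_per_az[instance_az[instance]],
            tasks_per_instance[instance],
        ),
    )


# Hypothetical cluster: two instances in one zone, one in another.
instance_az = {
    "instance-1": "eu-west-2a",
    "instance-2": "eu-west-2a",
    "instance-3": "eu-west-2b",
}
placed: List[str] = []
for _ in range(4):
    placed.append(pick_instance(instance_az, placed))
print(placed)
```

After four placements each zone holds two tasks, even though one zone has twice as many instances as the other.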

Notes on Scaling

ALB is very interesting for those running micro-services as a single ALB instance is able to manage several target groups/micro-services. Also, when a service instance is added/removed, the target group and the ALB detect the change almost instantly.

We can also add different paths to our ALB to route traffic to different services. We can spin up a separate service and set it up as a different target group managed by ECS. Both services run on the same port (80) and take advantage of path-based routing to send the traffic to the right micro-service.
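Path-based routing can be pictured as an ordered list of rules with a default. The sketch below evaluates rules by priority using shell-style wildcards from Python's fnmatch module, which is only an approximation of the ALB's own pattern matching; the paths and target group names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[int, str, str]  # (priority, path pattern, target group name)


def choose_target_group(path: str, rules: List[Rule], default: str) -> str:
    """Return the target group of the first matching rule, lowest priority first."""
    for _priority, pattern, target_group in sorted(rules):
        if fnmatchcase(path, pattern):  # case-sensitive wildcard match
            return target_group
    return default


rules = [
    (20, "/api/*", "api-service"),
    (10, "/chat/*", "socket-service"),
]
print(choose_target_group("/chat/room/42", rules, default="default-service"))
print(choose_target_group("/api/users", rules, default="default-service"))
print(choose_target_group("/health", rules, default="default-service"))
```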

ECS should scale based on 2 variables: the number of EC2 instances and the number of running services/tasks on these instances. Overall, there are two types of scaling here:

  • Cluster scaling: the number of ECS instances providing computing power to our cluster, i.e. the compute power of our capacity provider
  • Service scaling: the number of containers running for a particular service/task definition in the cluster

For cluster scaling, we can add policies in the automatic scaling tab of our ASG. For instance, we can add a policy to scale up the cluster when the CPU utilization is above 70% for 5 minutes. As load on the service increases and CPU usage rises, the application will automatically scale up and run more instances of itself.

ASG Scaling

We should apply the same principles to the ECS services with Service auto scaling. This will automatically adjust your service's desired count up and down within a specified range in response to CloudWatch alarms. You can modify your service auto scaling configuration at any time to meet the needs of your application.

Service Scaling
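Both kinds of scaling follow the same pattern: an alarm fires when a metric stays above a threshold for long enough, and a count is then adjusted within fixed bounds. Here is a minimal sketch of the scale-out half, assuming one CPU datapoint per minute and a step of one; the function names are hypothetical and scale-in would mirror the same logic.

```python
from typing import List


def alarm_breached(cpu_datapoints: List[float], threshold: float = 70.0, periods: int = 5) -> bool:
    """True when the last `periods` datapoints are all above `threshold`."""
    recent = cpu_datapoints[-periods:]
    return len(recent) == periods and all(value > threshold for value in recent)


def next_desired_count(current: int, breached: bool, minimum: int, maximum: int, step: int = 1) -> int:
    """Add `step` on a breach, then clamp the result to [minimum, maximum]."""
    proposed = current + step if breached else current
    return max(minimum, min(maximum, proposed))


cpu = [50.0, 71.0, 72.0, 73.0, 74.0, 75.0]  # illustrative per-minute CPU averages
breached = alarm_breached(cpu)
print(breached)
print(next_desired_count(2, breached, minimum=1, maximum=4))
```

The clamp is what keeps the desired count "within a specified range": once the maximum is reached, further breaches leave the count unchanged.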

Blue/Green Deployment

In a blue/green deployment, you can launch the new version of your application alongside the old version and test the new version before you reroute traffic. You can also monitor the deployment process and rapidly roll back if there is an issue. From this tutorial.

Blue/green deployment is a classic pattern for zero-downtime deployment that reduces the risk of each release. The blue version (version n) is the one currently used by your clients, and the green version (version n+1) is the new version of your application.

Once you are satisfied with the green version, you can reroute the traffic to the instance(s) of the green version. If something goes wrong, you can quickly revert your changes and reroute the traffic back to the blue version (also known as fast rollback).

With ECS, you can run the green version of your application on the same cluster as the blue version. Each version will have its own target group. If needed (and enabled), the cluster will auto scale by adding ECS instances.

Once both of them are up and running, you can reconfigure the application load balancer to switch the traffic from the blue version to the green one and vice versa. Once switched to the new version, you can keep the previous one running for several days for safety. Then you disable the old version of the service, and after a few moments the cluster will scale in the number of ECS instances.
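The switch itself can be expressed as a weighted forward action on the ALB listener. The sketch below only builds the action as a dictionary, shaped like the forward configuration accepted by the Elastic Load Balancing API as I understand it; the helper name and the target group ARNs are placeholders, and applying the action to a listener is left out.

```python
from typing import Any, Dict


def forward_action(blue_arn: str, green_arn: str, green_percent: int) -> Dict[str, Any]:
    """Build a forward action that sends green_percent of traffic to green."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": blue_arn, "Weight": 100 - green_percent},
                {"TargetGroupArn": green_arn, "Weight": green_percent},
            ]
        },
    }


BLUE = "blue-target-group-arn"    # placeholder, not a real ARN
GREEN = "green-target-group-arn"  # placeholder, not a real ARN

canary = forward_action(BLUE, GREEN, green_percent=10)    # test green with some traffic
cutover = forward_action(BLUE, GREEN, green_percent=100)  # full switch to green
rollback = forward_action(BLUE, GREEN, green_percent=0)   # fast rollback to blue
print(cutover)
```

Keeping the blue target group in the action with a weight of 0 is what makes the rollback a single configuration change.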