Cloud & MLOps ☁️
Deploying to AWS

Deploying to AWS

Create an Amazon EKS cluster with GPU nodes specifically for KubeRay

1. Create a Kubernetes cluster on Amazon EKS

This follows the first two steps in this AWS documentation (opens in a new tab).

  1. Create an Amazon VPC with public and private subnets that meets Amazon EKS requirements
aws cloudformation create-stack \
  --region eu-west-2 \
  --stack-name ray-eks-vpc-stack \
  1. Create a cluster IAM role and attach the required Amazon EKS IAM managed policy to it. Kubernetes clusters managed by Amazon EKS make calls to other AWS services on your behalf to manage the resources that you use with the service.
  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      "Action": "sts:AssumeRole"
  • Create the cluster role:
aws iam create-role \
  --role-name RayEKSRole \
  --assume-role-policy-document file://"eks-cluster-role-trust-policy.json"

And attach the required Amazon EKS managed IAM policy to the role.

aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSClusterPolicy \
  --role-name RayEKSRole
  1. Add the cluster in the UI. Choose the Cluster Service Role we just created, as well as the VPC and subnets we created in the previous steps. The rest of the options can be left as the default. In the Review tab, you may want to take a note of the Subnets. Click Create.

2. Configure your PC to communicate with EKS via kubectl

Create a kubeconfig file for the EKS cluster. The settings in this file enable the kubectl CLI to communicate with the cluster. Create or update a kubeconfig file for your cluster by running the following command:

aws eks update-kubeconfig --region eu-west-2 --name ray-cluster

Now running kubectl get svc should return:

kubernetes   ClusterIP   <none>        443/TCP   3m45s

You can also see which credentials kubectl uses with the cat ~/.kube/config command.

3. Create Nodes

You can create a cluster with either Fargate or Managed nodes node types. To learn more about each type, see Amazon EKS nodes (opens in a new tab). For my use case, I create a managed node group, specifying the subnets and node IAM role created in previous steps.

Create a node IAM role and attach the required Amazon EKS IAM managed policy to it. The Amazon EKS node kubelet daemon makes calls to AWS APIs on your behalf. Nodes receive permissions for these API calls through an IAM instance profile and associated policies.

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Principal": {
        "Service": ""
      "Action": "sts:AssumeRole"

Creating the IAM Role:

aws iam create-role \
  --role-name RayEKSNodeRole \
  --assume-role-policy-document file://"node-role-trust-policy.json"

Attaching the required managed IAM policies to the role:

aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy \
  --role-name RayEKSNodeRole
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly \
  --role-name RayEKSNodeRole
aws iam attach-role-policy \
  --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy \
  --role-name RayEKSNodeRole

Now create the node groups in the EKS UI. Typically, avoid running GPU workloads on the Ray head. Create a CPU node group for all Pods except Ray GPU workers, such as the KubeRay operator, Ray head, and CoreDNS Pods. The common configuration that works for most KubeRay examples in the docs:

You may also want to create a GPU node group for Ray GPU workers.

  • AMI type: Bottlerocket NVIDIA (BOTTLEROCKET_x86_64_NVIDIA)
  • Instance type: g5.xlarge (opens in a new tab) (1 GPU; 24 GB GPU Memory; 4 vCPUs; 16 GB RAM)
  • Disk size: 1024 GB
  • Desired size: 1, Min size: 0, Max size: 1

For GPU worker nodes refer back to the docs (opens in a new tab) for more config settings.

4. Deploy RayService Custom Resource (CR)

Deploy the Ray service the same way we did when we deployed locally.

5. Installing the AWS Load Balancer controller

When we deployed locally, we used port forwarding, and that was okay for dev. To access EKS from an endpoint, it is better to access it by configuring a Kubernetes ingress (opens in a new tab). The first step is to follow the installation instructions (opens in a new tab) to set up the AWS Load Balancer controller (opens in a new tab). AWS Load Balancer Controller is a controller to help manage Elastic Load Balancers for a Kubernetes cluster. Note that the repository maintains a webpage for each release.

The controller runs on the worker nodes, so it needs access to the AWS ALB/NLB APIs with IAM permissions. The reference IAM policies contain the following permissive configuration:

    "Effect": "Allow",
    "Action": [
    "Resource": "*"

Create an IAM OIDC provider.

eksctl utils associate-iam-oidc-provider \
    --region eu-west-2 \
    --cluster ray-cluster \

Next, download an IAM policy for the LBC. For most regions, use

curl -o iam-policy.json

Then create an IAM policy named AWSLoadBalancerControllerIAMPolicy:

aws iam create-policy \
    --policy-name AWSLoadBalancerControllerIAMPolicy \
    --policy-document file://iam-policy.json

Create an IAM role and Kubernetes ServiceAccount for the LBC as detailed in the docs (opens in a new tab). Use the ARN from the previous step:

eksctl create iamserviceaccount \
--cluster=ray-cluster \
--namespace=kube-system \
--name=aws-load-balancer-controller \
--attach-policy-arn=arn:aws:iam::123456789012:policy/AWSLoadBalancerControllerIAMPolicy \
--override-existing-serviceaccounts \
--region eu-west-2 \

Next use the Helm chart to install the controller:

helm repo add eks
helm install aws-load-balancer-controller eks/aws-load-balancer-controller -n kube-system --set clusterName=ray-cluster --set serviceAccount.create=false --set

6. AWS Application Load Balancer (ALB) Ingress

Why Ingress?

Ingress is a built-in Kubernetes resource type that works in combination with Services to provide access to the Pods behind these Services. It defines a set of rules to route incoming HTTP/HTTPS traffic (it doesn’t support TCP/UDP) to backend Services and a default backend if none of the rules match. Each rule can define the desired host, path, and backend to receive the traffic if there is a match.

The way in which a Service resource is exposed is controlled via its spec.type setting, with the following being relevant to our discussion:

  • ClusterIP (as in the example above), which assigns it a cluster-private virtual IP and is the default
  • NodePort, which exposes the above ClusterIP via a static port on each cluster node
  • LoadBalancer, which automatically creates a ClusterIP, sets the NodePort, and indicates that the cluster’s infrastructure environment (e.g., cloud provider) is expected to create a load balancing component to expose the Pods behind the Service

To allow external access, LoadBalancer type is usually the preferred solution, as it combines the other options with load balancing capabilities and, possibly, additional features. In AWS these features, depending on the load balancer type, include Distributed Denial of Service (DDoS) (opens in a new tab) protection with the AWS WAF service (opens in a new tab), certificate management with AWS Certificate Manager (opens in a new tab) and many more.

Ingress in AWS

AWS Application Load Balancer (ALB) Ingress support on AWS EKS is provided and is detailed in the docs (opens in a new tab). Take the demo ray-service-alb-ingress.yaml for example:

kind: Ingress
  name: ray-cluster-ingress
  annotations: internet-facing Environment=dev,Team=test subnet-0c3c84c06f8271d9f, subnet-0e82ba70b474d8b1a ip
    # Health Check Settings. Health check is needed for
    # ALB to route traffic to the healthy pod. HTTP traffic-port /-/routes
  ingressClassName: alb
    - http:
          - path: /
            pathType: Prefix
                name: rayservice-dummy-serve-svc # Serve service
                  number: 8000 # default HTTP port number for serving requests

Make sure that:

  1. Annotation
    • Include at least two subnets.
    • One Availability Zone (e.g. us-west-2a) can only have at most 1 subnet.
    • In this example, you need to select public subnets (subnets that "Auto-assign public IPv4 address" is Yes on AWS dashboard)
  2. Set the name of head pod service to what you want to direct to.
    • rayservice-sample-serve-svc is HA in general. It does traffic routing among all the workers which have Serve deployments and always tries to point to the healthy cluster, even during upgrading or failing cases.

Apply it and check the status:

kubectl apply -f ray-service-alb-ingress.yaml
kubectl describe ingress ray-cluster-ingress

You should now be able to check ALB on AWS (EC2 -> Load Balancing -> Load Balancers). The name of the ALB should be along the lines of k8s-default-<name>. Check the ALB DNS Name to interact with the newly deployed Ray API!

7. Autoscaling

3 levels of autoscaling in KubeRay:

  • Ray actor/task: Some Ray libraries, like Ray Serve, can automatically adjust the number of Serve replicas (i.e., Ray actors) based on the incoming request volume.
  • Ray node (i.e., Ray Pods): Ray Autoscaler automatically adjusts the number of Ray nodes (i.e., Ray Pods) based on the resource demand of Ray actors/tasks.
  • Kubernetes node: If the Kubernetes cluster lacks sufficient resources for the new Ray Pods that the Ray Autoscaler creates, the Kubernetes Autoscaler can provision a new Kubernetes node. You must configure the Kubernetes Autoscaler yourself.

You can configure autoscaling for your Serve application by setting the autoscaling field in the Serve config. Learn more about the configuration options in the Serve Autoscaling Guide (opens in a new tab).

To enable autoscaling in a KubeRay Cluster, you need to set enableInTreeAutoscaling to True. Additionally, there are other options available to configure the autoscaling behavior. For further details, please refer to the documentation (opens in a new tab).

For EKS, you can enable Kubernetes cluster autoscaling by utilizing the Cluster Autoscaler. For detailed information, see Cluster Autoscaler on AWS (opens in a new tab). To understand the relationship between Kubernetes autoscaling and Ray autoscaling, see Ray Autoscaler with Kubernetes Cluster Autoscaler (opens in a new tab).

8. Zero Downtime Updates

This section is a stub

To update the Ray service, you can simply update the image tag in the RayService CR. The Ray operator will automatically update the Ray service. For more information, see Zero Downtime Updates (opens in a new tab).

9. Fault Tolerance

This section is a stub

GCS fault tolerance in KubeRay (opens in a new tab)

See Add End-to-End Fault Tolerance (opens in a new tab) to learn more about Serve’s failure conditions and how to guard against them.

10. Log Persistence

Learn about the Ray Serve logs (opens in a new tab) and how to persistent logs (opens in a new tab) on Kubernetes from the Ray Docs. In general, Logs (both system and application logs) are useful for troubleshooting Ray applications and Clusters. For example, you may want to access system logs if a node terminates unexpectedly.

Similar to Kubernetes, Ray does not provide a native storage solution for log data. Users need to manage the lifecycle of the logs by themselves. This page provides instructions on how to collect logs from Ray Clusters that are running on Kubernetes.

Ray log directory

By default, Ray writes logs to files in the directory /tmp/ray/session_*/logs on each Ray pod's file system, including application and system logs.

Log processing tools

There are a number of open source log processing tools available within the Kubernetes ecosystem. This page shows how to extract Ray logs using Fluent Bit (opens in a new tab). Other popular tools include Vector (opens in a new tab), Fluentd (opens in a new tab), Filebeat (opens in a new tab), and Promtail (opens in a new tab).

Log collection strategies

Collect logs written to a pod's filesystem using one of two logging strategies: sidecar containers or daemonsets. Read more about these logging patterns in the Kubernetes documentation (opens in a new tab).

Setting up logging sidecars with Fluent Bit

This is an example of the sidecar strategy. You can process logs by configuring a log-processing sidecar for each Ray pod. Ray containers should be configured to share the /tmp/ray directory with the logging sidecar via a volume mount.

Note that if you wanted to go with the Daemonset strategy, it is possible to collect logs at the Kubernetes node level. To do this, one deploys a log-processing daemonset onto the Kubernetes cluster's nodes. With this strategy, it is key to mount the Ray container's /tmp/ray directory to the relevant hostPath.

You can configure the sidecar to do either of the following:

  • Stream Ray logs to the sidecar's stdout.
  • Export logs to an external service.

See full config for a single-pod RayCluster with a logging sidecar here (opens in a new tab) and a full example here (opens in a new tab)

  1. Create a ConfigMap with configuration for Fluent Bit. Below is a minimal example, which tells a Fluent Bit sidecar to (1) Tail Ray logs and (2) Output the logs to the container’s stdout.
apiVersion: v1
kind: ConfigMap
  name: fluentbit-config
  fluent-bit.conf: |
        Name tail
        Path /tmp/ray/session_latest/logs/*
        Tag ray
        Path_Key true
        Refresh_Interval 5
        Name stdout
        Match *

A few notes on the above config:

  • In addition to streaming logs to stdout, you can use an [OUTPUT] clause to export logs to any storage backend (opens in a new tab) supported by Fluent Bit.
  • The Path_Key true line above ensures that file names are included in the log records emitted by Fluent Bit.
  • The Refresh_Interval 5 line asks Fluent Bit to refresh the list of files in the log directory once per 5 seconds, rather than the default 60. The reason is that the directory /tmp/ray/session_latest/logs/ does not exist initially (Ray must create it first). Setting the Refresh_Interval low allows us to see logs in the Fluent Bit container's stdout sooner.
  1. For each pod template in our RayCluster CR, we need to add two volumes: One volume for Ray's logs and another volume to store Fluent Bit configuration from the ConfigMap.
  - name: ray-logs
    emptyDir: {}
  - name: fluentbit-config
      name: fluentbit-config
  1. Add the following volume mount to the Ray container’s configuration:
  - mountPath: /tmp/ray
    name: ray-logs
  1. Finally, add the Fluent Bit sidecar container to each Ray pod config in your RayCluster CR:
- name: fluentbit
  image: fluent/fluent-bit:1.9.6
  # These resource requests for Fluent Bit should be sufficient in production.
      cpu: 100m
      memory: 128Mi
      cpu: 100m
      memory: 128Mi
    - mountPath: /tmp/ray
      name: ray-logs
    - mountPath: /fluent-bit/etc/fluent-bit.conf
      subPath: fluent-bit.conf
      name: fluentbit-config

Mounting the ray-logs volume gives the sidecar container access to Ray's logs. The fluentbit-config volume gives the sidecar access to logging configuration.


To examine the FluentBit sidecar’s STDOUT to see logs for Ray’s component processes, determine the Ray pod’s name with:

kubectl get pod | grep raycluster-complete-logs
# Substitute the name of your Ray pod.
kubectl logs raycluster-complete-logs-head-xxxxx -c fluentbit

Send logs and metrics to Amazon CloudWatch

To output the logs to Cloudwatch (opens in a new tab), the following AWS IAM permissions are required to use this plugin:

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Action": [
      "Resource": "*"

In your main configuration file append the following Output section:

    Name cloudwatch_logs
    Match   *
    region us-east-1
    log_group_name fluent-bit-cloudwatch
    log_stream_prefix from-fluent-bit-
    auto_create_group On

10. Observability

You can use the Ray State CLI on the head Pod to check the status of Ray Serve applications.

export HEAD_POD=$(kubectl get pods -o --no-headers)
kubectl exec -it $HEAD_POD -- ray summary actors

11. Monitoring

Monitor your Serve application using the Ray Dashboard. To see time-series metrics, we need to set up Prometheus & Grafana. To start, clone the KubeRay repository (opens in a new tab) and checkout the master branch.

  1. KubeRay provides an script (opens in a new tab) to install the kube-prometheus-stack v48.2.1 (opens in a new tab) chart and related custom resources, including ServiceMonitor, PodMonitor and PrometheusRule, in the namespace prometheus-system automatically. Install Kubernetes Prometheus Stack via Helm chart by running:
# Path: kuberay/
# Check the installation
kubectl get all -n prometheus-system
# (part of the output)
# NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/prometheus-grafana                    1/1     1            1           46s
# deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           46s
# deployment.apps/prometheus-kube-state-metrics         1/1     1            1           46s

The ray team have made some modifications to the original values.yaml in kube-prometheus-stack chart to allow embedding Grafana panels in Ray Dashboard. See overrides.yaml (opens in a new tab) for more details.

  1. We need to edit the custom-resource.yaml to allow the Prometheus server to scrape metrics from the Ray Serve application. Note the env variables added in ray-operator/config/samples/ (opens in a new tab). Three required environment variables are defined in ray-cluster.embed-grafana.yaml. See Configuring and Managing Ray Dashboard (opens in a new tab) for more details about these environment variables.
    value: http://prometheus-grafana.prometheus-system.svc:80
    value: http://prometheus-kube-prometheus-prometheus.prometheus-system.svc:9090

Note that we do not deploy Grafana in the head Pod, so we need to set both RAY_GRAFANA_IFRAME_HOST and RAY_GRAFANA_HOST.

  • RAY_GRAFANA_HOST is used by the head Pod to send health-check requests to Grafana in the backend.
  • RAY_GRAFANA_IFRAME_HOST is used by your browser to fetch the Grafana panels from the Grafana server rather than from the head Pod.
  • Because we forward the port of Grafana to in this example, we set RAY_GRAFANA_IFRAME_HOST to
    • http:// is required.
  1. Wait until all Ray Pods are running and forward the port of the Prometheus metrics endpoint in a new terminal.
kubectl port-forward --address ${RAYCLUSTER_HEAD_POD} 8080:8080
curl localhost:8080

KubeRay exposes a Prometheus metrics endpoint in port 8080 via a built-in exporter by default. Hence, we do not need to install any external exporter.

  1. A lot of the work for collecting metrics is handled for us by the script:

    • Collect Head Node metrics with a ServiceMonitor
    • Collect Worker Node metrics with PodMonitors
      • KubeRay operator does not create a Kubernetes service for the Ray worker Pods, therefore we cannot use a Prometheus ServiceMonitor to scrape the metrics from the worker Pods. To collect worker metrics, we can use Prometheus PodMonitors CRD instead.
    • Collect custom metrics with Recording Rules
      • Recording Rules allow us to precompute frequently needed or computationally expensive PromQL expressions and save their result as custom metrics.
    • Define Alert Conditions with Alerting Rules
  2. Access Prometheus Web UI

    • Forward the port of Prometheus Web UI in the Prometheus server Pod by running:
kubectl port-forward --address prometheus-prometheus-kube-prometheus-prometheus-0 -n prometheus-system 9090:9090
  1. Access Grafana
    • Forward the port of Grafana:
kubectl port-forward --address deployment/prometheus-grafana -n prometheus-system 3000:3000
  • In production, kubectl port-forward is not recommended. We can refer to this Grafana document (opens in a new tab) for exposing Grafana behind a reverse proxy.
  • The default password is defined by grafana.adminPassword in the values.yaml (opens in a new tab) of the kube-prometheus-stack chart i.e. Username admin and Password prom-operator.
  • After logging in to Grafana successfully, we can import Ray Dashboard into Grafana via dashboard_default.json.
    • Import the default_grafana_dashboard.json file from /tmp/ray/session_latest/metrics/grafana/dashboards/ in the head Pod. You can use kubectl cp to copy the file from the head Pod to your local machine
kubectl cp rayservice-dq-raycluster-gqxb2-head-cgpkf:/tmp/ray/session_latest/metrics/grafana/dashboards/ ./aws/grafana/
  1. Run the Dashboard - The Grafana panels should be embedded in Ray Dashboard

12. Storage and Data Management

This section is a stub

13. Security & Access Control

This section is a stub

14. ECR Integration

When using Amazon ECR Images with Amazon EKS the Amazon EKS worker node IAM role must contain the following IAM policy permissions for Amazon ECR:

  "Version": "2012-10-17",
  "Statement": [
      "Effect": "Allow",
      "Action": [
      "Resource": "*"

When referencing an image from Amazon ECR, you must use the full registry/repository:tag naming for the image. For example,

15. Enabling IAM principal access to your cluster

Access to your cluster using IAM principals is enabled by the AWS IAM Authenticator for Kubernetes, which runs on the Amazon EKS control plane. The authenticator gets its configuration information from the aws-auth ConfigMap. Follow the docs (opens in a new tab) to add user roles.

# View the current configmap
cat ~/.kube/config
kubectl describe -n kube-system configmap/aws-auth
# View Roles
kubectl get roles -A
kubectl get clusterroles
# View your existing Kubernetes rolebindings or clusterrolebindings
kubectl get rolebindings -A
kubectl get clusterrolebindings
# Describe a role & RoleBinding
kubectl describe clusterrole cluster-admin
kubectl describe clusterrolebinding  cluster-admin
# View ConfigMap with eksctl
eksctl get iamidentitymapping --cluster ray-dev --region=eu-west-2

Now, edit the aws-auth ConfigMap. Add a mapping for a role. Replace my-role with your role name. Replace:

  1. eks-console-dashboard-full-access-group with the name of the group specified in your Kubernetes RoleBinding or ClusterRoleBinding object.
  2. Replace 111122223333 with your account ID.
  3. You can replace admin with any name you choose.
./eksctl create iamidentitymapping --cluster ray-dev --region=eu-west-2 \
    --arn arn:aws:iam::111122223333:user/a.user --username admin --group system:masters \