Launching Ray Clusters on AWS
To start an AWS Ray cluster, use the Ray cluster launcher together with the AWS Python SDK (Boto3).
Using the Cluster Management CLI
Install Ray cluster launcher
The Ray cluster launcher is part of the ray CLI. Use the CLI to start, stop, and attach to a running Ray cluster with commands such as ray up, ray down, and ray attach. You can use pip to install the ray CLI with cluster launcher support. Follow the Ray installation documentation for more detailed instructions.
# install ray
pip install -U ray[default]
Install and Configure AWS Python SDK (Boto3)
Next, install the AWS SDK using pip install -U boto3 and configure your AWS credentials following the AWS guide. Boto3 searches through a list of possible credential locations and stops as soon as it finds credentials, in the following order:
- Passing credentials as parameters in the boto3.client() method
- Passing credentials as parameters when creating a Session object
- Environment variables (my preferred method)
- Shared credential file (~/.aws/credentials)
- AWS config file (~/.aws/config)
- Assume Role provider
- Boto2 config file (/etc/boto.cfg and ~/.boto)
- Instance metadata service on an Amazon EC2 instance that has an IAM role configured
# install AWS Python SDK (boto3)
pip install -U boto3
And then add the credentials to the .env file:
# AWS Credentials
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_SESSION_TOKEN=...
To create, modify, or delete your own access keys, go to the navigation bar in the AWS console, choose your user name on the upper right, and then choose Security credentials. There you can create an access key. Then retrieve temporary credentials using AWS STS:
import boto3
from decouple import config


def get_aws_sts():
    """Get temporary AWS credentials from STS."""
    access_key = config('AWS_ACCESS_KEY_ID')
    secret_access_key = config('AWS_SECRET_ACCESS_KEY')
    client = boto3.client(
        'sts',
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_access_key
    )
    response = client.get_session_token()
    expiry = response['Credentials']['Expiration']
    print(f"Credentials expire at {expiry}")
    return response['Credentials']


if __name__ == '__main__':
    print(get_aws_sts())
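The returned credentials include a session token, which must be passed along with the key pair when creating a client. As a minimal sketch, the temporary credentials from get_aws_sts above can be used like this:

import boto3

creds = get_aws_sts()
s3 = boto3.client(
    's3',
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken']  # required for temporary STS credentials
)
# List buckets to confirm the temporary credentials work.
print([b['Name'] for b in s3.list_buckets()['Buckets']])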
Start Ray with the Ray cluster launcher
Once Boto3 is configured to manage resources in your AWS account, you should be ready to launch your cluster using the cluster launcher. The example cluster config file provided by Ray creates a small cluster with an on-demand m5.large head node, configured to autoscale up to two m5.large spot-instance workers. Test that it works by running the following commands from your local machine:
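If you prefer to write aws/cluster.yaml from scratch rather than starting from the provided example, a minimal config might look like the sketch below (the region and instance types are assumptions you should adjust):

# Minimal Ray cluster launcher config for AWS (a sketch; adjust to your account).
cluster_name: minimal

# The autoscaler scales up to this many workers in total.
max_workers: 2

provider:
  type: aws
  region: us-west-2   # assumption: pick your own region

available_node_types:
  ray.head.default:
    node_config:
      InstanceType: m5.large          # on-demand head node
  ray.worker.default:
    min_workers: 0
    max_workers: 2
    node_config:
      InstanceType: m5.large
      InstanceMarketOptions:
        MarketType: spot              # workers run as spot instances

head_node_type: ray.head.default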
# Create or update the cluster. When the command finishes, it will print
# out the command that can be used to SSH into the cluster head node.
ray up aws/cluster.yaml --no-config-cache
# Get a remote shell on the head node.
ray attach aws/cluster.yaml
# Try running a Ray program.
python -c 'import ray; ray.init()'
exit
# Tear down the cluster.
ray down aws/cluster.yaml
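Beyond ray.init(), a slightly richer smoke test is to run a small remote task from the head node; a minimal sketch:

import ray

ray.init()  # on the head node, this connects to the running cluster

@ray.remote
def square(x):
    return x * x

# Fan out four tasks across the cluster and gather the results.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]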
Security
By default, Ray nodes in a Ray AWS cluster have full EC2 and S3 permissions (i.e. arn:aws:iam::aws:policy/AmazonEC2FullAccess and arn:aws:iam::aws:policy/AmazonS3FullAccess). This is a good default for trying out Ray clusters, but you may want to change the permissions Ray nodes have (e.g. to reduce them for security reasons). You can do so by providing a custom IamInstanceProfile to the related node_config:
available_node_types:
ray.worker.default:
node_config:
...
IamInstanceProfile:
Arn: arn:aws:iam::YOUR_AWS_ACCOUNT:YOUR_INSTANCE_PROFILE
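As a sketch of how such a reduced-permission profile could be created with Boto3 (the role, profile names, and attached policy here are hypothetical; choose ones matching your actual requirements):

import json
import boto3

iam = boto3.client('iam')

# Trust policy letting EC2 instances assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Hypothetical names; choose your own.
iam.create_role(RoleName='ray-limited-role',
                AssumeRolePolicyDocument=json.dumps(trust_policy))
# Attach a narrower policy than the full-access defaults.
iam.attach_role_policy(RoleName='ray-limited-role',
                       PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess')
iam.create_instance_profile(InstanceProfileName='ray-limited-profile')
iam.add_role_to_instance_profile(InstanceProfileName='ray-limited-profile',
                                 RoleName='ray-limited-role')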
Ray Serve: Kubernetes using the KubeRay RayService
For Ray Serve, the docs recommend deploying in production on Kubernetes, with the recommended practice being the RayService controller provided as part of KubeRay. This setup provides the best of both worlds: the user experience and scalable compute of Ray Serve, and the operational benefits of Kubernetes. It also lets you integrate with existing applications that may be running on Kubernetes. The RayService custom resource automatically handles important production requirements such as health checking, status reporting, failure recovery, and upgrades. If you're not running on Kubernetes, you can also run Ray Serve on a Ray cluster directly using the Serve CLI; to do this, you generate a Serve config file and deploy it with the Serve CLI.
A RayService Custom Resource (CR) encapsulates a multi-node Ray cluster and a Serve application that runs on top of it into a single Kubernetes manifest. Deploying, upgrading, and getting the status of the application can be done using standard kubectl commands.
The Serve Config File
For apps we are building, in development we would likely use the serve run command to iteratively run, develop, and repeat (see the Development Workflow for more information). When we're ready to go to production, we generate a structured config file that acts as the single source of truth for the application.
You can use the Serve config with the serve deploy CLI command to deploy on a VM, or embed it in a RayService custom resource on Kubernetes, to deploy and update your application in production. The config is a YAML file with the following format:
proxy_location: ...
http_options:
host: ...
port: ...
request_timeout_s: ...
keep_alive_timeout_s: ...
grpc_options:
port: ...
grpc_servicer_functions: ...
applications:
- name: ...
route_prefix: ...
import_path: ...
runtime_env: ...
deployments:
- name: ...
num_replicas: ...
...
- name:
...
The file contains proxy_location, http_options, grpc_options, and applications. See details about each field in the docs.
We can also auto-generate this config file from the code. The serve build command takes an import path to your deployment graph and creates a config file containing all the deployments and their settings from the graph. You can tweak these settings to manage your deployments in production.
Note that the runtime_env field will always be empty when using serve build and must be set manually. In my case, modin and QuantLib are not installed globally, so these two pip packages must be included in the runtime_env.
This config file can be generated using serve build:
serve build app.main:app -o serve_config.yaml
For me, the generated config file looks like this:
# This file was generated using the `serve build` command on Ray v2.7.0.
proxy_location: EveryNode
http_options:
host: 0.0.0.0
port: 8000
grpc_options:
port: 9000
grpc_servicer_functions: []
applications:
- name: app1
route_prefix: /
import_path: app.main:app
runtime_env: {}
deployments:
- name: PGMaster
- name: API
The generated version of this file contains an import_path, runtime_env, and configuration options for each deployment in the application. Since the application depends on packages that are not installed globally, modify the runtime_env field of the generated config to include them. Save this config locally in serve_config.yaml:
# This file was generated using the `serve build` command on Ray v2.7.0.
proxy_location: EveryNode
http_options:
host: 0.0.0.0
port: 8000
grpc_options:
port: 9000
grpc_servicer_functions: []
applications:
- name: app1
route_prefix: /
import_path: app.main:app
runtime_env:
pip:
- QuantLib
- asyncpg
- numpy
- modin
deployments:
- name: PGMaster
- name: QuadraAPI
You can use serve deploy to deploy the application to a local Ray cluster and serve status to get the status at runtime:
# Start a local Ray cluster.
ray start --head
# Start the application.
serve deploy serve_config.yaml
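While the application is running, serve status reports the state of each deployment:

# Check the status of the deployments at runtime.
serve status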
And to stop the Ray cluster:
# Stop the application.
serve shutdown
# Stop the local Ray cluster.
ray stop
To update the application, modify the config file and run serve deploy again.
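To verify what is currently deployed, the Serve CLI also provides serve config, which fetches the config of the running application:

# Fetch the config of the currently deployed application.
serve config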
Deploying on Kubernetes using KubeRay
KubeRay is a powerful, open-source Kubernetes operator that simplifies the deployment and management of Ray applications on Kubernetes. Read more in the docs. It offers three custom resource definitions (CRDs):
- RayCluster: KubeRay fully manages the lifecycle of RayCluster, including cluster creation/deletion, autoscaling, and ensuring fault tolerance.
- RayJob: With RayJob, KubeRay automatically creates a RayCluster and submits a job when the cluster is ready. You can also configure RayJob to automatically delete the RayCluster once the job finishes.
- RayService: RayService is made up of two parts: a RayCluster and Ray Serve deployment graphs. RayService offers zero-downtime upgrades for RayCluster and high availability.
We will deploy a Ray Serve application using a RayService.
1. Create a Kubernetes cluster with Kind
First, create a Kubernetes cluster with Kind for local development:
kind create cluster --image=kindest/node:v1.23.0
2. Install the KubeRay operator
Install the KubeRay operator via the Helm repository.
$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
$ helm repo update
# Install both CRDs and KubeRay operator v1.0.0-rc.0.
$ helm install kuberay-operator kuberay/kuberay-operator --version 1.0.0-rc.0
# Confirm that the operator is running in the namespace `default`.
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
kuberay-operator-68cc555c9-qc7cf 1/1 Running 0 22s
3. Set up a RayService Custom Resource (CR)
A RayService manages two components:
- RayCluster: Manages resources in a Kubernetes cluster.
- Ray Serve Applications: Manages users’ applications.
RayService provides:
- Kubernetes-native support for Ray clusters and Ray Serve applications: after using a Kubernetes config to define a Ray cluster and its Ray Serve applications, you can use kubectl to create the cluster and its applications.
- In-place updates for Ray Serve applications: users can update the Ray Serve config in the RayService CR config and use kubectl apply to update the applications.
- Zero-downtime upgrades for Ray clusters: users can update the Ray cluster config in the RayService CR config and use kubectl apply to update the cluster. RayService temporarily creates a pending cluster and waits for it to be ready, then switches traffic to the new cluster and terminates the old one.
- Services HA: RayService monitors the Ray cluster and Serve deployments' health statuses. If RayService detects an unhealthy status for a period of time, it tries to create a new Ray cluster and switches traffic to the new cluster when it's ready.
So, to manage the Ray Serve application, you create and update a RayService CR. To deploy a demo RayService CR from the KubeRay quickstart, run:
# Step 3.1: Download `ray_v1alpha1_rayservice.yaml`
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.0.0-rc.0/ray-operator/config/samples/ray_v1alpha1_rayservice.yaml
# Step 3.2: Create a RayService
kubectl apply -f ray_v1alpha1_rayservice.yaml
kubectl apply creates the underlying Ray cluster, consisting of a head node pod and a worker node pod (see Ray Clusters Key Concepts for more details on Ray clusters), as well as the service that can be used to query our application.
For custom apps, we will need to generate a Serve config file and embed it in a RayService CR for Kubernetes to deploy and update the application in production. To understand this YAML file better, see the docs.
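As a rough sketch of what that embedding looks like (the image, replica counts, and the Serve config are assumptions based on the earlier serve_config.yaml; field names follow the KubeRay v1alpha1 CRD):

apiVersion: ray.io/v1alpha1
kind: RayService
metadata:
  name: rayservice-sample
spec:
  # The Serve config (same shape as serve_config.yaml) embedded as a string.
  serveConfigV2: |
    applications:
      - name: app1
        route_prefix: /
        import_path: app.main:app
        runtime_env:
          pip:
            - QuantLib
            - modin
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.7.0   # assumption: match your Ray version
    workerGroupSpecs:
      - groupName: small-group
        replicas: 1
        minReplicas: 1
        maxReplicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray:2.7.0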
4. Check RayService Status
When the RayService is created, the KubeRay controller first creates a Ray cluster using the provided configuration. Then, once the cluster is running, it deploys the Serve application to the cluster using the REST API. The controller also creates a Kubernetes Service that can be used to route traffic to the Serve application.
# Step 4.1: List all RayService custom resources in the `default` namespace.
$ kubectl get rayservice
NAME AGE
rayservice-sample 3m58s
# Step 4.2: List all RayCluster custom resources in the `default` namespace.
$ kubectl get raycluster
NAME DESIRED WORKERS AVAILABLE WORKERS STATUS AGE
rayservice-sample-raycluster-9gr8f 1 1 ready 4m44s
# Step 4.3: List all Ray Pods in the `default` namespace.
$ kubectl get pods -l=ray.io/is-ray-node=yes
NAME READY STATUS RESTARTS AGE
rayservice-sample-raycluster-9gr8f-worker-small-group-tnl77 1/1 Running 0 4m59s
rayservice-sample-raycluster-9gr8f-head-5d65p 1/1 Running 0 4m59s
# Step 4.4: List services in the `default` namespace.
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kuberay-operator ClusterIP 10.96.177.172 <none> 8080/TCP 7m33s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 8m20s
rayservice-sample-head-svc ClusterIP 10.96.67.190 <none> 10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP 91s
rayservice-sample-raycluster-9gr8f-head-svc ClusterIP 10.96.125.149 <none> 10001/TCP,8265/TCP,52365/TCP,6379/TCP,8080/TCP,8000/TCP 5m29s
rayservice-sample-serve-svc ClusterIP 10.96.121.47 <none> 8000/TCP 91s
When the Ray Serve applications are healthy and ready, KubeRay creates a head service and a Ray Serve service for the RayService custom resource, e.g. rayservice-sample-head-svc and rayservice-sample-serve-svc in Step 4.4. Note that rayservice-sample-serve-svc is the one used to send queries to the Serve application; this will be used in the next section.
Users can access the head Pod through both the head service managed by RayService (that is, rayservice-sample-head-svc) and the head service managed by the RayCluster (that is, rayservice-sample-raycluster-9gr8f-head-svc). However, during a zero-downtime upgrade, a new RayCluster is created along with a new head service for it. If you don't use rayservice-sample-head-svc, you need to update the ingress configuration to point to the new head service. If you do use rayservice-sample-head-svc, KubeRay automatically updates the selector to point to the new head Pod, eliminating the need to update the ingress configuration.
5. Querying the Application
Once the RayService is running, we can query it over HTTP using the service created by the KubeRay controller. This service can be queried directly from inside the cluster, but to access it from your laptop you’ll need to configure a Kubernetes ingress or use port forwarding as below:
kubectl port-forward service/rayservice-sample-serve-svc 8000
Forward the dashboard port to localhost as well, and check the Serve page in the Ray dashboard at http://localhost:8265/#/serve:
kubectl port-forward svc/rayservice-sample-head-svc --address 0.0.0.0 8265:8265
For example, you can call the Fruit demo app with:
curl -X POST -H 'Content-Type: application/json' http://localhost:8000/fruit/ -d '["MANGO", 2]'
# Output: 6
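The same query can be made from Python; a minimal sketch, assuming the requests package is installed:

import requests

# Query the fruit demo app through the port-forwarded Serve service.
resp = requests.post("http://localhost:8000/fruit/", json=["MANGO", 2])
print(resp.text)  # 6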
The default ports and their definitions are:

| Port  | Definition          |
|-------|---------------------|
| 6379  | Ray GCS             |
| 8265  | Ray Dashboard       |
| 10001 | Ray Client          |
| 8000  | Ray Serve           |
| 52365 | Ray Dashboard Agent |
6. Getting the status of the application
While the RayService is running, the KubeRay controller continually monitors it and writes relevant status updates to the CR. You can view the status of the application using kubectl describe. This includes the status of the cluster, events such as health check failures or restarts, and the application-level statuses reported by serve status:
$ kubectl get rayservices
NAME AGE
rayservice-sample 7m59s
$ kubectl describe rayservice rayservice-sample
Name: rayservice-sample
Namespace: default
Labels: <none>
Annotations: <none>
API Version: ray.io/v1alpha1
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Running 3m45s (x13 over 4m6s) rayservice-controller The Serve applicaton is now running and healthy.
...
7. Updating the application
To update the RayService, modify the manifest and apply it using kubectl apply. There are two types of updates that can occur:
- Application-level updates: when only the Serve config options are changed, the update is applied in-place on the same Ray cluster. This enables lightweight updates such as scaling a deployment up or down or modifying autoscaling parameters.
- Cluster-level updates: when the RayCluster config options are changed, such as updating the container image for the cluster, it may result in a cluster-level update. In this case, a new cluster is started, and the application is deployed to it. Once the new cluster is ready, the Kubernetes service is updated to point to the new cluster and the previous cluster is terminated. There should not be any downtime for the application, but note that this requires the Kubernetes cluster to be large enough to schedule both Ray clusters.
In the Text ML example, change the language of the Translator in the Serve config to German:
- name: Translator
num_replicas: 1
user_config:
language: german
Now, to update the application, apply the modified manifest:
kubectl apply -f ray-service.text-ml.yaml
kubectl describe rayservice rayservice-sample
The process of updating the RayCluster config is the same as updating the Serve config. For example, we can update the number of worker nodes to 2 in the manifest:
workerGroupSpecs:
# the number of pods in the worker group.
- replicas: 2
8. Clean up
Clean up the Kubernetes cluster
# Delete the RayService.
kubectl delete -f ray_v1alpha1_rayservice.yaml
# Uninstall the KubeRay operator.
helm uninstall kuberay-operator
Deploy on VM
You can deploy your Serve application to production on a Ray cluster using the Ray Serve CLI. serve deploy takes in a config file path and deploys it to a Ray cluster over HTTP. This could be either a local, single-node cluster or a remote, multi-node cluster started with the Ray Cluster Launcher. See more in the docs.
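For a remote cluster, serve deploy accepts an address pointing at the cluster's Ray Dashboard; the head-node IP below is a placeholder:

# Deploy to a remote cluster; replace <head-node-ip> with your head node's address.
serve deploy serve_config.yaml --address http://<head-node-ip>:8265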
Resources:
- Ray Docs: Launching Ray Clusters on AWS
- Scaling AI and Machine Learning Workloads with Ray on AWS
- Deploying Ray Cluster for AI/ML workloads on a Kubernetes Cluster
- Cluster Management CLI
- Ray Serve Production Guide
- Deploy on Kubernetes
- RayService Quickstart