Ray use cases

Now that we have a sense of what Ray is in theory, it's important to discuss what makes Ray so useful in practice. On this page, we will survey the ways that individuals, organizations, and companies leverage Ray to build their AI applications.

First, let's explore how Ray scales common ML workloads. Then, we will look at some advanced implementations.

Scaling common ML workloads

Parallel training of many models

When each model you want to train fits on a single GPU, Ray can assign each training run to a separate Ray task. In this way, all available workers run independent training jobs in parallel rather than a single worker running them sequentially.

Data parallelism pattern for distributed training on large datasets
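As a rough sketch of this pattern, each run can be wrapped in a Ray task and launched in parallel. The `train_model` function and its configs below are hypothetical placeholders, not a real training routine:

```python
import ray

ray.init()

# Hypothetical per-model training function; each invocation fits one model
# that fits comfortably on a single GPU. (Drop num_gpus=1 to run on CPU.)
@ray.remote(num_gpus=1)
def train_model(config):
    # ... build the model, train it, evaluate it ...
    return {"config": config, "score": 0.0}  # placeholder metrics

configs = [{"lr": lr} for lr in (1e-3, 1e-2, 1e-1)]

# Submit every run as an independent Ray task; Ray schedules them across
# all available workers instead of running them one after another.
futures = [train_model.remote(cfg) for cfg in configs]
results = ray.get(futures)
```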

Distributed training of large models

In contrast to training many models, model parallelism partitions a large model across many machines for training. Ray Train has built-in abstractions for distributing shards of models and running training in parallel.

Model parallelism pattern for distributed large model training
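A minimal sketch of this workflow with Ray Train's `TorchTrainer` is shown below. The training loop body is elided, and in practice the actual model-sharding strategy (for example FSDP or DeepSpeed) would be configured inside it:

```python
from ray.train import ScalingConfig          # lives under ray.air on older Ray versions
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Runs on every worker: set up the model (e.g. wrap it with
    # ray.train.torch.prepare_model()), load its data shard, and train.
    ...

# Four GPU workers; Ray Train launches the worker processes and wires up
# the distributed process group so they can train cooperatively.
trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=4, use_gpu=True),
)
result = trainer.fit()
```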

Managing parallel hyperparameter tuning experiments

Running multiple hyperparameter tuning experiments is a natural fit for distributed computing because each experiment is independent of the others. Ray Tune handles the hard parts of distributed hyperparameter optimization and provides key features such as checkpointing the best results, optimized scheduling, configurable search algorithms, and more.

Distributed tuning with distributed training per trial
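A minimal Ray Tune sketch, with a toy objective function standing in for a real training run:

```python
from ray import tune

# Hypothetical objective: score each hyperparameter configuration.
def trainable(config):
    score = -(config["lr"] - 0.01) ** 2
    return {"score": score}

tuner = tune.Tuner(
    trainable,
    param_space={"lr": tune.loguniform(1e-4, 1e-1)},
    tune_config=tune.TuneConfig(metric="score", mode="max", num_samples=20),
)
results = tuner.fit()
print(results.get_best_result().config)
```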

Reinforcement learning

Ray RLlib offers support for production-grade, distributed reinforcement learning workloads while maintaining unified and simple APIs for a large variety of industry applications. Below, you can see the decentralized distributed proximal policy optimization (DD-PPO) architecture, supported by Ray RLlib, where sampling and training are both done on worker GPUs.

![Decentralized distributed proximal policy optimization (DD-PPO) architecture, supported by Ray RLlib, where sampling and training are done on worker GPUs](/img/ray/rllib_use_case.png)
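For orientation, here is a minimal RLlib sketch using plain PPO rather than the DD-PPO setup shown in the figure; the exact configuration method names vary slightly across RLlib versions:

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Standard distributed PPO on a toy environment; RLlib farms out experience
# collection to rollout workers and trains on the gathered samples.
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)   # renamed in newer RLlib releases
)
algo = config.build()

for _ in range(3):
    result = algo.train()
    print(result.get("episode_reward_mean"))
```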

Batch inference on CPUs and GPUs

Performing inference on incoming batches of data can be parallelized by exporting the architecture and weights of a trained model to the shared object store. Ray then uses these model replicas to scale predictions over batches across workers, for example with Ray AIR's BatchPredictor for batch inference:

Using Ray AIR's BatchPredictor for batch inference
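The underlying pattern can also be sketched directly with Ray Data, which the batch prediction utilities build on. The `predict_batch` function below is a hypothetical stand-in for a real trained model:

```python
import numpy as np
import ray

# Hypothetical stand-in for a trained model whose weights live in the
# shared object store; each replica would load them once and reuse them.
def predict_batch(batch):
    batch["prediction"] = np.zeros(len(batch["value"]))
    return batch

ds = ray.data.from_items([{"value": float(i)} for i in range(1_000)])

# Ray Data splits the dataset into blocks and runs predict_batch on them
# in parallel across the cluster's CPUs (or GPUs, if requested).
predictions = ds.map_batches(predict_batch, batch_size=128)
predictions.show(3)
```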

Multi-model composition for model serving

Ray Serve supports complex model deployment patterns requiring the orchestration of multiple Ray actors, where different actors provide inference for different models. Serve handles both batch and online inference and can scale to thousands of models in production.

Deployment patterns with Ray Serve
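A minimal sketch of the composition pattern; the model classes and their logic are purely illustrative:

```python
from ray import serve

@serve.deployment
class ModelA:
    def __call__(self, text: str) -> str:
        return f"A({text})"

@serve.deployment
class ModelB:
    def __call__(self, text: str) -> str:
        return f"B({text})"

@serve.deployment
class Ensemble:
    def __init__(self, model_a, model_b):
        # model_a / model_b arrive as handles to the bound deployments.
        self.model_a = model_a
        self.model_b = model_b

    async def __call__(self, text: str) -> str:
        # Fan out to both models and combine their outputs.
        # (On older Serve versions, handle.remote() returns an ObjectRef
        # that needs an extra await.)
        a = await self.model_a.remote(text)
        b = await self.model_b.remote(text)
        return f"{a} + {b}"

# Wire the graph together and start serving it; in a real service the
# Ensemble would parse incoming HTTP requests instead of plain strings.
app = Ensemble.bind(ModelA.bind(), ModelB.bind())
serve.run(app)
```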

ML platform

Merlin is Shopify's ML platform built on Ray. It enables fast iteration and scaling of distributed applications such as product categorization and recommendations.

Merlin architecture built on Ray

Spotify uses Ray for advanced applications that include personalizing content recommendations for home podcasts and personalizing Spotify Radio track sequencing.

How the Ray ecosystem empowers ML scientists and engineers at Spotify

Implementing advanced ML workloads

Alpa - training very large models with Ray

Alpa is a Ray-native library that automatically partitions, schedules, and executes the training and serving computation of very large deep learning models on hundreds of GPUs.

Parallelization plans for a computational graph from Alpa

Above, you can see parallelization plans for a computational graph from Alpa. A, B, C, and D are operators. Each color represents a different device (i.e., a GPU) executing either a partition of an operator or the full operator, leveraging Ray actors.

Exoshuffle - large scale data shuffling

In Ray 2.0, Exoshuffle is integrated with Ray Data to provide an application-level shuffle system that outperforms Spark and achieves 82% of theoretical performance on a 100TB sort on 100 nodes.

Shuffle on Ray architecture diagram
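Because this shuffle system is surfaced through Ray Data, a toy-scale version of the workload is just a dataset shuffle (the 100TB benchmark above runs on a large multi-node cluster, of course):

```python
import ray

ds = ray.data.range(100_000)

# A global shuffle over all blocks of the dataset; on a multi-node cluster
# the shuffle work is spread across the nodes.
shuffled = ds.random_shuffle()
print(shuffled.take(5))
```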

Hamilton - feature engineering

Hamilton is an open-source dataflow micro-framework that manages feature engineering for time series forecasting. Developed at StitchFix, this library provides scalable parallelism on Ray, where each Hamilton function is distributed and data sizes are limited by machine memory.

Hamilton architecture on Ray clusters

Riot Games - reinforcement learning

Riot Games uses Ray to build bots that play new battles at various skill levels. These reinforcement learning agents provide additional signals to their designers to deliver the best experiences for players.

Riot reinforcement learning workflow on Ray including data transformation and training