Calling ray.get in a loop harms parallelism

TLDR: Avoid calling ray.get() in a loop since it’s a blocking call; use ray.get() only for the final result.

With Ray, every .remote() call is asynchronous: it returns immediately with an object reference (a future, or promise) rather than the result itself. This is key to achieving massive parallelism, because it lets a developer launch many remote tasks, each returning its own object reference. Whenever the value is needed, it is fetched with ray.get(). Because ray.get() is a blocking call, where and how often you use it can affect the performance of your Ray application.

Say we had the following task:

import math
import time

import ray

@ray.remote
def do_some_work(x):
    # Simulate 0.5 s of computation
    time.sleep(0.5)
    return math.exp(x)

Bad Usage

Here ray.get() sits inside the list comprehension, so each iteration submits one task and then blocks until that task has finished and its value has been fetched from the Ray object store. The next task is not submitted until then, so the 25 tasks run one after another.

%%time
results = [ray.get(do_some_work.remote(x)) for x in range(25)]

Output (25 tasks run back to back at 0.5 s each, so roughly 12.5 s of wall time):

CPU times: total: 31.2 ms
Wall time: 12.6 s

Good Usage

We delay ray.get() until all the tasks have been submitted and their references have been returned. That is, we don't block on each call; instead we call ray.get() once, outside the comprehension, on the whole list of references.

%%time
results = ray.get([do_some_work.remote(x) for x in range(25)])

Output:

CPU times: total: 0 ns
Wall time: 1.01 s
