Calling ray.get in a loop harms parallelism
TLDR: Avoid calling ray.get() in a loop since it’s a blocking call; use ray.get() only for the final result.
With Ray, every .remote() call is asynchronous: it returns immediately with a future, in the form of an object reference (object ID). This is key to achieving massive parallelism, because it lets a developer launch many remote tasks, each returning its own object reference. When the value is needed, it is fetched from that reference with ray.get(). Because ray.get() is a blocking call, where and how often you use it can affect the performance of your Ray application.
Say we had the following task:
import math
import time

import ray

@ray.remote
def do_some_work(x):
    # Simulate a computation that takes half a second.
    time.sleep(0.5)
    return math.exp(x)
Bad Usage
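Here ray.get() is called inside a list comprehension, so each iteration submits one task with .remote() and then blocks until that task has finished and its value has been materialized and fetched from the Ray object store. The tasks therefore run one after another: 25 tasks at 0.5 s each take about 12.5 s.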
%%time
results = [ray.get(do_some_work.remote(x)) for x in range(25)]
Returns:
CPU times: total: 31.2 ms
Wall time: 12.6 s
Good Usage
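We delay ray.get() until all the tasks have been invoked and their references have been returned. That is, we don't block on each call; instead, we pass the whole list of references to a single ray.get() outside the comprehension, so the tasks can run in parallel, limited by the available CPUs.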
%%time
results = ray.get([do_some_work.remote(x) for x in range(25)])
Which returns:
CPU times: total: 0 ns
Wall time: 1.01 s