Reject queued /reserve immediately on busy null workers

A held reservation runs for up to MAX_RESERVATION_SECONDS (default 1h), so
queueing a second /reserve behind it makes no sense: the wait would dwarf
any sane timeout. Set max_queue_time=0.0 so the framework rejects with 429 as
soon as another reservation is in flight, and serverless routes the request
to a free worker or scales a new one up.
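
To make the reasoning concrete, here is a minimal sketch of what a second /reserve experiences under the old and new settings. It is a simplified model, not the framework's code: `queued_outcome` is a hypothetical helper, and it assumes a queued request is rejected with 429 once it has waited `max_queue_time` seconds.

```python
from typing import Tuple

MAX_RESERVATION_SECONDS = 3600.0  # default cap named in the commit message


def queued_outcome(remaining_hold_s: float, max_queue_time: float) -> Tuple[str, float]:
    """Model a /reserve that arrives while another reservation is held.

    Returns (outcome, seconds_waited). The outcome is "started" if the worker
    frees up within the queue budget, otherwise "429".
    """
    if remaining_hold_s <= max_queue_time:
        return ("started", remaining_hold_s)
    return ("429", max_queue_time)


# Old setting: the caller waits the full 30 s and is rejected anyway.
print(queued_outcome(MAX_RESERVATION_SECONDS, 30.0))  # ('429', 30.0)
# New setting: the caller is rejected at once and can be routed elsewhere.
print(queued_outcome(MAX_RESERVATION_SECONDS, 0.0))   # ('429', 0.0)
```

In this model the old 30 s budget only helps when the held reservation has less than 30 s left, which is rare for holds that can last up to an hour.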

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rob Ballantyne
2026-05-11 17:05:02 +01:00
parent 3668d948be
commit ed0db198c3
2 changed files with 12 additions and 7 deletions
+6 -6
@@ -28,10 +28,10 @@ held `/reserve` returns `200`.
 ## How it works
-- `allow_parallel_requests=False`, so one in-flight `/reserve` fully occupies
-  the worker. Any second request that lands on the same worker queues (or is
-  rejected with `429` after `max_queue_time`), pushing the autoscaler to
-  provision more workers.
+- `allow_parallel_requests=False` and `max_queue_time=0.0`, so one in-flight
+  `/reserve` fully occupies the worker and any further request that lands
+  on it is rejected with `429` immediately — serverless will route to a
+  free worker or scale a new one up.
 - `lifecycle` is used instead of `model_log_file`, so there is no log to tail
   and no model server to start. The worker reports itself ready immediately
   after the (trivial) benchmark.
@@ -85,8 +85,8 @@ Behavior:
   the duration cap fires (safety net for a stuck consumer).
 - Returns `499` if the external client disconnects (counted as cancelled in
   metrics — avoid this; use `/release` instead).
-- Returns `429` if the worker is already busy and queue wait would exceed
-  `max_queue_time` (30s by default).
+- Returns `429` immediately if the worker is already holding a reservation
+  (so serverless routes the request to a free worker instead of queueing).
 ### `POST /release` (internal port, localhost-only)
+6 -1
@@ -159,7 +159,12 @@ worker_config = WorkerConfig(
         HandlerConfig(
             route="/reserve",
             allow_parallel_requests=False,
-            max_queue_time=30.0,
+            # Reject (429) any /reserve that arrives while the worker is
+            # already busy. A held reservation lasts up to MAX_RESERVATION_
+            # SECONDS, so queueing behind it would mean hours of wait —
+            # better to bounce the request immediately so serverless routes
+            # it to a free worker (or spins up a new one).
+            max_queue_time=0.0,
             remote_function=reserve_worker,
             workload_calculator=lambda _payload: 100.0,
             benchmark_config=BenchmarkConfig(
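
The behavior this handler configuration aims for can be pictured with a small asyncio toy. This is a sketch under stated assumptions, not the worker's actual implementation: `ToyReservationWorker` and its methods are hypothetical names, and the toy treats a cap expiry the same as a release because the excerpt does not show what the real handler returns in that case.

```python
import asyncio


class ToyReservationWorker:
    """Illustrative only: one slot and no queue (the max_queue_time=0.0 case)."""

    def __init__(self, max_reservation_seconds: float) -> None:
        self._max = max_reservation_seconds
        self._held = False
        self._released = asyncio.Event()

    async def reserve(self) -> int:
        if self._held:
            return 429  # busy: bounce immediately, never queue
        self._held = True
        self._released.clear()
        try:
            # Hold until release() is called or the duration cap fires.
            await asyncio.wait_for(self._released.wait(), timeout=self._max)
        except asyncio.TimeoutError:
            pass  # assumption: the toy treats cap expiry like a release
        finally:
            self._held = False
        return 200

    def release(self) -> None:
        self._released.set()


async def demo() -> list:
    worker = ToyReservationWorker(max_reservation_seconds=5.0)
    first = asyncio.create_task(worker.reserve())
    await asyncio.sleep(0)           # let the first reservation take the slot
    second = await worker.reserve()  # arrives while the slot is held
    worker.release()
    return [second, await first]


print(asyncio.run(demo()))  # [429, 200]
```

The second call returns without ever awaiting, which is the point of a zero queue budget: the caller learns at once that this worker is taken.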