Reject queued /reserve immediately on busy null workers
A held reservation runs for up to MAX_RESERVATION_SECONDS (default 1h), so queueing a second /reserve behind it makes no sense: the wait would dwarf any sane timeout. Set max_queue_time=0.0 so the framework rejects the request with 429 as soon as another reservation is in flight, and serverless routes it to a free worker or scales a new one up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
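The admission rule described in this message can be sketched as a toy single-slot worker. This is only an illustration under stated assumptions, not the framework's actual implementation: `Worker`, `reserve`, and `demo` are hypothetical names, and the real framework may enforce `max_queue_time` differently.

```python
import asyncio


class Worker:
    """Toy single-slot worker (hypothetical; not the framework's code)."""

    def __init__(self, max_queue_time: float) -> None:
        self._slot = asyncio.Lock()
        self._max_queue_time = max_queue_time

    async def reserve(self, hold_seconds: float) -> int:
        """Hold the slot for hold_seconds; return an HTTP-style status."""
        if self._max_queue_time <= 0.0:
            # No queueing allowed: bounce at once if the slot is taken.
            if self._slot.locked():
                return 429
            await self._slot.acquire()
        else:
            # Queue for at most max_queue_time seconds, then give up.
            try:
                await asyncio.wait_for(
                    self._slot.acquire(), self._max_queue_time
                )
            except asyncio.TimeoutError:
                return 429
        try:
            await asyncio.sleep(hold_seconds)  # stands in for the held reservation
            return 200
        finally:
            self._slot.release()


async def demo(max_queue_time: float, hold_seconds: float) -> list[int]:
    """Send two overlapping reservations to one worker."""
    worker = Worker(max_queue_time)
    first = asyncio.create_task(worker.reserve(hold_seconds))
    await asyncio.sleep(0)  # let the first request take the slot
    second = await worker.reserve(hold_seconds)
    return [second, await first]


if __name__ == "__main__":
    print(asyncio.run(demo(max_queue_time=0.0, hold_seconds=0.05)))
```

With `max_queue_time=0.0` the second request is turned away without waiting, which is the behavior this commit relies on so that the router can try another worker. A positive `max_queue_time` shorter than the hold only delays the same rejection.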
@@ -28,10 +28,10 @@ held `/reserve` returns `200`.
 
 ## How it works
 
-- `allow_parallel_requests=False`, so one in-flight `/reserve` fully occupies
-  the worker. Any second request that lands on the same worker queues (or is
-  rejected with `429` after `max_queue_time`), pushing the autoscaler to
-  provision more workers.
+- `allow_parallel_requests=False` and `max_queue_time=0.0`, so one in-flight
+  `/reserve` fully occupies the worker and any further request that lands
+  on it is rejected with `429` immediately — serverless will route to a
+  free worker or scale a new one up.
 - `lifecycle` is used instead of `model_log_file`, so there is no log to tail
   and no model server to start. The worker reports itself ready immediately
   after the (trivial) benchmark.
@@ -85,8 +85,8 @@ Behavior:
   the duration cap fires (safety net for a stuck consumer).
 - Returns `499` if the external client disconnects (counted as cancelled in
   metrics — avoid this; use `/release` instead).
-- Returns `429` if the worker is already busy and queue wait would exceed
-  `max_queue_time` (30s by default).
+- Returns `429` immediately if the worker is already holding a reservation
+  (so serverless routes the request to a free worker instead of queueing).
 
 ### `POST /release` (internal port, localhost-only)
 
@@ -159,7 +159,12 @@ worker_config = WorkerConfig(
         HandlerConfig(
             route="/reserve",
             allow_parallel_requests=False,
-            max_queue_time=30.0,
+            # Reject (429) any /reserve that arrives while the worker is
+            # already busy. A held reservation lasts up to MAX_RESERVATION_
+            # SECONDS, so queueing behind it would mean hours of wait —
+            # better to bounce the request immediately so serverless routes
+            # it to a free worker (or spins up a new one).
+            max_queue_time=0.0,
             remote_function=reserve_worker,
             workload_calculator=lambda _payload: 100.0,
             benchmark_config=BenchmarkConfig(