pyworker

Rob Ballantyne 01eff874d8 Correct queue-time guidance for null pyworker endpoints

Earlier note claimed max_queue_time / target_queue_time were no-ops
because the worker's internal wait_time property filters sessions out.
That filter only affects per-worker rejection on a given handler — the
autoscaler doesn't see the property and computes its own queue-time
estimate from cur_load / max_perf, which *does* include sessions.

With defaults around 30s, an occupied null worker (cur_load=100,
max_perf=100, implied queue=1s) still looks "available" to the
autoscaler, so a third reservation gets queued on an existing worker
via repeated 429-retries instead of triggering scale-up.

Fix: set max_queue_time = 0 and target_queue_time = 0 on the endpoint.
Any in-flight load marks the worker "full" for routing, and any
observed queue time triggers immediate scale-up.
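
The arithmetic above can be sketched in a few lines. This is only an illustration under stated assumptions, not the autoscaler's actual code: the names WorkerState, implied_queue_time, looks_available and should_scale_up are hypothetical, and the comparisons (estimate <= max_queue_time for routing, observed > target_queue_time for scale-up) are assumed from the description in this commit message.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    # Hypothetical container for the two values the commit message names.
    cur_load: float
    max_perf: float


def implied_queue_time(worker: WorkerState) -> float:
    """Queue-time estimate in seconds, assumed to be cur_load / max_perf."""
    if worker.max_perf <= 0:
        return float("inf")  # a worker with no capacity can never drain its load
    return worker.cur_load / worker.max_perf


def looks_available(worker: WorkerState, max_queue_time: float) -> bool:
    """Assumed routing rule: a worker is a candidate while its estimate fits the limit."""
    return implied_queue_time(worker) <= max_queue_time


def should_scale_up(observed_queue_time: float, target_queue_time: float) -> bool:
    """Assumed scaling rule: scale up once observed queue time exceeds the target."""
    return observed_queue_time > target_queue_time


occupied = WorkerState(cur_load=100, max_perf=100)  # implied queue of 1s
idle = WorkerState(cur_load=0, max_perf=100)

print(looks_available(occupied, max_queue_time=30.0))  # occupied worker still routable
print(looks_available(occupied, max_queue_time=0.0))   # any load marks it full
print(looks_available(idle, max_queue_time=0.0))       # idle worker stays routable
print(should_scale_up(1.0, target_queue_time=0.0))     # any queue time scales up
```

Under these assumptions, a limit near 30s leaves the occupied null worker routable, while a limit of 0 rejects it as soon as it carries any load and still accepts an idle worker.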

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-12 11:14:20 +01:00

ace

Use PyWorker SDK (#67)

2025-12-15 19:33:03 -08:00

comfyui-json

Use PyWorker SDK (#67)

2025-12-15 19:33:03 -08:00

null

Correct queue-time guidance for null pyworker endpoints

2026-05-12 11:14:20 +01:00

openai

Lowered concurrency of vLLM and TGI benchmarks

2025-12-17 11:55:33 -08:00

tgi

Increase TGI benchmark tokens to 500

2026-04-30 14:04:39 -07:00

wan

Use PyWorker SDK (#67)

2025-12-15 19:33:03 -08:00