Three concurrent /reserve calls 30s apart, then cancel the first to show
the early-release path. The remaining two run until their duration cap.
Useful for watching scale-up/scale-down behaviour in the autoscaler
dashboard.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A PyWorker that does not forward to any model server. POST /reserve holds
the worker busy until the client disconnects (or the duration cap elapses),
so users with their own job queue can drive Vast autoscaling without
exposing inbound model traffic on the instance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>