Now that the session model means no HTTP connection is held during the reservation, the dichotomy between "single reserve" and "trapezoid demo" collapses: both are "open N sessions, each held for H seconds, started I seconds apart, close." Replace --reserve/--demo/--duration/--plateau with --count/--hold/--interval. --session-cost becomes --cost. Client is now 64 lines (down from 120).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Null PyWorker
Holds Vast Serverless reservations open without forwarding any work to a model. Use it when your real workload (a queue consumer in any language) runs as a separate process on the instance and you just want to drive Vast autoscaling: one POST reserves a worker, one POST releases it.
Use case
You have a job queue on your own infrastructure (Redis, SQS, NATS, etc.) and a consumer (node, golang, python, a binary — anything) that pulls from it. You want one Vast worker per unit of in-flight work, scaling elastically from zero. The null PyWorker is the autoscaling driver; your consumer does the work.
How it works
Reservations use the framework's session API. The SDK's
endpoint.session(...) POSTs /session/create to reserve a worker;
session.close() POSTs /session/end to release it. max_sessions=1
means each worker holds exactly one reservation — the next reservation
either lands on a free worker or triggers a scale-up.
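
The placement rule described above can be illustrated with a toy model. This is only a sketch: `ToyPool` and its methods are hypothetical names, it is not Vast's scheduler, and it treats scale-up as instantaneous, which a real autoscaler is not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPool:
    """Toy stand-in for the autoscaler's worker pool (not Vast's scheduler)."""

    max_sessions: int = 1
    # active[i] is the number of sessions currently held by worker i.
    active: List[int] = field(default_factory=list)

    def reserve(self) -> int:
        """Place a session on a worker with spare capacity, else add a worker."""
        for worker_id, held in enumerate(self.active):
            if held < self.max_sessions:
                self.active[worker_id] += 1
                return worker_id
        self.active.append(1)  # no free worker: model an instant scale-up
        return len(self.active) - 1

    def release(self, worker_id: int) -> None:
        """End one session on the given worker."""
        if self.active[worker_id] == 0:
            raise ValueError(f"worker {worker_id} holds no session")
        self.active[worker_id] -= 1


pool = ToyPool(max_sessions=1)
first = pool.reserve()    # pool is empty, so this scales up to worker 0
second = pool.reserve()   # worker 0 is occupied, so this scales up to worker 1
pool.release(first)
third = pool.reserve()    # worker 0 is free again and is reused
print(first, second, third)  # prints: 0 1 0
```

With `max_sessions=1` a worker is either free or fully occupied, which is why each in-flight unit of work maps to exactly one worker.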
The PyWorker itself does nothing functional:
- One trivial /ping route to satisfy the framework's benchmark requirement (its max_perf is pinned to 100).
- An internal /release endpoint on 127.0.0.1:18999 for the local consumer to end the session without needing session_auth.
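
To make the shape of the internal control endpoint concrete, here is a minimal sketch of a loopback-only server that answers POST /release and GET /health. It is not the PyWorker's actual implementation: `make_control_server`, `serve_in_background`, and the `on_release` hook are hypothetical names, and the response bodies are assumptions.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Callable


def make_control_server(on_release: Callable[[], None],
                        host: str = "127.0.0.1",
                        port: int = 18999) -> ThreadingHTTPServer:
    """Build a loopback-only server exposing POST /release and GET /health."""

    class Handler(BaseHTTPRequestHandler):
        def _reply(self, status: int, body: bytes) -> None:
            self.send_response(status)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self) -> None:
            if self.path == "/release":
                on_release()  # hypothetical hook: end the current session
                self._reply(200, b"released\n")
            else:
                self._reply(404, b"not found\n")

        def do_GET(self) -> None:
            if self.path == "/health":
                self._reply(200, b"ok\n")  # stub healthcheck
            else:
                self._reply(404, b"not found\n")

        def log_message(self, fmt: str, *args) -> None:
            pass  # keep the sketch quiet

    return ThreadingHTTPServer((host, port), Handler)


def serve_in_background(server: ThreadingHTTPServer) -> threading.Thread:
    """Run the server on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Binding to 127.0.0.1 is what lets the endpoint go without authentication: only processes on the same instance can reach it.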
Endpoint parameters
Tested working configuration:
| Parameter | Value | Why |
|---|---|---|
| target_util | 1.0 | One session = one worker. Default 0.9 rounds up to an extra worker. |
| min_load | 0 | Scale-to-zero floor. |
| max_queue_time | 1 | Stop routing to an occupied worker after ~1s of implied queue. |
| target_queue_time | 0.5 | Trigger scale-up promptly once anything queues. |
| inactivity_timeout | 10 (seconds) | Permit scale-to-zero after 10s idle. |
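
The effect of target_util can be checked with a little arithmetic. The sketch below assumes the autoscaler sizes the pool as ceil(total load / (target_util × max_perf)); that formula is an assumption made for illustration, not a documented rule, and `workers_needed` is a hypothetical helper. The cost and max_perf defaults of 100 are the values quoted elsewhere in this README.

```python
import math


def workers_needed(sessions: int, target_util: float,
                   cost: float = 100.0, max_perf: float = 100.0) -> int:
    """Assumed sizing rule: ceil(total load / usable capacity per worker)."""
    load = sessions * cost
    usable_per_worker = target_util * max_perf
    return math.ceil(load / usable_per_worker)


for sessions in (1, 3):
    print(sessions, workers_needed(sessions, 1.0), workers_needed(sessions, 0.9))
# prints:
# 1 1 2
# 3 3 4
```

Under this assumption, a target_util of 0.9 leaves each worker with only 90 units of usable capacity, so a single session of cost 100 already asks for a second worker.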
API
| Route | Where | Use |
|---|---|---|
| POST /session/create | endpoint, signed | Reserve a worker (endpoint.session(...)) |
| POST /session/end | endpoint, signed | Release (session.close()) |
| POST /release | 127.0.0.1:18999, no auth | Local consumer release, no session_auth needed |
Healthcheck
Default: stub on 127.0.0.1:18999/health returning 200. Set
BACKEND_HEALTH_URL=http://127.0.0.1:9090/health (absolute URL) to point
the framework at your queue consumer's health endpoint instead — if the
consumer dies, the autoscaler sees the worker as broken.
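
The selection between the stub and an override could look like the sketch below. `resolve_health_url` is a hypothetical helper, and it assumes the stub lives on the control server port, so that NULL_CONTROL_PORT also moves the stub; the source only states the default address.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

DEFAULT_CONTROL_PORT = 18999


def resolve_health_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return BACKEND_HEALTH_URL if set, else the local stub's /health URL."""
    env = os.environ if env is None else env
    override = env.get("BACKEND_HEALTH_URL", "").strip()
    if override:
        parts = urlsplit(override)
        # Reject relative paths such as "/health": the URL must be absolute.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(
                f"BACKEND_HEALTH_URL must be an absolute URL: {override!r}")
        return override
    # Assumption: the stub is served by the control server on this port.
    port = int(env.get("NULL_CONTROL_PORT", DEFAULT_CONTROL_PORT))
    return f"http://127.0.0.1:{port}/health"


print(resolve_health_url({}))
# prints: http://127.0.0.1:18999/health
print(resolve_health_url({"BACKEND_HEALTH_URL": "http://127.0.0.1:9090/health"}))
# prints: http://127.0.0.1:9090/health
```

Validating the URL up front turns a misconfigured template into an immediate startup error instead of a worker that silently never passes its healthcheck.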
Deploying
- Point PYWORKER_REPO at this repo (or your fork).
- Set BACKEND=null in the template.
- Run your queue consumer alongside the PyWorker. When it's done with a unit of work:

      curl -X POST http://127.0.0.1:18999/release
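
A Python consumer can make the same call without shelling out to curl. The sketch below uses only the standard library; `release_session` is a hypothetical helper name, and treating HTTP 200 as success is an assumption, since the response format of /release is not specified here.

```python
import http.client


def release_session(host: str = "127.0.0.1", port: int = 18999,
                    timeout: float = 5.0) -> int:
    """POST /release to the local control server and return the HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", "/release")
        response = conn.getresponse()
        response.read()  # drain the body so the connection closes cleanly
        return response.status
    finally:
        conn.close()
```

A consumer would call `release_session()` once per finished unit of work, for example in a `finally` block around the job handler, so that a crashed job still frees its worker.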
Client demo
# Single reservation, hold 180s
python -m workers.null.client --endpoint <NAME> --instance alpha
# Three concurrent reservations, started 30s apart, each held 360s
python -m workers.null.client --endpoint <NAME> --instance alpha --count 3 --hold 360
Flags: --count (number of concurrent sessions, default 1), --hold
(seconds each session is held, default 180), --interval (seconds
between starts when --count > 1, default 30), --cost (cost reported
at session-create, default 100 = max_perf), --instance (prod |
alpha | candidate | local).
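
The timing implied by --count, --hold, and --interval can be worked out ahead of a run. The sketch below is not the client's code: `session_schedule` and `peak_concurrency` are hypothetical helpers that only compute when each session would open and close, using the defaults listed above.

```python
from typing import List, Tuple


def session_schedule(count: int = 1, hold: float = 180.0,
                     interval: float = 30.0) -> List[Tuple[float, float]]:
    """Return (open_at, close_at) offsets in seconds for each session."""
    return [(i * interval, i * interval + hold) for i in range(count)]


def peak_concurrency(schedule: List[Tuple[float, float]]) -> int:
    """Largest number of sessions open at the same moment."""
    events = []
    for open_at, close_at in schedule:
        events.append((open_at, 1))
        events.append((close_at, -1))
    # At equal times, process closes (-1) before opens (+1) so that
    # back-to-back sessions are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


plan = session_schedule(count=3, hold=360, interval=30)
print(plan)                    # [(0, 360), (30, 390), (60, 420)]
print(peak_concurrency(plan))  # 3
```

The peak is the number of workers the run should drive the endpoint to. If --hold is shorter than --interval, sessions never overlap and one worker is enough.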
Environment variables
- BACKEND_HEALTH_URL: absolute URL the framework healthchecks. The stub is used when unset.
- NULL_CONTROL_PORT: internal control server port. Defaults to 18999.