Pass over all three files to drop verbose expository commentary that duplicated either the code or the README. Net: -284 lines. README now reads top-to-bottom in roughly the order someone would need the info: use case → how it works → endpoint params → API → healthcheck → deploy → demo. Endpoint params table uses the values actually tested on alpha (min_load=0, target_util=1, max_queue_time=1, target_queue_time=0.5, inactivity_timeout=10). Dropped the "known autoscaler quirk" section now that alpha addresses it; kept the --session-cost flag as a debugging knob. worker.py and client.py keep the same behavior but trim long block comments and multi-line docstrings the code didn't need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Null PyWorker
Holds Vast Serverless reservations open without forwarding any work to a model. Use it when your real workload (a queue consumer in any language) runs as a separate process on the instance and you just want to drive Vast autoscaling: one POST reserves a worker, one POST releases it.
Use case
You have a job queue on your own infrastructure (Redis, SQS, NATS, etc.) and a consumer (node, golang, python, a binary — anything) that pulls from it. You want one Vast worker per unit of in-flight work, scaling elastically from zero. The null PyWorker is the autoscaling driver; your consumer does the work.
How it works
Reservations use the framework's session API. The SDK's
endpoint.session(...) POSTs /session/create to reserve a worker;
session.close() POSTs /session/end to release it. max_sessions=1
means each worker holds exactly one reservation — the next reservation
either lands on a free worker or triggers a scale-up.
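The placement rule above can be illustrated with a toy model. This is only a sketch of the "free worker or scale-up" decision under max_sessions=1: `ToyPool` and its methods are hypothetical names, not part of the Vast SDK or the autoscaler, and a real scale-up is not instantaneous the way it is here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Toy model of a pool where each worker holds at most max_sessions sessions."""

    max_sessions: int = 1
    # worker id -> number of sessions currently held
    workers: dict[int, int] = field(default_factory=dict)

    def reserve(self) -> tuple[int, bool]:
        """Return (worker_id, scaled_up) for a new reservation."""
        for worker_id, held in self.workers.items():
            if held < self.max_sessions:
                self.workers[worker_id] = held + 1
                return worker_id, False
        # No free worker: model a scale-up by adding one.
        worker_id = len(self.workers)
        self.workers[worker_id] = 1
        return worker_id, True

    def release(self, worker_id: int) -> None:
        """End one session on the given worker."""
        self.workers[worker_id] = max(0, self.workers[worker_id] - 1)


pool = ToyPool()
first = pool.reserve()   # empty pool, so this scales up
second = pool.reserve()  # worker 0 is occupied, so this scales up again
pool.release(0)
third = pool.reserve()   # worker 0 is free again, so no scale-up
```

With max_sessions=1 the number of occupied workers always equals the number of open reservations, which is the property the null PyWorker relies on.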
The PyWorker itself does nothing functional:
- One trivial /ping route to satisfy the framework's benchmark requirement (its max_perf is pinned to 100).
- An internal /release endpoint on 127.0.0.1:18999 for the local consumer to end the session without needing session_auth.
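For illustration, a loopback-only control server of this shape could look like the sketch below. It is not the code in worker.py: `ControlHandler`, `start_control_server`, and the `release_requested` event are hypothetical names, and the real worker ends the framework session where this sketch only sets a flag. The /health route mirrors the stub described under Healthcheck.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set when the local consumer asks to end the session (stand-in for the real release).
release_requested = threading.Event()


class ControlHandler(BaseHTTPRequestHandler):
    def _reply(self, status: int, body: bytes) -> None:
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/health":
            self._reply(200, b"ok")
        else:
            self._reply(404, b"not found")

    def do_POST(self) -> None:
        if self.path == "/release":
            release_requested.set()
            self._reply(200, b"released")
        else:
            self._reply(404, b"not found")

    def log_message(self, format, *args) -> None:
        pass  # keep the sketch quiet


def start_control_server(port: int = 18999) -> ThreadingHTTPServer:
    """Serve on loopback only, in a background daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", port), ControlHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Binding to 127.0.0.1 rather than 0.0.0.0 is what makes the missing authentication acceptable: only processes on the same instance can reach the endpoint.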
Endpoint parameters
Tested working configuration:
| Parameter | Value | Why |
|---|---|---|
| target_util | 1.0 | One session = one worker. Default 0.9 rounds up to an extra worker. |
| min_load | 0 | Scale-to-zero floor. |
| max_queue_time | 1 | Stop routing to an occupied worker after ~1s of implied queue. |
| target_queue_time | 0.5 | Trigger scale-up promptly once anything queues. |
| inactivity_timeout | 10 (seconds) | Permit scale-to-zero after 10s idle. |
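The target_util row can be checked with a little arithmetic. The sizing rule below is an assumption made for illustration (usable capacity per worker = max_perf × target_util, rounded up to whole workers), not the autoscaler's documented formula; `workers_needed` is a hypothetical helper. The session cost and max_perf of 100 come from this README.

```python
import math


def workers_needed(sessions: int, target_util: float,
                   session_cost: float = 100.0, max_perf: float = 100.0) -> int:
    """Assumed sizing rule: each worker offers max_perf * target_util of usable capacity."""
    load = sessions * session_cost
    return math.ceil(load / (max_perf * target_util))


for sessions in (1, 3):
    at_full = workers_needed(sessions, target_util=1.0)
    at_default = workers_needed(sessions, target_util=0.9)
    print(f"{sessions} session(s): {at_full} worker(s) at 1.0, {at_default} at 0.9")
```

Under this assumption, one session at target_util=0.9 needs 100 / 90 ≈ 1.11 workers, which rounds up to 2; at 1.0 it needs exactly 1.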
API
| Route | Where | Use |
|---|---|---|
| POST /session/create | endpoint, signed | Reserve a worker (endpoint.session(...)) |
| POST /session/end | endpoint, signed | Release (session.close()) |
| POST /release | 127.0.0.1:18999, no auth | Local consumer release, no session_auth needed |
Healthcheck
Default: stub on 127.0.0.1:18999/health returning 200. Set
BACKEND_HEALTH_URL=http://127.0.0.1:9090/health (absolute URL) to point
the framework at your queue consumer's health endpoint instead — if the
consumer dies, the autoscaler sees the worker as broken.
Deploying
- Point PYWORKER_REPO at this repo (or your fork).
- Set BACKEND=null in the template.
- Run your queue consumer alongside the PyWorker. When it's done with a unit of work:
curl -X POST http://127.0.0.1:18999/release
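A Python consumer can make the same call without shelling out to curl. The sketch below uses only the standard library; `release_worker` and `run_one_job` are hypothetical helper names, and releasing even when the job fails is a policy choice made for this example, not something the PyWorker requires.

```python
import urllib.request

# Loopback traffic should never go through an HTTP proxy, so disable proxy detection.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def release_worker(base_url: str = "http://127.0.0.1:18999", timeout: float = 5.0) -> bool:
    """POST /release on the local control server; return True on HTTP 200."""
    request = urllib.request.Request(f"{base_url}/release", data=b"", method="POST")
    try:
        with _opener.open(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are OSError subclasses: refused, timed out, or non-2xx.
        return False


def run_one_job(job, handle, release=release_worker) -> bool:
    """Process one unit of work, then release the reservation even if handling fails."""
    try:
        handle(job)
    finally:
        released = release()
    return released
```

A consumer loop would call `run_one_job(job, handle)` once per dequeued item and treat a False return as a signal to retry the release or alert.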
Client demo
# Single reservation
python -m workers.null.client --endpoint <NAME> --instance alpha
# Staggered three-session trapezoid
python -m workers.null.client --endpoint <NAME> --instance alpha --demo
Flags: --duration (single), --interval and --plateau (demo
timing), --session-cost (overrides the cost reported at session
create; default 100 = max_perf), --instance (prod | alpha |
candidate | local).
Environment variables
- BACKEND_HEALTH_URL: absolute URL the framework healthchecks. Stub is used when unset.
- NULL_CONTROL_PORT: internal control server port. Defaults to 18999.
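As a rough sketch of how these two variables could combine, the helper below picks the healthcheck URL. `resolve_health_url` is a hypothetical name, and two details are assumptions rather than facts from worker.py: that the stub /health route is served over plain HTTP on the control port, and that a non-absolute override is rejected instead of ignored.

```python
import os
from typing import Mapping


def resolve_health_url(env: Mapping[str, str] = os.environ) -> str:
    """Return the URL to healthcheck, falling back to the local stub."""
    port = int(env.get("NULL_CONTROL_PORT", "18999"))
    override = env.get("BACKEND_HEALTH_URL", "").strip()
    if override:
        # The README asks for an absolute URL, so reject anything without a scheme.
        if not override.startswith(("http://", "https://")):
            raise ValueError("BACKEND_HEALTH_URL must be an absolute URL")
        return override
    # Assumption: the stub is served on the control port.
    return f"http://127.0.0.1:{port}/health"
```

Passing a plain dict instead of os.environ, for example `resolve_health_url({"NULL_CONTROL_PORT": "19000"})`, makes the fallback easy to exercise in a unit test.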