# Null PyWorker

Holds Vast Serverless reservations open without forwarding any work to a
model. Use it when your real workload (a queue consumer in any language)
runs as a separate process on the instance and you just want to drive
Vast autoscaling: **one POST reserves a worker, one POST releases it.**

## Use case

You have a job queue on your own infrastructure (Redis, SQS, NATS, etc.)
and a consumer (node, golang, python, a binary — anything) that pulls
from it. You want one Vast worker per unit of in-flight work, scaling
elastically from zero. The null PyWorker is the autoscaling driver; your
consumer does the work.

## How it works

Reservations use the framework's session API. The SDK's
`endpoint.session(...)` POSTs `/session/create` to reserve a worker;
`session.close()` POSTs `/session/end` to release it. `max_sessions=1`
means each worker holds exactly one reservation — the next reservation
either lands on a free worker or triggers a scale-up.

The PyWorker itself does nothing functional:

- One trivial `/ping` route to satisfy the framework's benchmark
requirement (its `max_perf` is pinned to 100).
- An internal `/release` endpoint on `127.0.0.1:18999` for the local
consumer to end the session without needing `session_auth`.
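
The internal control route described above could look roughly like the following. This is a minimal sketch built on Python's standard library, not the PyWorker's actual implementation: `released`, `handle_control`, `ControlHandler`, and `make_control_server` are hypothetical names, and the real `/release` handler would end the framework session rather than set a flag.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for "end the current session".
released = threading.Event()


def handle_control(method, path):
    """Pure routing logic for the control server: returns (status, body)."""
    if method == "POST" and path == "/release":
        released.set()  # assumption: the real worker ends the session here
        return 200, b"released\n"
    return 404, b"not found\n"


class ControlHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        status, body = handle_control("POST", self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_control_server(port=18999):
    # Bind to loopback only, so the unauthenticated route stays on the instance.
    return HTTPServer(("127.0.0.1", port), ControlHandler)
```

Calling `make_control_server().serve_forever()` would start the listener. Binding to `127.0.0.1` is what makes skipping `session_auth` tolerable in this sketch: only processes on the same instance can reach the route.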

## Endpoint parameters

Tested working configuration:

| Parameter | Value | Why |
|---|---|---|
| `target_util` | `1.0` | One session = one worker. The default `0.9` leaves headroom, so a single session rounds up to an extra worker. |
| `min_load` | `0` | Scale-to-zero floor. |
| `max_queue_time` | `1` | Stop routing to an occupied worker after ~1s of implied queue. |
| `target_queue_time` | `0.5` | Trigger scale-up promptly once anything queues. |
| `inactivity_timeout` | `10` (seconds) | Permit scale-to-zero after 10s idle. |
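
The `target_util` row is easier to see with numbers. The sketch below assumes the autoscaler sizes the fleet as roughly `ceil(load / (max_perf * target_util))` and that each session reports a load equal to `max_perf` (100). Both are illustrative assumptions, not Vast's documented algorithm, and `workers_needed` is a hypothetical helper.

```python
import math


def workers_needed(load, max_perf=100.0, target_util=1.0):
    """Assumed sizing rule: enough workers to keep utilisation at or below target_util."""
    if load <= 0:
        return 0  # nothing in flight, so scale-to-zero is allowed
    return math.ceil(load / (max_perf * target_util))


# One session (load 100) at target_util=1.0 fits exactly on one worker.
one_session_exact = workers_needed(100, target_util=1.0)

# The same session at target_util=0.9 needs 100 / 90 workers, which rounds up.
one_session_default = workers_needed(100, target_util=0.9)
```

Under these assumptions, `one_session_exact` is 1 and `one_session_default` is 2, which is the extra worker the table warns about.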

## API

| Route | Where | Use |
|---|---|---|
| `POST /session/create` | endpoint, signed | Reserve a worker (`endpoint.session(...)`) |
| `POST /session/end` | endpoint, signed | Release (`session.close()`) |
| `POST /release` | `127.0.0.1:18999`, no auth | Local consumer release, no `session_auth` needed |

## Healthcheck

Default: stub on `127.0.0.1:18999/health` returning `200`. Set
`BACKEND_HEALTH_URL=http://127.0.0.1:9090/health` (absolute URL) to point
the framework at your queue consumer's health endpoint instead — if the
consumer dies, the autoscaler sees the worker as broken.

## Deploying

1. Point `PYWORKER_REPO` at this repo (or your fork).
2. Set `BACKEND=null` in the template.
3. Run your queue consumer alongside the PyWorker. When the consumer
finishes a unit of work, have it release the reservation:
```bash
curl -X POST http://127.0.0.1:18999/release
```
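
A consumer written in Python can make the same release call without shelling out to `curl`. The following is a minimal sketch using only the standard library; `build_release_request` and `release_worker` are hypothetical helper names, and treating any 2xx response as success is an assumption about the control server.

```python
import urllib.error
import urllib.request


def build_release_request(port=18999):
    """Build the POST to the local control endpoint (empty body, no auth)."""
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/release", data=b"", method="POST"
    )


def release_worker(port=18999, timeout=5.0):
    """Return True if the release call succeeded, False on any HTTP or network error."""
    try:
        with urllib.request.urlopen(build_release_request(port), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except urllib.error.URLError:
        # Covers connection failures and non-2xx responses (HTTPError is a subclass).
        return False
```

Returning a boolean rather than raising lets the consumer decide whether to retry the release or log the failure and continue with the next job.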

## Client demo

```bash
# Single reservation, hold 180s
python -m workers.null.client --endpoint <NAME> --instance alpha
# Three concurrent reservations, started 30s apart, each held 360s
python -m workers.null.client --endpoint <NAME> --instance alpha --count 3 --hold 360
```
Flags: `--count` (number of concurrent sessions, default 1), `--hold`
(seconds each session is held, default 180), `--interval` (seconds
between starts when `--count > 1`, default 30), `--cost` (cost reported
at session-create, default 100 = `max_perf`), `--instance` (`prod` |
`alpha` | `candidate` | `local`).
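
The flags and defaults listed above map onto an argument parser along these lines. This is a sketch reconstructed from the flag descriptions, not the client's actual source: `build_parser` and `start_offsets` are hypothetical names, the numeric types are guesses, and no default is set for `--instance` because this README does not state one.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog="workers.null.client")
    p.add_argument("--endpoint", required=True, help="endpoint name")
    # Default for --instance is not documented here, so none is assumed.
    p.add_argument("--instance", choices=["prod", "alpha", "candidate", "local"])
    p.add_argument("--count", type=int, default=1, help="concurrent sessions")
    p.add_argument("--hold", type=float, default=180, help="seconds each session is held")
    p.add_argument("--interval", type=float, default=30, help="seconds between starts")
    p.add_argument("--cost", type=float, default=100, help="cost reported at session-create")
    return p


def start_offsets(count, interval):
    """Seconds after launch at which each session would start."""
    return [i * interval for i in range(count)]


args = build_parser().parse_args(
    ["--endpoint", "demo", "--instance", "alpha", "--count", "3", "--hold", "360"]
)
offsets = start_offsets(args.count, args.interval)
```

For the three-session example, `offsets` works out to starts at 0, 30, and 60 seconds, so with a 360-second hold all three reservations overlap for most of the run.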

## Environment variables

- `BACKEND_HEALTH_URL` — absolute URL the framework healthchecks. Stub
is used when unset.
- `NULL_CONTROL_PORT` — internal control server port. Defaults to `18999`.
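
How the two variables interact can be sketched as follows. `resolve_health_url` is a hypothetical helper. The fallback to a stub on the control port, and the stub moving when `NULL_CONTROL_PORT` changes, are assumptions based on the Healthcheck section; the framework's real lookup may differ.

```python
import os
from urllib.parse import urlparse


def resolve_health_url(env=None):
    """Pick the URL to healthcheck: the override if set, else the local stub."""
    env = os.environ if env is None else env
    port = int(env.get("NULL_CONTROL_PORT", "18999"))
    url = env.get("BACKEND_HEALTH_URL")
    if not url:
        # Assumption: the stub lives on the control server's port.
        return f"http://127.0.0.1:{port}/health"
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"BACKEND_HEALTH_URL must be an absolute URL, got {url!r}")
    return url
```

Rejecting relative values early surfaces a misconfigured template at startup instead of as a worker that never passes its healthcheck.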