# Null PyWorker

A PyWorker that does **nothing**: it does not forward requests to any model
server. Reservations are modelled as framework **sessions**: creating a
session reserves a worker, and releasing the session lets the endpoint scale
back down.

## When to use it

Use this worker when you want to drive Vast Serverless autoscaling but you do
**not** want inbound requests to reach a model on the instance. Typical setup:

- You already have a job queue on your own infrastructure (Redis, SQS, NATS,
  etc.).
- A separate worker process on the Vast instance pulls work from that queue
  directly. The Vast PyWorker is not involved in the request/response path.
  Your consumer can be written in any language (Node.js, Go, Python, or a
  compiled binary); this PyWorker is implementation-agnostic.
- You want one Vast worker per active queue consumer, and you want the
  Serverless autoscaler to spin instances up and down based on demand on
  *your* side.

## How it works

- Reservations use the framework's **session** model. The SDK exposes
  `endpoint.session(cost, lifetime)` which POSTs to `/session/create` (a
  built-in framework route) and returns a `Session` object usable as
  `async with`. Closing the context (or calling `await session.close()`)
  POSTs to `/session/end` — counted as a normal success in metrics.
- `max_sessions=1` on the worker side means a second `/session/create`
  against an already-occupied worker returns `429`. Serverless routes
  that request to a free worker or scales a new one up.
- Sessions are **excluded from queue-wait math** (the framework filters
  `if not request.is_session`), so an occupied worker doesn't look like
  it has a request queue piling up. The autoscaler treats a session as
  occupancy, not as work-in-progress.
- `lifecycle` is used instead of `model_log_file`, so there is no log to
  tail and no model server to start. The worker reports itself ready
  immediately after a trivial benchmark.
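
The occupancy rule in the list above can be illustrated with a toy model. This is a minimal sketch, not the framework's code: `ToyWorker` and `reserve` are hypothetical names, and the real routing and scale-up decisions are made by Serverless, not by a loop like this one.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyWorker:
    """Toy stand-in for one PyWorker (hypothetical, for illustration only)."""
    max_sessions: int = 1
    sessions: Set[str] = field(default_factory=set)

    def session_create(self) -> Tuple[int, Optional[str]]:
        # An already-occupied worker refuses a further reservation with 429.
        if len(self.sessions) >= self.max_sessions:
            return 429, None
        session_id = uuid.uuid4().hex
        self.sessions.add(session_id)
        return 200, session_id

    def session_end(self, session_id: str) -> int:
        # Ending the session frees the worker for the next reservation.
        self.sessions.discard(session_id)
        return 200


def reserve(workers: List[ToyWorker]) -> Tuple[Optional[ToyWorker], Optional[str]]:
    """Try each worker in turn; (None, None) means every worker answered 429."""
    for worker in workers:
        status, session_id = worker.session_create()
        if status == 200:
            return worker, session_id
    return None, None
```

In this model, a `(None, None)` result corresponds to the point where Serverless would have to scale a new worker up.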

## Healthchecking

The framework periodically GETs a healthcheck URL after startup; if it ever
fails after the first success, the worker is marked errored and the
autoscaler can decommission it. Two modes:

- **Stub (default)** — the internal control server also answers
  `GET /health` with `200`. Just enough to satisfy the framework while
  you wire up real consumers.
- **Point at your queue consumer (recommended)** — set
  `BACKEND_HEALTH_URL=http://127.0.0.1:9090/health` (absolute URL) and
  the pyworker will healthcheck *your* consumer instead. If the consumer
  process crashes, the autoscaler will see the worker as broken.
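
If your consumer is written in Python, the recommended mode only needs a tiny HTTP endpoint inside the consumer process. The following is a minimal sketch using the standard library; `HealthHandler` and `start_health_server` are hypothetical names, and port `9090` simply matches the example URL above.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /health with 200 for as long as the consumer process is alive."""

    def do_GET(self):
        if self.path == "/health":
            body = b"ok"
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        # Keep periodic healthcheck polling out of the consumer's logs.
        pass


def start_health_server(host="127.0.0.1", port=9090):
    """Serve /health from a daemon thread so the consumer's main loop is not blocked."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Call `start_health_server()` once at consumer startup. Because the thread is a daemon, the endpoint disappears when the consumer process exits, which is what lets the healthcheck detect a crashed consumer.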

## API

### Reservation: `POST /session/create`  (external, signed)

Not implemented here — the framework provides this route automatically on
every PyWorker. Use the SDK:

```python
import asyncio

from vastai import Serverless


async def main():
    async with Serverless() as client:
        endpoint = await client.get_endpoint(name="my-null-endpoint")
        async with endpoint.session(cost=100, lifetime=600) as s:
            # Worker is now reserved. Your queue dispatcher does whatever it
            # needs to do (typically: enqueue a job that mentions s.session_id).
            ...
        # Leaving the `async with` block POSTs to /session/end (200 success in metrics).


asyncio.run(main())
```

Or raw HTTP (the SDK takes care of autoscaler signing for you, but the
shape of the request is documented for non-Python clients):

```
POST /session/create
{
  "auth_data": { /* signed by autoscaler */ },
  "payload": {
    "lifetime": 600,
    "on_close_route": "https://your.callback/notify",
    "on_close_payload": {"job_id": "..."}
  }
}
```

### Release from a local consumer: `POST /release`  (internal, localhost-only)

Closes the active session, regardless of who created it. No body, no
auth. Use this when the queue consumer doesn't have (and shouldn't need)
the session's `session_auth`:

```bash
curl -X POST http://127.0.0.1:18999/release
```

Responses:

- `200 {"released": true, "session_ids": ["..."]}` — closed; the held
  client-side `/session/create` completes and counts as a success.
- `200 {"released": false, "reason": "no active session"}` — nothing
  active, no-op.

For setups where the dispatcher can hand the consumer `session_auth`
(e.g. as part of the queue payload), the consumer can instead POST
`/session/end` on the framework's HTTP-only port
(`$WORKER_HTTP_PORT`, default `WORKER_PORT+1`) — the standard, fully
authenticated release path.
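
For a Python consumer, the `curl` call above can be made with the standard library. This is a minimal sketch: `release_session` and `run_job_then_release` are hypothetical helpers, and it assumes the consumer can read the same `NULL_CONTROL_PORT` value as the PyWorker (falling back to `18999`).

```python
import json
import os
import urllib.request


def release_session(base_url=None, timeout=5.0):
    """POST /release on the internal control server and return the parsed JSON body."""
    if base_url is None:
        # Assumption: the consumer shares the PyWorker's environment.
        port = os.environ.get("NULL_CONTROL_PORT", "18999")
        base_url = f"http://127.0.0.1:{port}"
    # /release takes no body and no auth, so an empty POST is enough.
    request = urllib.request.Request(f"{base_url}/release", data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def run_job_then_release(job, base_url=None):
    """Run one unit of work, then release the worker even if the job raised."""
    try:
        return job()
    finally:
        result = release_session(base_url)
        if not result.get("released"):
            # Documented no-op response: there was no active session.
            print("release was a no-op:", result.get("reason"))
```

Releasing in a `finally` block means a failed job still frees the worker. Note that if the release call itself raises, that exception replaces the job's exception.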

## Environment variables

- `BACKEND_HEALTH_URL` — absolute URL the framework should healthcheck
  (e.g. `http://127.0.0.1:9090/health`). When set, the stub `/health`
  route is not registered on the internal server.
- `NULL_CONTROL_PORT` — port for the internal control server (hosts
  `/release` and optionally `/health`). Defaults to `18999`.

## Deploying on Vast Serverless

1. Create a Serverless endpoint and point `PYWORKER_REPO` at this
   repository (or your fork).
2. Set `BACKEND=null` in the template so `start_server.sh` runs
   `workers.null.worker`.
3. There is no model server to configure; you can omit model-related env
   vars entirely.
4. Run your own queue-consumer process on the instance alongside the
   PyWorker. When it finishes its work:
   ```bash
   curl -X POST http://127.0.0.1:18999/release
   ```

### Endpoint scaling parameters

The null worker reports `max_perf = 100` and each reservation is a
session of `cost = 100`. The intended model is **one session = one
worker**, scaling elastically from zero up to as many concurrent
sessions as you ask for.

- **`target_util = 1.0`** — required. The default of `0.9` reserves
  ~11% spare capacity, which for a unit-occupancy worker rounds up to a
  whole extra worker (e.g. `min_load = 100` becomes `100 / 0.9 = 111.1`
  → 2 active workers instead of 1). With `target_util = 1.0` the math
  is clean: `min_load = 100 * N` keeps exactly `N` workers active.
- **`min_load = 0`** — required for scale-to-zero. With `min_load = 0`
  and a positive `inactivity_timeout`, the endpoint can scale down to
  zero active workers when no sessions exist.
- **`max_workers`** — cap on total reservations the endpoint can ever
  serve concurrently.
- **`inactivity_timeout`** — positive value enables scale-to-zero
  after the configured number of seconds of no active sessions. Use
  alongside `cold_workers = 0` to also drop the inactive pool.
- **`max_queue_time = 0`** and **`target_queue_time = 0`** —
  recommended. The autoscaler computes per-worker queue time as
  `cur_load / max_perf`, and sessions *are* counted in `cur_load`. With the
  defaults (~30s), an occupied null worker (`cur_load = 100`,
  `max_perf = 100`, implied queue = 1s) still looks "available" for
  routing, so a new reservation keeps being sent to occupied workers,
  gets 429'd each time, and never triggers scale-up. Zeroing both knobs
  tells the autoscaler "don't estimate when this worker will free up;
  route to a free one or make a new one."
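
The arithmetic behind these settings can be checked with a few lines of Python. This is a sketch under stated assumptions, not the autoscaler's code: the function names are hypothetical, and the ceiling rounding and the `<=` routing test are inferred from the worked numbers above.

```python
import math


def workers_for_min_load(min_load, target_util, max_perf=100.0):
    """Workers kept active for min_load, assuming ceil((min_load / target_util) / max_perf)."""
    if min_load <= 0:
        return 0
    return math.ceil((min_load / target_util) / max_perf)


def implied_queue_seconds(cur_load, max_perf=100.0):
    """Per-worker queue-time estimate as described above: cur_load / max_perf."""
    return cur_load / max_perf


def looks_available(cur_load, max_queue_time, max_perf=100.0):
    """Assumed routing test: a worker is a candidate while its estimate is within the limit."""
    return implied_queue_seconds(cur_load, max_perf) <= max_queue_time


# target_util: the 0.9 default costs a whole extra worker for a single session.
print(workers_for_min_load(100, 0.9))   # 100 / 0.9 = 111.1, rounded up to 2 workers
print(workers_for_min_load(100, 1.0))   # exactly 1 worker

# Queue-time knobs: an occupied worker (cur_load = 100) under each setting.
print(looks_available(100, max_queue_time=30))  # True: still treated as routable
print(looks_available(100, max_queue_time=0))   # False: only free workers qualify
```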

#### Known autoscaler quirk

In current Vast Serverless, scale-up reliably fires for the 1→2
worker transition (the first 429 from an occupied worker activates a
cold one), but **the 2→3 transition often fails to fire** — the
third reservation 429s on both occupied workers and sits in the
autoscaler's global queue indefinitely instead of activating a third
cold worker. Scale-to-zero also has known issues.

Fixes are pending on the Vast side. Until they land, a temporary
workaround is to over-provision by reporting `cost > max_perf` on
session creation:

```bash
python -m workers.null.client --endpoint <ENDPOINT_NAME> --demo --session-cost 200
```

With `cost = 200, max_perf = 100`, each occupied worker reports
`cur_load / max_perf = 2.0` — clearly over capacity, so the autoscaler
keeps one extra active worker warm per session. The next
`/session/create` lands on the warm worker directly with no queue.
**This is a band-aid, not the design.** The intended steady state
is `cost = 100` with predictable elastic scale-up.

## Client example

Single reservation (holds for 180s):

```bash
python -m workers.null.client --endpoint <ENDPOINT_NAME>
```

Staggered demo:

```bash
python -m workers.null.client --endpoint <ENDPOINT_NAME> --demo
```

Starts three sessions 30s apart (all held concurrently), holds the
3-worker plateau for 5 minutes so the autoscaler has time to actually
provision the third worker before any scale-down starts, then closes
the sessions one at a time, also 30s apart, and exits. Every session
ends cleanly via the SDK's `session.close()` — `200` successes in
metrics, no cancellations.

Tune the timing with `--interval` and `--plateau`. To exercise the
local-release path, shell into a worker and run
`curl -X POST http://127.0.0.1:18999/release`.
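
To see how `--interval` and `--plateau` shape the run, the demo's timeline can be worked out ahead of time. The sketch below is not the client's code: `demo_schedule` is a hypothetical helper, and it assumes both values are in seconds, that the plateau starts when the last session opens, and that sessions close in the order they were opened.

```python
def demo_schedule(sessions=3, interval=30.0, plateau=300.0):
    """Return (time_seconds, action, session_index) events for the staggered demo."""
    events = []
    # Open sessions one interval apart, starting at t = 0.
    for i in range(sessions):
        events.append((i * interval, "open", i))
    # Assumption: the plateau is measured from the moment the last session opens.
    first_close = (sessions - 1) * interval + plateau
    # Close sessions one interval apart, in the order they were opened.
    for i in range(sessions):
        events.append((first_close + i * interval, "close", i))
    return events


for time_s, action, index in demo_schedule():
    print(f"t={time_s:>5.0f}s  {action:<5}  session {index}")
```

Under these assumptions and the defaults described above, the last session closes at 420 seconds, so budget roughly seven minutes per demo run plus worker provisioning time.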

## Notes and caveats

- The reservation's lifetime caps how long the session can live without
  client activity. Set it comfortably longer than the work you expect to
  do, or have the client periodically POST `/ping` with `session_id` to
  extend.
- The `on_close_payload` (passed at `/session/create`) is POSTed by the
  framework to `on_close_route` when the session ends. This is useful for
  notifying your queue consumer that the reservation is closing.
- `/release` on the internal port is convenient but bypasses
  `session_auth`. If you need the standard authenticated release flow,
  pass `session_auth` to your consumer (e.g. through the queue payload)
  and have it POST to `/session/end` on the framework's HTTP port
  instead.