Collapse null pyworker client to a single mode parameterized by --count

Now that the session model means no HTTP connection is held during the reservation, the dichotomy between "single reserve" and "trapezoid demo" collapses — both are "open N sessions, each held for H seconds, started I seconds apart, close." Replace --reserve/--demo/--duration/--plateau with --count/--hold/--interval. --session-cost becomes --cost. Client is now 64 lines (down from 120). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify null pyworker code and docs
2026-05-12 12:18:33 +01:00 · 2026-05-12 11:50:03 +01:00 · 2026-05-12 11:40:51 +01:00 · 2026-05-12 11:34:52 +01:00 · 2026-05-12 11:31:26 +01:00 · 2026-05-12 11:14:20 +01:00
8 changed files with 337 additions and 353 deletions
@@ -104,17 +104,13 @@ Images will be saved locally AND uploaded to `s3://{bucket}/comfyui/{filename}`.

 ### Custom Benchmark Workflows

-You can provide a custom ComfyUI workflow for benchmarking. This allows you to test performance using your preferred models and workflow complexity.
+You can provide a custom ComfyUI workflow for benchmarking by creating `workers/comfyui-json/misc/benchmark.json`. This allows you to test performance using your preferred models and workflow complexity.

-**Ways to provide the benchmark file** (in resolution order — first match wins):
+**Ways to provide the benchmark file:**
+- Fork this repository and add your `benchmark.json` file
+- Write the file during worker provisioning (onstart script or setup phase)

-1. **Fork this repository** and commit your workflow to `workers/comfyui-json/misc/benchmark.json`.
-2. **Write the file during provisioning** to a path *outside* the pyworker tree (e.g. `/workspace/benchmark.json`) and export `BENCHMARK_JSON_PATH` so the worker can find it. The pyworker repo is cloned by `start_server.sh` *after* provisioning runs, so provisioning cannot write into `misc/` directly — the destination would be clobbered, or the clone would fail.
-3. **Run on the vast.ai ComfyUI base image.** Its `convert-workflows.sh` maintains `/opt/comfyui-api-wrapper/workflows/pyworker_benchmark.json` as a symlink to the first provisioned workflow; the worker reads this automatically when neither of the above is set. No env var required.
-
-If `BENCHMARK_JSON_PATH` is set but points at a missing or unreadable file, the worker logs a warning and falls through to the next tier rather than going straight to the SD1.5 fallback.
-
-An example workflow is provided at `workers/comfyui-json/misc/benchmark.json.example`. To ensure varied generations, use the placeholder `__RANDOM_INT__` in place of static seed values — it will be replaced with a random integer for each benchmark run.
+An example file is provided in the repository. To ensure varied generations, use the placeholder `__RANDOM_INT__` in place of static seed values - it will be replaced with a random integer for each benchmark run.

 ### Default Benchmark (Fallback)

@@ -124,10 +120,9 @@ The default benchmark uses Stable Diffusion v1.5 with ComfyUI's standard text-to

 | Environment Variable | Default Value | Description |
 | -------------------- | ------------- | ----------- |
-| BENCHMARK_JSON_PATH | (unset) | Path to a custom workflow file outside the pyworker tree. Used if `misc/benchmark.json` is absent. Falls through to `/opt/comfyui-api-wrapper/workflows/pyworker_benchmark.json` if set but missing. |
-| BENCHMARK_TEST_WIDTH | 512 | Fallback benchmark: image width (pixels) |
-| BENCHMARK_TEST_HEIGHT | 512 | Fallback benchmark: image height (pixels) |
-| BENCHMARK_TEST_STEPS | 20 | Fallback benchmark: number of denoising steps |
+| BENCHMARK_TEST_WIDTH | 512 | Image width (pixels) |
+| BENCHMARK_TEST_HEIGHT | 512 | Image height (pixels) |
+| BENCHMARK_TEST_STEPS | 20 | Number of denoising steps |

 Each benchmark run uses a random prompt from `misc/test_prompts.txt` and a random seed to ensure consistent GPU load patterns.

@@ -1,107 +0,0 @@
-{
-    "3": {
-        "inputs": {
-            "seed": "__RANDOM_INT__",
-            "steps": 20,
-            "cfg": 8,
-            "sampler_name": "euler",
-            "scheduler": "normal",
-            "denoise": 1,
-            "model": [
-            "4",
-            0
-            ],
-            "positive": [
-            "6",
-            0
-            ],
-            "negative": [
-            "7",
-            0
-            ],
-            "latent_image": [
-            "5",
-            0
-            ]
-        },
-        "class_type": "KSampler",
-        "_meta": {
-            "title": "KSampler"
-        }
-    },
-    "4": {
-        "inputs": {
-            "ckpt_name": "v1-5-pruned-emaonly-fp16.safetensors"
-        },
-        "class_type": "CheckpointLoaderSimple",
-        "_meta": {
-            "title": "Load Checkpoint"
-        }
-    },
-    "5": {
-        "inputs": {
-            "width": 512,
-            "height": 512,
-            "batch_size": 1
-        },
-        "class_type": "EmptyLatentImage",
-        "_meta": {
-            "title": "Empty Latent Image"
-        }
-    },
-    "6": {
-        "inputs": {
-            "text": "beautiful scenery nature glass bottle landscape, , purple galaxy bottle,",
-            "clip": [
-            "4",
-            1
-            ]
-        },
-        "class_type": "CLIPTextEncode",
-        "_meta": {
-            "title": "CLIP Text Encode (Prompt)"
-        }
-    },
-    "7": {
-        "inputs": {
-            "text": "text, watermark",
-            "clip": [
-            "4",
-            1
-            ]
-        },
-        "class_type": "CLIPTextEncode",
-        "_meta": {
-            "title": "CLIP Text Encode (Prompt)"
-        }
-    },
-    "8": {
-        "inputs": {
-            "samples": [
-            "3",
-            0
-            ],
-            "vae": [
-            "4",
-            2
-            ]
-        },
-        "class_type": "VAEDecode",
-        "_meta": {
-            "title": "VAE Decode"
-        }
-    },
-    "9": {
-        "inputs": {
-            "filename_prefix": "ComfyUI",
-            "images": [
-            "8",
-            0
-            ]
-        },
-        "class_type": "SaveImage",
-        "_meta": {
-            "title": "Save Image"
-        }
-    }
-}
@@ -1,34 +0,0 @@
-cartoon character of a person with a hoodie , in style of cytus and deemo, ork, gold chains, realistic anime cat, dripping black goo, lineage revolution style, thug life, cute anthropomorphic bunny, balrog, arknights, aliased, very buff, black and red and yellow paint, painting illustration collage style, character composition in vector with white background
-stardew valley, fine details
-2D Vector Illustration of a child with soccer ball Art for Sublimation, Design Art, Chrome Art, Painting and Stunning Artwork, Highly Detailed Digital Painting, Airbrush Art, Highly Detailed Digital Artwork, Dramatic Artwork, stained antique yellow copper paint, digital airbrush art, detailed by Mark Brooks, Chicano airbrush art, Swagger! snake Culture
-realistic futuristic city-downtown with short buildings, sunset
-seascape by Ray Collins and artgerm, front view of a perfect wave, sunny background, ultra detailed water
-inspired by realflow-cinema4d editor features, create image of a transparent luxury cup with ice fruits and mint, connected with white, yellow and pink cream, Slow - High Speed MO Photography, YouTube Video Screenshot, Abstract Clay, Transparent Cup , molecular gastronomy, wheel, 3D fluid,Simulation rendering, still video, 4k polymer clay futras photography, very surreal, Houdini Fluid Simulation, hyperrealistic CGI and FLUIDS & MULTIPHYSICS SIMULATION effect, with Somali Stain Lurex, Metallic Jacquard, Gold Thread, Mulberry Silk, Toub Saree, Warm background, a fantastic image worthy of an award.
-biker with backpack on his back riding a motorcycle, Style by Ade Santora, Oilpunk, Cover photo, craig mullins style, on the cover of a magazine, Outdoor Magazine, inspired by Alex Petruk APe, image of a male biker, Cover of an award-winning magazine, the man has a backpack, photo for magazine, with a backpack, magazine cover
-generate a collage-style illustration inspired by the Procreate raster graphic editor, photographic illustration with the theme, 2D vector, art for textile sublimation, containing surrealistic cartoon cat wearing a baseball cap and jeans standing in front of a poster, inspired by Sadao Watanabe, Doraemon, Japanese cartoon style, Eichiro Oda, Iconic high detail character, Director: Nakahara Nantenbō, Kastuhiro Otomo, image detailed, by Miyamoto, Hidetaka Miyazaki, Katsuhiro illustration, 8k, masterpiece, Minimize noise and grain in photo quality without lose quality and increase brightness and lighting,Symmetry and Alignment, Avoid asymmetrical shapes and out-of-focus points. Focus and Sharpness: Make sure the image is focused and sharp and encourages the viewer to see it as a work of art printed on fabric.
-fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, fine details
-armored hero with a glowing axe, mecha science_fiction, jungle background, dynamic lighting, detailed shading, digital texture painting, masterpiece, studio quality, 6k
-elderly figure in a leather jacket DJing in a smoky nightclub, mixing live on a giant console, dramatic stage lighting, a masterpiece
-elderly figure in a leather jacket on a motorcycle, magazine cover lighting, a masterpiece
-a young pilot ordering a burger and fries from a futuristic space cantina
-I want to generate a group avatar for a Feishu group chat. The role of this group is daily software technical communication. Now the subject technology stacks that members of this group discuss daily include: algorithms, data structures, optimization, functional programming, and the programming languages often discussed are: TypeScript, Java, python, etc. I hope this avatar has a simple aesthetic, this avatar is a single person avatar
-portrait Anime black girl cute-fine-face, pretty face, realistic shaded Perfect face, fine details. Anime. realistic shaded lighting by Ilya Kuvshinov Giuseppe Dangelico Pino and Michael Garmash and Rob Rey, IAMAG premiere, WLOP matte print, cute freckles, masterpiece
-young woman in modern fashion editorial, beige miniskirt and dark brown turtleneck sweater, soft studio lighting, brown hair, grey eyes, fine details, magazine cover style, a masterpiece
-Cute small cat sitting in a movie theater eating chicken wiggs watching a movie ,unreal engine, cozy indoor lighting, artstation, detailed, digital painting,cinematic,character design by mark ryden and pixar and hayao miyazaki, unreal 5, daz, hyperrealistic, octane render
-Cute small dog sitting in a movie theater eating popcorn watching a movie ,unreal engine, cozy indoor lighting, artstation, detailed, digital painting,cinematic,character design by mark ryden and pixar and hayao miyazaki, unreal 5, daz, hyperrealistic, octane render
-fox bracelet made of buckskin with fox features, rich details, fine carvings, studio lighting
-crane buckskin bracelet with crane features, rich details, fine carvings, studio lighting
-london luxurious interior living-room, light walls
-Parisian luxurious interior penthouse bedroom, dark walls, wooden panels
-cute girl, crop-top, blond hair, black glasses, stretching, with background by greg rutkowski makoto shinkai kyoto animation key art feminine mid shot
-houses in front, houses background, straight houses, digital art, smooth, sharp focus, gravity falls style, doraemon style, shinchan style, anime style
-Simplified technical drawing, Leonardo da Vinci, Mechanical Dinosaur Skeleton, Minimalistic annotations, Hand-drawn illustrations, Basic design and engineering, Wonder and curiosity
-High quality 8K painting impressionist style of a Japanese modern city street with a girl on the foreground wearing a traditional wedding dress with a fox mask, staring at the sky, daylight
-a landscape from the Moon with the Earth setting on the horizon, realistic, detailed
-Isometric Atlantis city,great architecture with columns, great details, ornaments,seaweed, blue ambiance, 3D cartoon style, soft light, 45° view
-A hyper realistic avatar of a guy riding on a black honda cbr 650r in leather suit,high detail, high quality,8K,photo realism
-the street of amedieval fantasy town, at dawn, dark, highly detailed
-overwhelmingly beautiful eagle framed with vector flowers, long shiny wavy flowing hair, polished, ultra detailed vector floral illustration mixed with hyper realism, muted pastel colors, vector floral details in background, muted colors, hyper detailed ultra intricate overwhelming realism in detailed complex scene with magical fantasy atmosphere, no signature, no watermark
-a highly detailed matte painting of a man on a hill watching a rocket launch in the distance by studio ghibli, makoto shinkai, by artgerm, by wlop, by greg rutkowski, volumetric lighting, octane render, 4 k resolution, trending on artstation, masterpiece | hyperrealism| highly detailed| insanely detailed| intricate| cinematic lighting| depth of field
-electronik robot and ofice ,unreal engine, cozy indoor lighting, artstation, detailed, digital painting,cinematic,character design by mark ryden and pixar and hayao miyazaki, unreal 5, daz, hyperrealistic, octane render
-exquisitely intricately detailed illustration, of a small world with a lake and a rainbow, inside a closed glass jar.
@@ -1,225 +1,60 @@
-"""ComfyUI worker for the vast.ai PyWorker SDK.
-
-Each worker runs a benchmark on warm-up. The payload is selected as follows:
-
-  1. If ``misc/benchmark.json`` exists in the cloned worker tree, it is
-     used as a custom ComfyUI workflow. Use this if you fork the repo and
-     bake in your workflow.
-  2. Else, if ``$BENCHMARK_JSON_PATH`` is set and points at a readable
-     file, it is used. Use this from a provisioning script — provisioning
-     runs before pyworker is cloned, so it cannot write into ``misc/``,
-     but it can drop the workflow elsewhere (e.g. ``/workspace/``) and
-     export this env var.
-  3. Else, if the well-known path
-     ``/opt/comfyui-api-wrapper/workflows/pyworker_benchmark.json`` exists,
-     it is used. The vast.ai ComfyUI base image's ``convert-workflows.sh``
-     maintains this as a symlink to the first provisioned workflow, so on
-     that image no env var is needed.
-  4. Otherwise an SD1.5 Text2Image fallback runs, parameterised by the
-     ``BENCHMARK_TEST_{WIDTH,HEIGHT,STEPS}`` env vars and a random prompt
-     from ``misc/test_prompts.txt``.
-
-``__RANDOM_INT__`` placeholders in custom workflows are substituted
-server-side by ai-dock/comfyui-api-wrapper, so this worker does not handle
-them itself.
-"""
-
-import json
-import logging
-import os
 import random
 import sys
-from pathlib import Path

 from vastai import Worker, WorkerConfig, HandlerConfig, LogActionConfig, BenchmarkConfig

-# ComfyUI model configuration. The model server is ai-dock's
-# comfyui-api-wrapper sitting in front of ComfyUI itself, not ComfyUI's
-# own port (18188). We tail the api-wrapper's log rather than ComfyUI's
-# and key off the api-wrapper's own structured readiness/fault signals:
-#
-#   BACKENDS_READY            — api-wrapper has confirmed every ComfyUI
-#                               backend passes HTTP+WS probes. Until
-#                               this fires, posting to /generate/sync
-#                               can hit "Cannot connect to host" inside
-#                               the api-wrapper, which the SDK can't
-#                               recover from since __call_backend
-#                               doesn't retry connection-refused.
-#   BACKENDS_READY_TIMEOUT    — backends never reachable within
-#                               api-wrapper's deadline. Worker is
-#                               unrecoverable; mark errored.
-#   BACKEND_UNRECOVERABLE     — CUDA fault / illegal memory access on a
-#                               backend's GPU. Same fate.
-#   Application startup failed — uvicorn's own ASGI lifespan failed.
-#
-# These tokens are emitted by ai-dock/comfyui-api-wrapper >= the
-# "feat/backend-readiness-log-signals" change. Older wrappers won't
-# emit BACKENDS_READY, so warm-up will stall — pin the wrapper version
-# accordingly.
+# ComyUI model configuration
 MODEL_SERVER_URL           = 'http://127.0.0.1'
 MODEL_SERVER_PORT          = 18288
-MODEL_LOG_FILE             = '/var/log/portal/api-wrapper.log'
+MODEL_LOG_FILE             = '/var/log/portal/comfyui.log'
 MODEL_HEALTHCHECK_ENDPOINT = "/health"

-# Trigger benchmark only after the full stack (api-wrapper + ComfyUI
-# backends) is reachable. See BACKENDS_READY in the comment above.
+# ComyUI-specific log messages
 MODEL_LOAD_LOG_MSG = [
-    "BACKENDS_READY",
+    "To see the GUI go to: "
 ]

-# LogAction.ModelError is fatal: the SDK calls backend_errored() and
-# locks the worker into a permanent error state. Patterns must
-# therefore only match conditions where the api-wrapper genuinely
-# cannot serve any request — supervisord restarts on uvicorn exit, so
-# a real failure self-heals rather than dragging the worker down.
-#
-# Notably *not* matched here:
-#   - per-request errors (PreprocessWorker failures, ComfyUI workflow
-#     validation, "Value not in list:") — one malformed client payload
-#     would otherwise kill the worker
-#   - "CUDA out of memory" — surfaces both as a misconfigured GPU
-#     (which the benchmark-failure path already catches via
-#     backend_errored) and as a too-greedy client request, which is
-#     indistinguishable from a substring match
-#   - convert-workflows.sh warnings — that script is not load-bearing
-#     for serving
 MODEL_ERROR_LOG_MSGS = [
-    "BACKENDS_READY_TIMEOUT",       # backends never reachable
-    "BACKEND_UNRECOVERABLE",        # CUDA fault latched per backend
-    "Application startup failed",   # uvicorn ASGI lifespan startup failed
+    "MetadataIncompleteBuffer",
+    "Value not in list: ",
+    "[ERROR] Provisioning Script failed"
 ]

-# LogAction.Info is purely informational (echoes log lines into the vast
-# console). Nothing in api-wrapper.log is currently worth surfacing —
-# model downloads are upstream in provisioning, per-request logs are
-# too noisy.
-MODEL_INFO_LOG_MSGS = []
+MODEL_INFO_LOG_MSGS = [
+    '"message":"Downloading'
+]

-# Benchmark assets shipped alongside this worker. Resolved relative to this
-# file so the worker keeps working regardless of the launch cwd.
-MISC_DIR       = Path(__file__).parent / "misc"
-BENCHMARK_FILE = MISC_DIR / "benchmark.json"
-TEST_PROMPTS   = MISC_DIR / "test_prompts.txt"
-
-# Well-known location maintained by the vast.ai ComfyUI base image.
-# convert-workflows.sh symlinks this to the first provisioned workflow,
-# letting the base image work out-of-the-box without any env var.
-WELLKNOWN_BENCHMARK = Path("/opt/comfyui-api-wrapper/workflows/pyworker_benchmark.json")
-
-log = logging.getLogger(__name__)
-
-# Used when test_prompts.txt is unreadable or empty. Bare and generic
-# on purpose — this is a benchmark seed, not a creative output.
-_FALLBACK_PROMPT = "a still life on a wooden table, soft daylight"
+benchmark_prompts = [
+    "Cartoon hoodie hero; orc, anime cat, bunny; black goo; buff; vector on white.",
+    "Cozy farming-game scene with fine details.",
+    "2D vector child with soccer ball; airbrush chrome; swagger; antique copper.",
+    "Realistic futuristic downtown of low buildings at sunset.",
+    "Perfect wave front view; sunny seascape; ultra-detailed water; artful feel.",
+    "Clear cup with ice, fruit, mint; creamy swirls; fluid-sim CGI; warm glow.",
+    "Male biker with backpack on motorcycle; oilpunk; award-worthy magazine cover.",
+    "Collage for textile; surreal cartoon cat in cap/jeans before poster; crisp.",
+    "Medieval village inside glass sphere; volumetric light; macro focus.",
+    "Iron Man with glowing axe; mecha sci-fi; jungle scene; dynamic light.",
+    "Pope Francis DJ in leather jacket, mixing on giant console; dramatic.",
+]


-def _env_int(name: str, default: int) -> int:
-    """Read an integer env var, warning + falling back on bad values."""
-    raw = os.getenv(name)
-    if raw is None or raw == "":
-        return default
-    try:
-        return int(raw)
-    except ValueError:
-        log.warning("ignoring %s=%r (not an int); using default %d", name, raw, default)
-        return default

-
-def _try_load_workflow(path: Path) -> dict | None:
-    """Load and return a benchmark workflow from ``path``.
-
-    Returns None on any failure (path missing, not a regular file,
-    unreadable, invalid JSON) so the caller can fall through to the
-    next tier rather than dropping straight to the SD1.5 default.
-    """
-    if not path.is_file():
-        return None
-    try:
-        with open(path) as f:
-            return json.load(f)
-    except (json.JSONDecodeError, OSError) as e:
-        log.warning("Failed to load %s: %s; trying next tier", path, e)
-        return None
-
-
-def _custom_workflow_payload() -> dict | None:
-    """Try each benchmark workflow tier in order; return the first one
-    that loads cleanly as a payload, or None if every tier is absent /
-    unreadable. Tiers (in order): in-tree ``misc/benchmark.json``,
-    ``$BENCHMARK_JSON_PATH``, well-known base-image symlink.
-    """
-    env_path = os.getenv("BENCHMARK_JSON_PATH")
-    candidates = [("misc", BENCHMARK_FILE)]
-    if env_path:
-        candidates.append(("env", Path(env_path)))
-    candidates.append(("well-known", WELLKNOWN_BENCHMARK))
-
-    for label, path in candidates:
-        # Surface a warning specifically when the operator pointed
-        # BENCHMARK_JSON_PATH at something we can't use — silent
-        # fall-through there is a footgun (typo => SD1.5 fallback,
-        # operator wonders why custom benchmark didn't take).
-        if not path.is_file():
-            if label == "env":
-                log.warning(
-                    "BENCHMARK_JSON_PATH=%s is not a readable file; trying fallbacks", path
-                )
-            continue
-        workflow = _try_load_workflow(path)
-        if workflow is None:
-            continue
-        log.info("Using custom benchmark workflow from %s (%s)", path, label)
-        return {
-            "input": {
-                "request_id": f"test-{random.randint(1000, 99999)}",
-                "workflow_json": workflow,
-            }
-        }
-    return None
-
-
-def _load_prompts() -> list[str]:
-    """Read misc/test_prompts.txt; defensive against missing/empty file."""
-    try:
-        with open(TEST_PROMPTS) as f:
-            prompts = [line.strip() for line in f if line.strip()]
-    except OSError as e:
-        log.warning("could not read %s: %s; using built-in fallback prompt", TEST_PROMPTS, e)
-        return [_FALLBACK_PROMPT]
-    if not prompts:
-        log.warning("%s is empty; using built-in fallback prompt", TEST_PROMPTS)
-        return [_FALLBACK_PROMPT]
-    return prompts
-
-
-def _default_payload() -> dict:
-    """Build the SD1.5 Text2Image fallback payload."""
-    prompts = _load_prompts()
-    return {
+benchmark_dataset = [
+    {
        "input": {
            "request_id": f"test-{random.randint(1000, 99999)}",
            "modifier": "Text2Image",
            "modifications": {
-                "prompt": random.choice(prompts),
-                "width":  _env_int("BENCHMARK_TEST_WIDTH",  512),
-                "height": _env_int("BENCHMARK_TEST_HEIGHT", 512),
-                "steps":  _env_int("BENCHMARK_TEST_STEPS",  20),
-                "seed":   random.randint(0, sys.maxsize),
+                "prompt": prompt,
+                "width": 512,
+                "height": 512,
+                "steps": 20,
+                "seed": random.randint(0, sys.maxsize)
            }
        }
-    }
-
-
-def make_benchmark_payload() -> dict:
-    """Build one benchmark request payload.
-
-    Called once per benchmark run by the SDK; using a generator (rather
-    than a static ``dataset=``) lets each run re-pick a prompt and re-roll
-    the seed, and avoids holding multiple copies of a large workflow JSON
-    in memory.
-    """
-    return _custom_workflow_payload() or _default_payload()
-
+    } for prompt in benchmark_prompts
+]

 worker_config = WorkerConfig(
    model_server_url=MODEL_SERVER_URL,
@@ -232,7 +67,7 @@ worker_config = WorkerConfig(
            allow_parallel_requests=False,
            max_queue_time=10.0,
            benchmark_config=BenchmarkConfig(
-                generator=make_benchmark_payload,
+                dataset=benchmark_dataset,
            )
        )
    ],
@@ -0,0 +1,88 @@
+# Null PyWorker
+
+Holds Vast Serverless reservations open without forwarding any work to a
+model. Use it when your real workload (a queue consumer in any language)
+runs as a separate process on the instance and you just want to drive
+Vast autoscaling: **one POST reserves a worker, one POST releases it.**
+
+## Use case
+
+You have a job queue on your own infrastructure (Redis, SQS, NATS, etc.)
+and a consumer (node, golang, python, a binary — anything) that pulls
+from it. You want one Vast worker per unit of in-flight work, scaling
+elastically from zero. The null PyWorker is the autoscaling driver; your
+consumer does the work.
+
+## How it works
+
+Reservations use the framework's session API. The SDK's
+`endpoint.session(...)` POSTs `/session/create` to reserve a worker;
+`session.close()` POSTs `/session/end` to release it. `max_sessions=1`
+means each worker holds exactly one reservation — the next reservation
+either lands on a free worker or triggers a scale-up.
+
+The PyWorker itself does nothing functional:
+
+- One trivial `/ping` route to satisfy the framework's benchmark
+  requirement (its `max_perf` is pinned to 100).
+- An internal `/release` endpoint on `127.0.0.1:18999` for the local
+  consumer to end the session without needing `session_auth`.
+
+## Endpoint parameters
+
+Tested working configuration:
+
+| Parameter | Value | Why |
+|---|---|---|
+| `target_util` | `1.0` | One session = one worker. Default `0.9` rounds up to an extra worker. |
+| `min_load` | `0` | Scale-to-zero floor. |
+| `max_queue_time` | `1` | Stop routing to an occupied worker after ~1s of implied queue. |
+| `target_queue_time` | `0.5` | Trigger scale-up promptly once anything queues. |
+| `inactivity_timeout` | `10` (seconds) | Permit scale-to-zero after 10s idle. |
+
+## API
+
+| Route | Where | Use |
+|---|---|---|
+| `POST /session/create` | endpoint, signed | Reserve a worker (`endpoint.session(...)`) |
+| `POST /session/end` | endpoint, signed | Release (`session.close()`) |
+| `POST /release` | `127.0.0.1:18999`, no auth | Local consumer release, no `session_auth` needed |
+
+## Healthcheck
+
+Default: stub on `127.0.0.1:18999/health` returning `200`. Set
+`BACKEND_HEALTH_URL=http://127.0.0.1:9090/health` (absolute URL) to point
+the framework at your queue consumer's health endpoint instead — if the
+consumer dies, the autoscaler sees the worker as broken.
+
+## Deploying
+
+1. Point `PYWORKER_REPO` at this repo (or your fork).
+2. Set `BACKEND=null` in the template.
+3. Run your queue consumer alongside the PyWorker. When it's done with
+   a unit of work:
+   ```bash
+   curl -X POST http://127.0.0.1:18999/release
+   ```
+
+## Client demo
+
+```bash
+# Single reservation, hold 180s
+python -m workers.null.client --endpoint <NAME> --instance alpha
+
+# Three concurrent reservations, started 30s apart, each held 360s
+python -m workers.null.client --endpoint <NAME> --instance alpha --count 3 --hold 360
+```
+
+Flags: `--count` (number of concurrent sessions, default 1), `--hold`
+(seconds each session is held, default 180), `--interval` (seconds
+between starts when `--count > 1`, default 30), `--cost` (cost reported
+at session-create, default 100 = `max_perf`), `--instance` (`prod` |
+`alpha` | `candidate` | `local`).
+
+## Environment variables
+
+- `BACKEND_HEALTH_URL` — absolute URL the framework healthchecks. Stub
+  is used when unset.
+- `NULL_CONTROL_PORT` — internal control server port. Defaults to `18999`.
@@ -0,0 +1,64 @@
+import argparse
+import asyncio
+import logging
+import os
+import sys
+
+from vastai import Serverless
+
+logging.basicConfig(
+    level=logging.INFO,
+    format="%(asctime)s[%(levelname)-5s] %(message)s",
+    datefmt="%Y-%m-%d %H:%M:%S",
+)
+log = logging.getLogger(__file__)
+
+
+async def reserve(client: Serverless, endpoint_name: str, hold: float, cost: int, label: str):
+    endpoint = await client.get_endpoint(name=endpoint_name)
+    async with await endpoint.session(cost=cost, lifetime=hold + 60) as s:
+        sid = s.session_id
+        log.info("[%s] %s open, holding %.0fs", label, sid, hold)
+        await asyncio.sleep(hold)
+    log.info("[%s] %s closed", label, sid)
+
+
+async def main_async():
+    p = argparse.ArgumentParser(description="Vast Null PyWorker demo client")
+    p.add_argument("--endpoint", default=os.environ.get("VAST_ENDPOINT", "null-prod"))
+    p.add_argument("--instance", choices=("prod", "alpha", "candidate", "local"),
+                   default=os.environ.get("VAST_INSTANCE", "prod"))
+    p.add_argument("--count", type=int, default=1,
+                   help="concurrent sessions to open (default: 1)")
+    p.add_argument("--interval", type=float, default=30.0,
+                   help="seconds between session starts when count>1 (default: 30)")
+    p.add_argument("--hold", type=float, default=180.0,
+                   help="seconds to hold each session (default: 180)")
+    p.add_argument("--cost", type=int, default=100,
+                   help="cost reported at session-create (default: 100)")
+    args = p.parse_args()
+
+    print(f"endpoint={args.endpoint} instance={args.instance} "
+          f"count={args.count} hold={args.hold}s cost={args.cost}")
+
+    try:
+        async with Serverless(instance=args.instance) as client:
+            tasks = []
+            for i in range(args.count):
+                label = f"res-{i+1}" if args.count > 1 else "reservation"
+                tasks.append(asyncio.create_task(
+                    reserve(client, args.endpoint, args.hold, args.cost, label),
+                    name=label,
+                ))
+                if i + 1 < args.count:
+                    await asyncio.sleep(args.interval)
+            await asyncio.gather(*tasks, return_exceptions=True)
+    except KeyboardInterrupt:
+        log.info("Interrupted")
+    except Exception as e:
+        log.error("Error: %s", e, exc_info=True)
+        sys.exit(1)
+
+
+if __name__ == "__main__":
+    asyncio.run(main_async())
@@ -0,0 +1,143 @@
+import asyncio
+import logging
+import os
+from contextlib import asynccontextmanager
+from urllib.parse import urlsplit
+
+from aiohttp import web
+
+from vastai import (
+    Worker,
+    WorkerConfig,
+    HandlerConfig,
+    BenchmarkConfig,
+    LogActionConfig,
+)
+
+log = logging.getLogger(__file__)
+
+TARGET_PERF = 100.0
+BENCHMARK_SENTINEL = "__null_worker_benchmark__"
+
+INTERNAL_HOST = "127.0.0.1"
+INTERNAL_PORT = int(os.environ.get("NULL_CONTROL_PORT", 18999))
+STUB_HEALTH_PATH = "/health"
+
+BACKEND_HEALTH_URL = os.environ.get("BACKEND_HEALTH_URL", "").strip()
+if BACKEND_HEALTH_URL:
+    _p = urlsplit(BACKEND_HEALTH_URL)
+    if not _p.scheme or not _p.hostname:
+        raise ValueError(f"BACKEND_HEALTH_URL must be absolute, got: {BACKEND_HEALTH_URL!r}")
+    HEALTH_BASE_URL = f"{_p.scheme}://{_p.hostname}"
+    HEALTH_PORT = _p.port or (443 if _p.scheme == "https" else 80)
+    HEALTH_PATH = _p.path or "/"
+    USE_STUB_HEALTH = False
+else:
+    HEALTH_BASE_URL = f"http://{INTERNAL_HOST}"
+    HEALTH_PORT = INTERNAL_PORT
+    HEALTH_PATH = STUB_HEALTH_PATH
+    USE_STUB_HEALTH = True
+
+
+_backend_ref: dict = {"backend": None}
+
+
+def _build_internal_app() -> web.Application:
+    app = web.Application()
+
+    async def release_handler(_request: web.Request) -> web.Response:
+        # Closes the singleton session. Uses name-mangled __close_session
+        # to bypass the session_auth check — safe because this server is
+        # bound to 127.0.0.1, and it spares the consumer from threading
+        # session_auth through its queue.
+        backend = _backend_ref.get("backend")
+        if backend is None:
+            return web.json_response({"released": False, "reason": "backend not ready"}, status=503)
+        sids = list(backend.sessions.keys())
+        if not sids:
+            return web.json_response({"released": False, "reason": "no active session"}, status=200)
+        closed = []
+        for sid in sids:
+            try:
+                if await backend._Backend__close_session(sid):
+                    closed.append(sid)
+            except Exception as e:
+                log.warning(f"Error closing session {sid}: {e}")
+        return web.json_response({"released": bool(closed), "session_ids": closed}, status=200)
+
+    app.router.add_post("/release", release_handler)
+
+    if USE_STUB_HEALTH:
+        async def stub_health(_request: web.Request) -> web.Response:
+            return web.Response(status=200, text="ok")
+        app.router.add_get(STUB_HEALTH_PATH, stub_health)
+
+    return app
+
+
+@asynccontextmanager
+async def null_lifecycle():
+    # Pin max_throughput to TARGET_PERF exactly — the framework's
+    # __run_benchmark short-circuits to float(file_contents) if this exists.
+    try:
+        with open(".has_benchmark", "w") as fh:
+            fh.write(str(int(TARGET_PERF)))
+    except OSError as e:
+        log.warning(f"Could not pin benchmark cache: {e}")
+
+    runner = web.AppRunner(_build_internal_app())
+    await runner.setup()
+    await web.TCPSite(runner, INTERNAL_HOST, INTERNAL_PORT).start()
+
+    log.info(
+        "Null pyworker control server: http://%s:%d (POST /release%s)",
+        INTERNAL_HOST,
+        INTERNAL_PORT,
+        f", GET {STUB_HEALTH_PATH}" if USE_STUB_HEALTH else "",
+    )
+    if not USE_STUB_HEALTH:
+        log.info("Framework healthcheck → %s", BACKEND_HEALTH_URL)
+
+    try:
+        yield
+    finally:
+        await runner.cleanup()
+
+
+async def ping(**params: object) -> dict:
+    # Exists only to satisfy the framework's "at least one handler with a
+    # BenchmarkConfig" requirement. Sleep 1s on the benchmark path as a
+    # fallback in case the .has_benchmark cache pin failed; otherwise the
+    # benchmark cache short-circuits and this never runs.
+    if params.get(BENCHMARK_SENTINEL):
+        await asyncio.sleep(1.0)
+        return {"ok": True, "benchmark": True}
+    return {"ok": True}
+
+
+worker_config = WorkerConfig(
+    model_server_url=HEALTH_BASE_URL,
+    model_server_port=HEALTH_PORT,
+    model_healthcheck_url=HEALTH_PATH,
+    lifecycle=null_lifecycle(),
+    max_sessions=1,
+    handlers=[
+        HandlerConfig(
+            route="/ping",
+            allow_parallel_requests=True,
+            remote_function=ping,
+            workload_calculator=lambda _payload: TARGET_PERF,
+            benchmark_config=BenchmarkConfig(
+                generator=lambda: {BENCHMARK_SENTINEL: True},
+                runs=1,
+                concurrency=1,
+                do_warmup=False,
+            ),
+        ),
+    ],
+    log_action_config=LogActionConfig(),
+)
+
+_worker = Worker(worker_config)
+_backend_ref["backend"] = _worker.backend
+_worker.run()
Author	SHA1	Message	Date
Rob Ballantyne	a81d3febe7	Collapse null pyworker client to a single mode parameterized by --count Now that the session model means no HTTP connection is held during the reservation, the dichotomy between "single reserve" and "trapezoid demo" collapses — both are "open N sessions, each held for H seconds, started I seconds apart, close." Replace --reserve/--demo/--duration/--plateau with --count/--hold/--interval. --session-cost becomes --cost. Client is now 64 lines (down from 120). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:18:33 +01:00
Rob Ballantyne	913e3a8782	Simplify null pyworker code and docs Pass over all three files to drop verbose expository commentary that duplicated either the code or the README. Net: -284 lines. README now reads top-to-bottom in roughly the order someone would need the info: use case → how it works → endpoint params → API → healthcheck → deploy → demo. Endpoint params table uses the values actually tested on alpha (min_load=0, target_util=1, max_queue_time=1, target_queue_time=0.5, inactivity_timeout=10). Dropped the "known autoscaler quirk" section now that alpha addresses it; kept the --session-cost flag as a debugging knob. worker.py and client.py keep the same behavior but trim long block comments and multi-line docstrings the code didn't need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:50:03 +01:00
Rob Ballantyne	47ad0ebe0a	Add --instance flag to null pyworker client Lets the demo target run-alpha.vast.ai (or candidate/local) without editing code. Defaults to prod; respects VAST_INSTANCE env var. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:40:51 +01:00
Rob Ballantyne	34fd21e76a	Revert default session cost to 100; document the over-provision as a workaround cost = max_perf = 100 is the intended steady-state semantics: one session = one worker, scaling elastically from zero. Reverting the default so the design reads correctly even where current autoscaler bugs make it misbehave (2→3 scale-up not firing reliably, scale-to-zero issues — fixes pending on the Vast side). README now describes the intended model first (clean unit occupancy, scale-to-zero via inactivity_timeout + min_load=0), then flags the known autoscaler quirk and presents --session-cost 200 as a temporary band-aid until the Vast fixes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:34:52 +01:00
Rob Ballantyne	1d2caaf554	Default null pyworker session cost to 2x max_perf Reporting cost == max_perf puts an occupied worker at exactly 100% utilization, which the autoscaler reads as "at target, no action." The 3rd session_create then 429s on both active workers and stalls in the global queue instead of triggering a cold-worker activation (observed: 1→2 active scales fine, 2→3 does not). Bumping cost to 2 * max_perf makes each session look like more than one worker's work, so the autoscaler always keeps an extra active worker hot. Slight over-provisioning, but the 3rd reservation lands directly on a free worker rather than queueing. Expose --session-cost on the client so the value can be swept without edits. README documents the trade-off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:31:26 +01:00
Rob Ballantyne	01eff874d8	Correct queue-time guidance for null pyworker endpoints Earlier note claimed max_queue_time / target_queue_time were no-ops because the worker's internal wait_time property filters sessions out. That filter only affects per-worker rejection on a given handler — the autoscaler doesn't see the property and computes its own queue-time estimate from cur_load / max_perf, which does include sessions. With defaults around 30s, an occupied null worker (cur_load=100, max_perf=100, implied queue=1s) still looks "available" to the autoscaler, so a third reservation gets queued on an existing worker via repeated 429-retries instead of triggering scale-up. Fix: set max_queue_time = 0 and target_queue_time = 0 on the endpoint. Any in-flight load marks the worker "full" for routing, and any observed queue time triggers immediate scale-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:14:20 +01:00
Rob Ballantyne	d51f04a176	Await endpoint.session() in null pyworker client endpoint.session() forwards to start_endpoint_session, which is async def — so the call returns a coroutine, not a Session, despite the SDK's return-type annotation. Use 'async with await endpoint.session(...)' so the coroutine resolves to a Session before entering the context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:07:32 +01:00
Rob Ballantyne	ef248ef695	Document endpoint scaling parameters for null pyworker Add a scaling-parameters section to the README covering target_util=1.0 (the critical one — the default 0.9 silently rounds up to one extra worker), min_load math, and why max_queue_time / target_queue_time don't matter here (sessions are filtered from wait_time so both signals stay at zero). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:06:04 +01:00
Rob Ballantyne	6a562a1376	Rewrite null pyworker on the framework session model Drop the held-/reserve approach in favour of the framework's session primitive (max_sessions=1 + /session/create). Sessions are excluded from the autoscaler's queue-wait math and don't suffer the cur_perf=0 degradation that a long-held request did, so this naturally produces the "one request comes in and you get a worker; release and it scales back down" model we were hand-rolling. Server side: - max_sessions=1; framework auto-registers /session/* routes - Drop custom /reserve handler, _active_reservation event, max_queue_ time=0.0, MAX_RESERVATION_SECONDS, _perf_heartbeat - Trivial /ping handler exists only to satisfy the framework's "at least one handler with BenchmarkConfig" requirement (and to give clients an extension/keepalive route) - /release on the internal control port is kept as a convenience for queue consumers that don't carry session_auth — calls the framework's __close_session via name-mangling, which bypasses the session_auth check but is fine for a localhost-only endpoint - Workload/perf back to 100 (conventional) Client side: - Uses endpoint.session(cost, lifetime) instead of POST /reserve - async with the SDK Session; close on exit posts /session/end with proper auth → 200 success in metrics - Demo and single modes both ride the same reserve() helper Sessions landed in vastai-sdk 0.4.2 (commit ec9ef59, 2026-01-20). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:51:24 +01:00
Rob Ballantyne	6c2f194b28	Add perf heartbeat to keep null pyworker reporting peak throughput While a /reserve is held, no requests complete so workload_served stays at 0 each metrics tick. The autoscaler sees cur_perf=0 against max_perf=150, concludes the worker can't deliver claimed throughput, downgrades it, and gets cautious about scaling up — so additional /reserve requests pile up behind the held one instead of triggering a new worker. Add a 1Hz heartbeat coroutine that, while anything is in flight, sets workload_served back to TARGET_PERF (150) and flags update_pending. The metrics tick reads 150 and resets to 0; the heartbeat re-pins it before the next tick. Net effect: the autoscaler sees a saturated worker delivering at peak rate, which is the signal it needs to scale a new worker up rather than queue. The heartbeat needs the backend instance, which is only created inside Worker(...) — stash a reference in a module-level dict between Worker() and .run() so the lifecycle coroutine can reach it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:35:18 +01:00
Rob Ballantyne	2aada7b210	Add --plateau to null pyworker demo (default 5min) Previously the first release fired only 30s after the third reservation started, so the autoscaler often hadn't even finished provisioning the third worker yet. Default plateau to 300s so all three workers are visibly running before scale-down begins; configurable via --plateau. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:26:31 +01:00
Rob Ballantyne	8df562e243	Standardize null pyworker load/perf on 150 Bump workload_calculator, benchmark cache value, and client cost from 100 to 150. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:17:57 +01:00
Rob Ballantyne	4eef5e22af	Pin null pyworker max_throughput to exactly 100 asyncio.sleep(1.0) takes slightly more than 1s due to event loop scheduling, so workload/time landed at ~99.x instead of 100. Pre-populate the framework's .has_benchmark cache file with "100" before the benchmark runs — __run_benchmark short-circuits to the cached value and skips the time-based calculation entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:13:16 +01:00
Rob Ballantyne	9d969e376e	Standardize null pyworker load/perf on 100 Using 1 confused the serverless capacity math. Set workload_calculator, benchmark target throughput, and client cost all to 100 — the conventional default the rest of the system expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:09:16 +01:00
Rob Ballantyne	ef3f34a515	Restructure null pyworker --demo as a clean trapezoid Three reservations 30s apart, each with a 90s duration. They end one at a time, also 30s apart, then the client exits. Each reservation ends via its duration cap (200 success) rather than the previous "cancel one, leave two open" pattern that left two 499s pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:00:46 +01:00
Rob Ballantyne	147bf2597a	Set null pyworker client cost to 1 Match the server-side workload_calculator (1.0) so the autoscaler routing hint is consistent with what the worker reports. A null reservation is a unitless slot — no reason for client cost to be 100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:47:19 +01:00
Rob Ballantyne	dc423e2999	Pin null pyworker benchmark to ~1.0 throughput The startup benchmark previously returned instantly, producing max_throughput around 339895. A null worker has no real throughput concept (each reservation is a unitless slot), so sleep 1s during the benchmark with workload=1 to record max_throughput ~= 1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:22:45 +01:00
Rob Ballantyne	463f3de8ea	Add staggered --demo mode to null pyworker client Three concurrent /reserve calls 30s apart, then cancel the first to show the early-release path. The remaining two run until their duration cap. Useful for watching scale-up/scale-down behaviour in the autoscaler dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:08:44 +01:00
Rob Ballantyne	ed0db198c3	Reject queued /reserve immediately on busy null workers A held reservation runs for up to MAX_RESERVATION_SECONDS (default 1h), so queueing a second /reserve behind it makes no sense — the wait would dwarf any sane timeout. Set max_queue_time=0.0 so the framework rejects 429 as soon as another reservation is in flight, and serverless routes the request to a free worker or scales a new one up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:05:02 +01:00
Rob Ballantyne	3668d948be	Simplify null pyworker README intro to serverless terminology Drop the "autoscaler provisions a worker if none is free" phrasing in favor of the simpler "request comes in and you get a worker; release and it scales back down." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:02:41 +01:00
Rob Ballantyne	254ccdf181	Add /release control endpoint to null pyworker The held /reserve now waits on an asyncio.Event and resolves when the local queue consumer POSTs /release on the internal control port (127.0.0.1:18999 by default). This produces a 200 success in metrics instead of the 499 cancellation you got from disconnecting the client. The duration cap stays as a safety net for stuck consumers. The internal aiohttp server is now unconditional and hosts /release always; the stub /health route is added only when BACKEND_HEALTH_URL is unset. NULL_STUB_HEALTH_PORT is renamed to NULL_CONTROL_PORT to reflect the broader role. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:59:46 +01:00
Rob Ballantyne	89761b378a	Wire null pyworker healthcheck to a stub (and optional user URL) Adds an in-process aiohttp stub on 127.0.0.1:18999/health so the framework's periodic healthcheck has something live to talk to. Operators can override with BACKEND_HEALTH_URL to point at their queue consumer's /health endpoint, so the autoscaler marks the worker errored if the consumer dies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:53:26 +01:00
Rob Ballantyne	18974873e5	Add null pyworker for queue-driven autoscaling A PyWorker that does not forward to any model server. POST /reserve holds the worker busy until the client disconnects (or the duration cap elapses), so users with their own job queue can drive Vast autoscaling without exposing inbound model traffic on the instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:48:52 +01:00