Vast PyWorker Examples
This repository contains example PyWorkers used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:
- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request workload for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
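The FIFO queueing mentioned in the list can be pictured as a single consumer that drains a queue in arrival order, so the backend only ever sees one request at a time. The following is a minimal asyncio sketch, assuming a backend that cannot handle concurrency. `FifoGate` and its methods are hypothetical names, and rejecting requests that wait longer than `max_queue_time` is an assumed behavior that only borrows the `HandlerConfig` field name. This is not the vastai SDK's implementation.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable


class FifoGate:
    """Hypothetical FIFO gate for a backend that handles one request at a time.

    Illustrative sketch only; not the vastai SDK's implementation.
    """

    def __init__(self, max_queue_time: float) -> None:
        # Seconds a request may wait before being rejected (assumed semantics).
        self.max_queue_time = max_queue_time
        self._queue = asyncio.Queue()
        self._consumer = None  # set by start()

    def start(self) -> None:
        # A single consumer task guarantees strict arrival-order processing.
        self._consumer = asyncio.create_task(self._consume())

    async def stop(self) -> None:
        if self._consumer is not None:
            self._consumer.cancel()
            try:
                await self._consumer
            except asyncio.CancelledError:
                pass
            self._consumer = None

    async def submit(self, job: Callable[[], Awaitable[Any]]) -> Any:
        # Record the enqueue time so the consumer can enforce max_queue_time.
        fut = asyncio.get_running_loop().create_future()
        await self._queue.put((time.monotonic(), job, fut))
        return await fut

    async def _consume(self) -> None:
        while True:
            enqueued_at, job, fut = await self._queue.get()
            if fut.cancelled():  # caller gave up while still queued
                continue
            waited = time.monotonic() - enqueued_at
            if waited > self.max_queue_time:
                fut.set_exception(TimeoutError(f"queued for {waited:.3f}s"))
                continue
            try:
                result = await job()
            except Exception as exc:  # hand backend errors back to the caller
                if not fut.cancelled():
                    fut.set_exception(exc)
            else:
                if not fut.cancelled():
                    fut.set_result(result)


async def demo():
    gate = FifoGate(max_queue_time=5.0)
    gate.start()
    order = []

    async def fake_backend(i):
        await asyncio.sleep(0.01)  # stand-in for one inference call
        order.append(i)
        return i

    results = await asyncio.gather(
        *(gate.submit(lambda i=i: fake_backend(i)) for i in range(3))
    )
    await gate.stop()
    return results, order


if __name__ == "__main__":
    print(asyncio.run(demo()))  # → ([0, 1, 2], [0, 1, 2])
```

A single consumer task is the simplest way to make ordering explicit; a lock would also serialize requests but makes the arrival-order guarantee less obvious to a reader.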
Important: The core PyWorker framework (`Worker`, `WorkerConfig`, `HandlerConfig`, `BenchmarkConfig`, `LogActionConfig`) is provided by the `vastai`/`vastai-sdk` Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on worker implementations and examples, not the framework internals.
Repository Purpose
Use this repository as:
- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends
If you are looking for the framework code itself, refer to the Vast.ai SDK.
Project Structure
Typical layout:
- `workers/`: example worker implementations (each worker is usually a self-contained folder)
  - Each example typically includes:
    - `worker.py` (the entrypoint used by Serverless)
    - Optional sample workflows / payloads (for ComfyUI-based workers)
    - Optional local test harness scripts
How Serverless launches worker.py
On each worker instance, the template’s startup script typically:
- Clones your repository from `PYWORKER_REPO`
- Installs dependencies from `requirements.txt`
- Starts the model server (vLLM, TGI, ComfyUI, etc.)
- Runs `python worker.py`
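To make the ordering of those steps concrete, here is a hypothetical reconstruction that only builds the command lines rather than running them. The `startup_commands` function, the `pyworker` checkout directory, and the exact `git`/`pip` invocations are assumptions for illustration; real Vast templates use their own startup script. `PYWORKER_REF` is treated as an optional branch, tag, or commit.

```python
import shlex
from typing import Dict, List


def startup_commands(env: Dict[str, str], checkout_dir: str = "pyworker") -> List[List[str]]:
    """Hypothetical reconstruction of the typical startup sequence.

    Real Vast templates use their own script; the checkout directory name
    and exact commands here are assumptions for illustration.
    """
    repo = env["PYWORKER_REPO"]
    ref = env.get("PYWORKER_REF")  # optional branch, tag, or commit
    cmds = [["git", "clone", repo, checkout_dir]]
    if ref:
        cmds.append(["git", "-C", checkout_dir, "checkout", ref])
    cmds.append(["pip", "install", "-r", f"{checkout_dir}/requirements.txt"])
    # Starting the model server is backend-specific (vLLM, TGI, ComfyUI, ...)
    # and is omitted here.
    cmds.append(["python", f"{checkout_dir}/worker.py"])
    return cmds


if __name__ == "__main__":
    env = {"PYWORKER_REPO": "https://example.com/my-pyworker.git", "PYWORKER_REF": "main"}
    for cmd in startup_commands(env):
        print(shlex.join(cmd))  # shell-quoted, copy-pasteable command line
```

Returning argument lists instead of shell strings keeps the sketch safe to pass straight to `subprocess.run` if you adapt it into a real script.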
Your worker.py builds a WorkerConfig, constructs a Worker, and starts the PyWorker HTTP server.
worker.py
A PyWorker is usually a single worker.py that uses SDK configuration objects:
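```python
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    # Where the local model server listens and where it writes its log
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        # One HandlerConfig per exposed route
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,  # backend can serve concurrent requests
            max_queue_time=60.0,
            # Per-request workload used for autoscaling (here: requested output tokens)
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            # Benchmark used to estimate throughput; configure exactly one across all handlers
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    # Log lines that signal "model loaded", fatal errors, and informational events
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()
```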
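Because `workload_calculator` and `generator` are plain callables, you can pull them out into named functions and check them without starting the SDK or a model server. A minimal sketch: the function names are hypothetical, passing them as `workload_calculator=completions_workload` assumes the SDK accepts any callable (as the lambdas in the example suggest), and the "workload must be a positive float" rule is our own assumption rather than a documented SDK requirement.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


def completions_workload(payload: Payload) -> float:
    # Same rule as the lambda in worker.py: requested output tokens.
    return float(payload.get("max_tokens", 0))


def completions_benchmark_payload() -> Payload:
    # Same payload the BenchmarkConfig generator produces.
    return {"prompt": "hello", "max_tokens": 128}


def check_handler_callables(
    workload_fn: Callable[[Payload], float],
    generator: Callable[[], Payload],
    samples: int = 3,
) -> None:
    """Hypothetical pre-deploy check: every benchmark payload must yield a
    positive float workload (assumed requirement)."""
    for _ in range(samples):
        payload = generator()
        workload = workload_fn(payload)
        if not isinstance(workload, float) or workload <= 0:
            raise ValueError(f"bad workload {workload!r} for payload {payload!r}")


if __name__ == "__main__":
    check_handler_callables(completions_workload, completions_benchmark_payload)
    print(completions_workload(completions_benchmark_payload()))  # → 128.0
```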
Included Examples
This repository contains example PyWorkers corresponding to common Vast templates, including:
- vLLM: OpenAI-compatible completions/chat endpoints with parallel request support
- TGI (Text Generation Inference): OpenAI-compatible endpoints and log-based readiness
- ComfyUI (Image / JSON workflows): `/generate/sync` for ComfyUI workflow execution
- ComfyUI Wan 2.2 (T2V): ComfyUI workflow execution producing video outputs
- ComfyUI ACE Step (Text-to-Music): ComfyUI workflow execution producing audio outputs
Exact worker paths and naming may vary by template; use the workers/ directory as the source of truth.
Getting Started (Local)
- Install Python dependencies for the examples you plan to run: `pip install -r requirements.txt`
- Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
  - You know the model server URL/port
  - You have a log file path you can tail for readiness/error detection
- Run the worker with `python worker.py` or, if running an example from a subfolder, `python workers/<example>/worker.py`

Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
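When running locally, it helps to confirm that your `on_load` and `on_error` strings actually occur in your model server's log before relying on them. A minimal sketch, assuming plain substring matching (how the SDK really matches `LogActionConfig` strings is not specified here); `classify_log_lines` and `classify_log_file` are hypothetical helpers, and treating any error marker as overriding a load marker is a deliberate, conservative choice.

```python
from typing import Iterable, Optional, Sequence


def classify_log_lines(
    lines: Iterable[str],
    on_load: Sequence[str],
    on_error: Sequence[str],
) -> Optional[str]:
    """Return "error" if any error marker appears, "loaded" if only a load
    marker appears, or None if neither does. Substring matching is assumed."""
    saw_load = False
    for line in lines:
        if any(marker in line for marker in on_error):
            return "error"
        if any(marker in line for marker in on_load):
            saw_load = True
    return "loaded" if saw_load else None


def classify_log_file(path: str, on_load: Sequence[str], on_error: Sequence[str]) -> Optional[str]:
    # errors="replace" keeps scanning even if the log contains invalid UTF-8.
    with open(path, encoding="utf-8", errors="replace") as f:
        return classify_log_lines(f, on_load, on_error)


if __name__ == "__main__":
    sample = ["INFO: loading weights", "INFO: Application startup complete."]
    print(classify_log_lines(
        sample,
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):"],
    ))  # → loaded
```

Point `classify_log_file` at the same path you set as `model_log_file` to check your markers against a real run.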
Deploying on Vast Serverless
To use a custom PyWorker with Serverless:
- Create a public Git repository containing:
  - `worker.py`
  - `requirements.txt`
- In your Serverless template / endpoint configuration, set:
  - `PYWORKER_REPO` to your Git repository URL
  - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)
- The template startup script will clone/install and run your `worker.py`.
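Before pointing `PYWORKER_REPO` at a repository, a quick local check can catch a missing file. A minimal sketch, assuming both files must sit at the repository root (the startup script runs `python worker.py`); `missing_pyworker_files` is a hypothetical helper, not part of any Vast tooling.

```python
from pathlib import Path
from typing import List

# Files a custom PyWorker repository should contain (assumed to be at the root).
REQUIRED_FILES = ("worker.py", "requirements.txt")


def missing_pyworker_files(repo_root: str) -> List[str]:
    """Hypothetical pre-push check: list required files absent from the repo root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_pyworker_files(".")
    print("OK" if not missing else f"missing: {', '.join(missing)}")
```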
Guidance for Custom Workers
When implementing your own worker:
- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost:
  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
  - Non-LLMs: a constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly one `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
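The two workload strategies in the guidance can be sketched as plain functions suitable for `workload_calculator`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the prompt is assumed to be a single string under `"prompt"` (chat-style payloads would need different extraction), and the 4-characters-per-token estimate is a rough heuristic standing in for your model's real tokenizer.

```python
from typing import Any, Dict


def approx_token_count(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token for English.
    # Replace with your model's tokenizer if you need accurate counts.
    if not text:
        return 0
    return max(1, len(text) // 4)


def llm_workload(payload: Dict[str, Any]) -> float:
    # Prompt tokens + max output tokens, as suggested for LLM handlers.
    prompt_tokens = approx_token_count(str(payload.get("prompt", "")))
    return float(prompt_tokens + int(payload.get("max_tokens", 0)))


def constant_workload(payload: Dict[str, Any]) -> float:
    # Non-LLM backends: every request is assumed to cost the same.
    return 100.0


if __name__ == "__main__":
    print(llm_workload({"prompt": "hello world!", "max_tokens": 128}))  # → 131.0
```

The estimate only needs to correlate with compute cost, so a cheap approximation is usually preferable to running a full tokenizer on every request in the proxy.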
Community & Support
- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/