LucasArmandVast 4380d98c01 Use PyWorker SDK (#67)
* Change PyWorker to Worker SDK
* Moved /lib to vast-sdk (https://github.com/vast-ai/vast-sdk)
2025-12-15 19:33:03 -08:00

# Vast PyWorker Examples
This repository contains **example PyWorkers** used by Vast.ai's default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:
- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request **workload** for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
> Important: The **core PyWorker framework** (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the **`vastai` / `vastai-sdk`** Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on *worker implementations and examples*, not the framework internals.
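
The optional FIFO queueing listed above can be pictured with a small `asyncio` sketch. This is not the SDK's implementation: `fifo_consumer`, `submit`, and `demo` are hypothetical names, and treating `max_queue_time` as "seconds a request may wait in the queue" is an assumption made for illustration.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, List


async def fifo_consumer(
    queue: asyncio.Queue,
    backend: Callable[[Any], Awaitable[Any]],
    max_queue_time: float,
) -> None:
    """Process queued requests one at a time, in arrival order."""
    while True:
        enqueued_at, payload, done = await queue.get()
        try:
            if time.monotonic() - enqueued_at > max_queue_time:
                # Assumed behavior: reject requests that waited too long.
                done.set_result({"error": "queue timeout"})
            else:
                done.set_result(await backend(payload))
        except Exception as exc:
            done.set_exception(exc)
        finally:
            queue.task_done()


async def submit(queue: asyncio.Queue, payload: Any) -> Any:
    """Enqueue a request and wait for the consumer to resolve it."""
    done = asyncio.get_running_loop().create_future()
    queue.put_nowait((time.monotonic(), payload, done))
    return await done


async def demo() -> List[int]:
    """Send three requests at once and record the order the backend sees."""
    order: List[int] = []

    async def backend(payload: int) -> int:
        order.append(payload)
        await asyncio.sleep(0.01)  # stand-in for a slow model call
        return payload

    queue: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(fifo_consumer(queue, backend, max_queue_time=60.0))
    await asyncio.gather(*(submit(queue, i) for i in range(3)))
    consumer.cancel()
    return order
```

A single consumer draining one queue is what keeps the backend from ever seeing two requests at the same time, which is the property a non-concurrent model server needs.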
## Repository Purpose
Use this repository as:
- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends
If you are looking for the framework code itself, refer to the Vast.ai SDK.
## Project Structure
Typical layout:
- `workers/`
  - Example worker implementations (each worker is usually a self-contained folder)
- Each example typically includes:
  - `worker.py` (the entrypoint used by Serverless)
  - Optional sample workflows / payloads (for ComfyUI-based workers)
  - Optional local test harness scripts
## How Serverless launches worker.py
On each worker instance, the template's startup script typically:
1. Clones your repository from `PYWORKER_REPO`
2. Installs dependencies from `requirements.txt`
3. Starts the **model server** (vLLM, TGI, ComfyUI, etc.)
4. Runs:
```bash
python worker.py
```
Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.
## worker.py
A PyWorker is usually a single `worker.py` that uses SDK configuration objects:
```python
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    # Where the local model server listens and writes its log.
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
            # Per-request workload for autoscaling: requested output tokens.
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            # Benchmark used to estimate throughput.
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    # Log lines that signal readiness, failure, or progress.
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],
        on_error=["Traceback (most recent call last):", "RuntimeError:"],
        on_info=['"message":"Download'],
    ),
)

Worker(worker_config).run()
```
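To make the role of the `LogActionConfig` patterns in the example above more concrete, here is a minimal sketch of how a log line could be classified. The function `classify_log_line` is hypothetical, and substring matching is an assumption: the SDK may apply a different rule (for example, prefix matching).

```python
from typing import Optional

# Patterns copied from the LogActionConfig example above.
ON_LOAD = ["Application startup complete."]
ON_ERROR = ["Traceback (most recent call last):", "RuntimeError:"]
ON_INFO = ['"message":"Download']


def classify_log_line(line: str) -> Optional[str]:
    """Return "error", "load", "info", or None for a single log line.

    Assumption: a pattern matches when it occurs anywhere in the line.
    """
    # Check errors first so a failure is never mistaken for readiness.
    for label, patterns in (("error", ON_ERROR), ("load", ON_LOAD), ("info", ON_INFO)):
        if any(pattern in line for pattern in patterns):
            return label
    return None
```

Running your model server's real startup log through a helper like this is a quick way to confirm that the `on_load` and `on_error` strings you chose actually appear in it.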
## Included Examples
This repository contains example PyWorkers corresponding to common Vast templates, including:
- **vLLM**: OpenAI-compatible completions/chat endpoints with parallel request support
- **TGI (Text Generation Inference)**: OpenAI-compatible endpoints and log-based readiness
- **ComfyUI (Image / JSON workflows)**: `/generate/sync` for ComfyUI workflow execution
- **ComfyUI Wan 2.2 (T2V)**: ComfyUI workflow execution producing video outputs
- **ComfyUI ACE Step (Text-to-Music)**: ComfyUI workflow execution producing audio outputs
Exact worker paths and naming may vary by template; use the `workers/` directory as the source of truth.
## Getting Started (Local)
1. Install Python dependencies for the examples you plan to run:
```bash
pip install -r requirements.txt
```
2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
   - You know the model server URL/port
   - You have a log file path you can tail for readiness/error detection
3. Run the worker:
```bash
python worker.py
```
or, if running an example from a subfolder:
```bash
python workers/<example>/worker.py
```
> Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
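
Before running a worker locally, it can help to check the two prerequisites from step 2. The sketch below is a hypothetical helper (`preflight` is not part of the SDK) that uses only the standard library to verify that the log file exists and that something is listening on the model server port.

```python
import os
import socket
from typing import List


def preflight(host: str, port: int, log_path: str, timeout: float = 1.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means the basics look fine."""
    problems: List[str] = []
    if not os.path.isfile(log_path):
        problems.append(f"log file not found: {log_path}")
    try:
        # A successful TCP connect only shows that something is listening,
        # not that the model has finished loading.
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        problems.append(f"cannot connect to {host}:{port}: {exc}")
    return problems
```

For example, calling `preflight("127.0.0.1", 18000, "/var/log/model/server.log")` with the values from your own `WorkerConfig` before `python worker.py` surfaces a wrong port or log path early.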
## Deploying on Vast Serverless
To use a custom PyWorker with Serverless:
1. Create a public Git repository containing:
   - `worker.py`
   - `requirements.txt`
2. In your Serverless template / endpoint configuration, set:
   - `PYWORKER_REPO` to your Git repository URL
   - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)
3. The template startup script will clone/install and run your `worker.py`.
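As a rough picture of the clone/install/run sequence, the sketch below builds the shell commands a startup script might issue from the two settings above. It is illustrative only: the real template script is not shown in this repository, `startup_commands` and the `pyworker` clone directory are hypothetical, and the step that starts the model server is omitted.

```python
import shlex
from typing import Dict, List


def startup_commands(env: Dict[str, str], clone_dir: str = "pyworker") -> List[str]:
    """Build the shell commands a startup script might run, in order.

    Assumption: `clone_dir` and the exact commands are illustrative;
    only PYWORKER_REPO and PYWORKER_REF come from the template settings.
    """
    repo = env["PYWORKER_REPO"]    # required
    ref = env.get("PYWORKER_REF")  # optional branch, tag, or commit
    target = shlex.quote(clone_dir)
    commands = [f"git clone {shlex.quote(repo)} {target}"]
    if ref:
        commands.append(f"git -C {target} checkout {shlex.quote(ref)}")
    commands.append(f"pip install -r {target}/requirements.txt")
    commands.append(f"cd {target} && python worker.py")
    return commands
```

Running the same sequence by hand against your repository is a simple way to confirm that `requirements.txt` installs cleanly and that `worker.py` sits at the repository root before you deploy.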
## Guidance for Custom Workers
When implementing your own worker:
- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost:
  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
  - Non-LLMs: constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly **one** `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
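The two workload strategies in the list above could be written as plain functions and passed as `workload_calculator`. This is a sketch under stated assumptions: the function names are hypothetical, and `chars_per_token` is a rough heuristic standing in for a real tokenizer.

```python
from typing import Any, Dict


def llm_workload(payload: Dict[str, Any], chars_per_token: float = 4.0) -> float:
    """Approximate prompt tokens plus the requested output tokens.

    Assumption: `chars_per_token` is a crude estimate, not a tokenizer.
    """
    prompt = str(payload.get("prompt", ""))
    prompt_tokens = len(prompt) / chars_per_token
    max_tokens = float(payload.get("max_tokens", 0))
    return prompt_tokens + max_tokens


def constant_workload(payload: Dict[str, Any]) -> float:
    """Fixed cost per request, for backends where requests cost about the same."""
    return 100.0
```

Named functions are easier to unit test than inline lambdas, which matters because the workload value feeds autoscaling decisions.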
## Community & Support
- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/