Files
pyworker/README.md
T

152 lines
5.5 KiB
Markdown
Raw Normal View History

2025-12-15 22:33:03 -05:00
# Vast PyWorker Examples
2024-09-04 11:19:30 -07:00
2025-12-15 22:33:03 -05:00
This repository contains **example PyWorkers** used by Vast.ais default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:
2024-09-04 11:19:30 -07:00
2025-12-15 22:33:03 -05:00
- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request **workload** for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
> Important: The **core PyWorker framework** (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the **`vastai` / `vastai-sdk`** Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on *worker implementations and examples*, not the framework internals.
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Repository Purpose
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
Use this repository as:
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
If you are looking for the framework code itself, refer to the Vast.ai SDK.
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Project Structure
2024-09-04 11:19:30 -07:00
2025-12-15 22:33:03 -05:00
Typical layout:
- `workers/`
- Example worker implementations (each worker is usually a self-contained folder)
- Each example typically includes:
- `worker.py` (the entrypoint used by Serverless)
- Optional sample workflows / payloads (for ComfyUI-based workers)
- Optional local test harness scripts
## How Serverless launches worker.py
On each worker instance, the templates startup script typically:
1. Clones your repository from `PYWORKER_REPO`
2. Installs dependencies from `requirements.txt`
3. Starts the **model server** (vLLM, TGI, ComfyUI, etc.)
4. Runs:
```bash
python worker.py
```
Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.
## worker.py
A PyWorker is usually a single `worker.py` that uses SDK configuration objects:
```python
from vastai import (
Worker,
WorkerConfig,
HandlerConfig,
BenchmarkConfig,
LogActionConfig,
)
worker_config = WorkerConfig(
model_server_url="http://127.0.0.1",
model_server_port=18000,
model_log_file="/var/log/model/server.log",
handlers=[
HandlerConfig(
route="/v1/completions",
allow_parallel_requests=True,
max_queue_time=60.0,
workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
benchmark_config=BenchmarkConfig(
generator=lambda: {"prompt": "hello", "max_tokens": 128},
runs=8,
concurrency=10,
),
)
],
log_action_config=LogActionConfig(
on_load=["Application startup complete."],
on_error=["Traceback (most recent call last):", "RuntimeError:"],
on_info=['"message":"Download'],
),
)
Worker(worker_config).run()
```
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Included Examples
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
This repository contains example PyWorkers corresponding to common Vast templates, including:
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
- **vLLM**: OpenAI-compatible completions/chat endpoints with parallel request support
- **TGI (Text Generation Inference)**: OpenAI-compatible endpoints and log-based readiness
- **ComfyUI (Image / JSON workflows)**: `/generate/sync` for ComfyUI workflow execution
- **ComfyUI Wan 2.2 (T2V)**: ComfyUI workflow execution producing video outputs
- **ComfyUI ACE Step (Text-to-Music)**: ComfyUI workflow execution producing audio outputs
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
Exact worker paths and naming may vary by template; use the `workers/` directory as the source of truth.
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Getting Started (Local)
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
1. Install Python dependencies for the examples you plan to run:
```bash
pip install -r requirements.txt
```
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
- You know the model server URL/port
- You have a log file path you can tail for readiness/error detection
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
3. Run the worker:
```bash
python worker.py
```
or, if running an example from a subfolder:
```bash
python workers/<example>/worker.py
```
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
> Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Deploying on Vast Serverless
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
To use a custom PyWorker with Serverless:
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
1. Create a public Git repository containing:
- `worker.py`
- `requirements.txt`
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
2. In your Serverless template / endpoint configuration, set:
- `PYWORKER_REPO` to your Git repository URL
- (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
3. The template startup script will clone/install and run your `worker.py`.
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
## Guidance for Custom Workers
When implementing your own worker:
- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost:
- LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
- Non-LLMs: constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly **one** `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
## Community & Support
2025-03-26 14:54:15 -07:00
2025-12-15 22:33:03 -05:00
- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/