# Vast PyWorker Examples

This repository contains **example PyWorkers** used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:

- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
- Optionally validates/transforms request payloads
- Computes per-request **workload** for autoscaling
- Forwards requests to the local model server
- Optionally supports FIFO queueing when the backend cannot process concurrent requests
- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
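Each request passes through those steps in order. The plain-Python sketch below shows that flow. It is an illustration, not the SDK's implementation. `handle_request`, `validate`, `workload`, and the injected `forward` callable are hypothetical names. The default of 16 output tokens is an arbitrary assumption. HTTP transport is left out so the logic can run on its own.

```python
from typing import Any, Callable, Dict, Tuple

Payload = Dict[str, Any]


def validate(payload: Payload) -> Payload:
    """Hypothetical validation/transform step: require a prompt, default max_tokens."""
    if not isinstance(payload.get("prompt"), str):
        raise ValueError("payload must include a string 'prompt'")
    # Return a normalized copy (16 is an arbitrary, assumed default)
    return {**payload, "max_tokens": int(payload.get("max_tokens", 16))}


def workload(payload: Payload) -> float:
    """Per-request cost reported for autoscaling (here: requested output tokens)."""
    return float(payload["max_tokens"])


def handle_request(
    payload: Payload, forward: Callable[[Payload], Any]
) -> Tuple[float, Any]:
    """Validate, compute workload, then forward to the model server."""
    clean = validate(payload)
    cost = workload(clean)
    result = forward(clean)  # in a real worker: an HTTP call to the local model server
    return cost, result


# Usage with a stub standing in for the model server
cost, result = handle_request(
    {"prompt": "hi", "max_tokens": 32}, forward=lambda p: {"echo": p["prompt"]}
)
print(cost, result)  # 32.0 {'echo': 'hi'}
```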

> Important: The **core PyWorker framework** (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the **`vastai` / `vastai-sdk`** Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on *worker implementations and examples*, not the framework internals.

## Repository Purpose

Use this repository as:

- A reference for how Vast templates wire up `worker.py`
- A starting point for implementing your own custom Serverless PyWorker
- A collection of working examples for common model backends

If you are looking for the framework code itself, refer to the Vast.ai SDK.

## Project Structure

Typical layout:

- `workers/`
  - Example worker implementations (each worker is usually a self-contained folder)
  - Each example typically includes:
    - `worker.py` (the entrypoint used by Serverless)
    - Optional sample workflows / payloads (for ComfyUI-based workers)
    - Optional local test harness scripts

## How Serverless Launches `worker.py`

On each worker instance, the template’s startup script typically:

1. Clones your repository from `PYWORKER_REPO`
2. Installs dependencies from `requirements.txt`
3. Starts the **model server** (vLLM, TGI, ComfyUI, etc.)
4. Runs:
   ```bash
   python worker.py
   ```

Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.
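Each template owns its startup script, and that script is not part of this repository. The startup sequence above can be approximated as a dry-run Python function. It builds the command lines from `PYWORKER_REPO` and the optional `PYWORKER_REF`. The function name, the `pyworker` checkout directory, the exact commands, and the example URL are all assumptions for illustration.

```python
import shlex
from typing import Dict, List


def build_startup_commands(env: Dict[str, str], workdir: str = "pyworker") -> List[List[str]]:
    """Hypothetical dry run of the template startup sequence (model server launch omitted)."""
    repo = env["PYWORKER_REPO"]    # required: Git URL of your PyWorker repository
    ref = env.get("PYWORKER_REF")  # optional: branch, tag, or commit
    commands = [["git", "clone", repo, workdir]]
    if ref:
        commands.append(["git", "-C", workdir, "checkout", ref])
    commands.append(["pip", "install", "-r", f"{workdir}/requirements.txt"])
    # The template starts the model server at this point (not shown), then:
    commands.append(["python", f"{workdir}/worker.py"])
    return commands


# Print the commands instead of running them (placeholder URL)
for cmd in build_startup_commands({"PYWORKER_REPO": "https://github.com/example/my-pyworker"}):
    print(shlex.join(cmd))
```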

## worker.py

A PyWorker is usually a single `worker.py` that uses SDK configuration objects:

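```python
from vastai import (
    Worker,
    WorkerConfig,
    HandlerConfig,
    BenchmarkConfig,
    LogActionConfig,
)

worker_config = WorkerConfig(
    # Where the local model server (vLLM, TGI, ComfyUI, ...) is listening
    model_server_url="http://127.0.0.1",
    model_server_port=18000,
    # Model server log file, watched for the LogActionConfig patterns below
    model_log_file="/var/log/model/server.log",
    handlers=[
        HandlerConfig(
            # Route exposed by the PyWorker and forwarded to the model server
            route="/v1/completions",
            allow_parallel_requests=True,
            max_queue_time=60.0,
            # Per-request workload for autoscaling: requested max_tokens as a cost proxy
            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
            # The single benchmark used to estimate throughput
            benchmark_config=BenchmarkConfig(
                generator=lambda: {"prompt": "hello", "max_tokens": 128},
                runs=8,
                concurrency=10,
            ),
        )
    ],
    log_action_config=LogActionConfig(
        on_load=["Application startup complete."],  # model is ready
        on_error=["Traceback (most recent call last):", "RuntimeError:"],  # fatal errors
        on_info=['"message":"Download'],  # informational progress lines
    ),
)

# Start the PyWorker HTTP server
Worker(worker_config).run()
```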
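`workload_calculator` and the benchmark `generator` are ordinary Python callables. You can define them as named functions and sanity-check them without the SDK or a GPU. The function names below are illustrative. You would pass them to `HandlerConfig` and `BenchmarkConfig` in place of the lambdas.

```python
from typing import Any, Dict


def completions_workload(payload: Dict[str, Any]) -> float:
    """Same logic as the lambda above: requested max_tokens as a cost proxy."""
    return float(payload.get("max_tokens", 0))


def completions_benchmark_payload() -> Dict[str, Any]:
    """Same payload as the benchmark generator above."""
    return {"prompt": "hello", "max_tokens": 128}


# Quick local checks, no model server required
assert completions_workload({"prompt": "hi", "max_tokens": 256}) == 256.0
assert completions_workload({"prompt": "hi"}) == 0.0  # missing max_tokens -> zero workload
assert completions_workload(completions_benchmark_payload()) == 128.0
print("workload checks passed")
```

A payload without `max_tokens` gets a workload of 0, which may under-report its cost. A prompt-plus-output estimate is one way to avoid this.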

## Included Examples

This repository contains example PyWorkers corresponding to common Vast templates, including:

- **vLLM**: OpenAI-compatible completions/chat endpoints with parallel request support
- **TGI (Text Generation Inference)**: OpenAI-compatible endpoints and log-based readiness
- **ComfyUI (Image / JSON workflows)**: `/generate/sync` for ComfyUI workflow execution
- **ComfyUI Wan 2.2 (T2V)**: ComfyUI workflow execution producing video outputs
- **ComfyUI ACE Step (Text-to-Music)**: ComfyUI workflow execution producing audio outputs

Exact worker paths and naming may vary by template; use the `workers/` directory as the source of truth.

## Getting Started (Local)

1. Install Python dependencies for the examples you plan to run:
   ```bash
   pip install -r requirements.txt
   ```

2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
   - You know the model server URL/port
   - You have a log file path you can tail for readiness/error detection

3. Run the worker:
   ```bash
   python worker.py
   ```
   or, if running an example from a subfolder:
   ```bash
   python workers/<example>/worker.py
   ```
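While testing locally, you can check readiness by hand by scanning the model server log for the same markers used in `LogActionConfig`. The sketch below is a simplified stand-in that assumes plain substring matching per line. The SDK's actual matching may differ. `classify_log_line` and `wait_for_ready` are hypothetical helpers.

```python
import time
from pathlib import Path
from typing import Optional

# Markers copied from the worker.py example
ON_LOAD = ["Application startup complete."]
ON_ERROR = ["Traceback (most recent call last):", "RuntimeError:"]


def classify_log_line(line: str) -> Optional[str]:
    """Return 'load', 'error', or None using simple substring checks."""
    if any(marker in line for marker in ON_ERROR):
        return "error"
    if any(marker in line for marker in ON_LOAD):
        return "load"
    return None


def wait_for_ready(log_path: Path, timeout: float = 600.0, poll: float = 1.0) -> bool:
    """Re-read the log until the first matched line is a load marker (True)
    or an error marker (raises); return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if log_path.exists():
            for line in log_path.read_text(errors="replace").splitlines():
                status = classify_log_line(line)
                if status == "error":
                    raise RuntimeError(f"model server error: {line}")
                if status == "load":
                    return True
        time.sleep(poll)
    return False


# e.g. wait_for_ready(Path("/var/log/model/server.log"))
```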

> Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
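To avoid editing these values by hand for each environment, you can read them from environment variables and fall back to the template defaults. The names `MODEL_SERVER_PORT` and `MODEL_LOG_FILE` below are hypothetical; the templates are not known to set them. The resulting values would be passed to `WorkerConfig`.

```python
import os
from typing import Mapping, Tuple

DEFAULT_PORT = 18000  # value used in the worker.py example
DEFAULT_LOG_FILE = "/var/log/model/server.log"


def local_overrides(env: Mapping[str, str] = os.environ) -> Tuple[int, str]:
    """Return (model_server_port, model_log_file), honoring hypothetical env overrides."""
    port = int(env.get("MODEL_SERVER_PORT", DEFAULT_PORT))
    log_file = env.get("MODEL_LOG_FILE", DEFAULT_LOG_FILE)
    return port, log_file


port, log_file = local_overrides()
# Then: WorkerConfig(model_server_port=port, model_log_file=log_file, ...)
```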

## Deploying on Vast Serverless

To use a custom PyWorker with Serverless:

1. Create a public Git repository containing:
   - `worker.py`
   - `requirements.txt`

2. In your Serverless template / endpoint configuration, set:
   - `PYWORKER_REPO` to your Git repository URL
   - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)

3. The template startup script will clone/install and run your `worker.py`.
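Before you point `PYWORKER_REPO` at a repository, a quick preflight check can confirm that both required files are present. This sketch assumes they belong at the repository root, since the launch step runs `python worker.py`. `check_pyworker_repo` is a hypothetical helper. It only checks that the files exist. It does not check that dependencies install or that `worker.py` runs.

```python
from pathlib import Path
from typing import List

REQUIRED_FILES = ("worker.py", "requirements.txt")


def check_pyworker_repo(repo_root: str) -> List[str]:
    """Return the required files missing from the repository root (empty list = OK)."""
    root = Path(repo_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = check_pyworker_repo(".")
    print("OK" if not missing else f"missing: {', '.join(missing)}")
```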

## Guidance for Custom Workers

When implementing your own worker:

- Define one `HandlerConfig` per route you want to expose.
- Choose a workload function that correlates with compute cost:
  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
  - Non-LLMs: constant cost per request (e.g., `100.0`) is often sufficient
- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
- Configure exactly **one** `BenchmarkConfig` across all handlers to enable capacity estimation.
- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
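The two workload strategies above could look roughly like the functions below. Real token counts would come from the model's tokenizer, and the whitespace split here is only a crude stand-in. The function names are illustrative, and `100.0` is the example constant from the list above.

```python
from typing import Any, Dict


def llm_workload(payload: Dict[str, Any]) -> float:
    """Prompt tokens + max output tokens, using a whitespace split as a rough token count."""
    prompt = payload.get("prompt", "")
    prompt_tokens = len(str(prompt).split())  # crude stand-in for a real tokenizer
    return float(prompt_tokens + int(payload.get("max_tokens", 0)))


def constant_workload(payload: Dict[str, Any]) -> float:
    """Flat cost per request, e.g. for non-LLM backends such as ComfyUI workflows."""
    return 100.0


print(llm_workload({"prompt": "write a haiku about GPUs", "max_tokens": 64}))  # 69.0
print(constant_workload({"workflow": {}}))  # 100.0
```

Chat-style payloads carry the prompt in `messages` rather than `prompt`, so for those the estimate would need to sum the message contents.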

## Community & Support

- Vast.ai Discord: https://discord.gg/Pa9M29FFye
- Vast.ai Subreddit: https://reddit.com/r/vastai/