Added clients, updated READMEs

2025-12-12 10:41:21 -08:00
parent 6060f8ce0c
commit 4d99c12820
9 changed files with 827 additions and 199 deletions
@@ -1,90 +1,152 @@
-# Vast PyWorker
+# Vast PyWorker Examples

-Vast PyWorker is a Python web server designed to run alongside a LLM or image generation models running on vast,
-enabling autoscaler integration.
-It serves as the primary entry point for API requests, forwarding them to the model's API hosted on the
-same instance. Additionally, it monitors performance metrics and estimates current workload based on factors
-such as the number of tokens processed for LLMs or image resolution and steps for image generation models,
-reporting these metrics to the autoscaler.
+This repository contains **example PyWorkers** used by Vast.ai’s default Serverless templates (e.g., vLLM, TGI, ComfyUI, Wan, ACE). A PyWorker is a lightweight Python HTTP proxy that runs alongside your model server and:
+
+- Exposes one or more HTTP routes (e.g., `/v1/completions`, `/generate/sync`)
+- Optionally validates/transforms request payloads
+- Computes per-request **workload** for autoscaling
+- Forwards requests to the local model server
+- Optionally supports FIFO queueing when the backend cannot process concurrent requests
+- Detects readiness/failure from model logs and runs a benchmark to estimate throughput
+
+> Important: The **core PyWorker framework** (Worker, WorkerConfig, HandlerConfig, BenchmarkConfig, LogActionConfig) is provided by the **`vastai` / `vastai-sdk`** Python package (https://github.com/vast-ai/vast-sdk). This repo focuses on *worker implementations and examples*, not the framework internals.
+
+## Repository Purpose
+
+Use this repository as:
+
+- A reference for how Vast templates wire up `worker.py`
+- A starting point for implementing your own custom Serverless PyWorker
+- A collection of working examples for common model backends
+
+If you are looking for the framework code itself, refer to the Vast.ai SDK.

 ## Project Structure

-*   `lib/`: Contains the core PyWorker framework code (server logic, data types, metrics).
-*   `workers/`: Contains specific implementations (PyWorkers) for different model servers. Each subdirectory represents a worker for a particular model type.
+Typical layout:

-## Getting Started
+- `workers/`
+  - Example worker implementations (each worker is usually a self-contained folder)
+  - Each example typically includes:
+    - `worker.py` (the entrypoint used by Serverless)
+    - Optional sample workflows / payloads (for ComfyUI-based workers)
+    - Optional local test harness scripts

-1.  **Install Dependencies:**
-    ```bash
-    pip install -r requirements.txt
-    ```
-    You may also need `pyright` for type checking:
-    ```bash
-    sudo npm install -g pyright
-    # or use your preferred method to install pyright
-    ```
+## How Serverless launches worker.py

-2.  **Configure Environment:** Set any necessary environment variables (e.g., `MODEL_LOG` path, API keys if needed by your worker).
+On each worker instance, the template’s startup script typically:

-3.  **Run the Server:** Use the provided script. You'll need to specify which worker to run.
-    ```bash
-    # Example for hello_world worker (assuming MODEL_LOG is set)
-    ./start_server.sh workers.hello_world.server
-    ```
-    Replace `workers.hello_world.server` with the path to the `server.py` module of the worker you want to run.
+1. Clones your repository from `PYWORKER_REPO`
+2. Installs dependencies from `requirements.txt`
+3. Starts the **model server** (vLLM, TGI, ComfyUI, etc.)
+4. Runs:
+   ```bash
+   python worker.py
+   ```

-## How to Use
+Your `worker.py` builds a `WorkerConfig`, constructs a `Worker`, and starts the PyWorker HTTP server.

-### Using Existing Workers
+## worker.py

-If you are using a Vast.ai template that includes PyWorker integration (marked as autoscaler compatible), it should work out of the box. The template will typically start the appropriate PyWorker server automatically. Here's a few:
+A PyWorker is usually a single `worker.py` that uses SDK configuration objects:

-*   **vLLM:** [Vast.ai Template](https://cloud.vast.ai?ref_id=62897&template_id=63ae93902bf3978bea033782592b784d)
-*   **TGI (Text Generation Inference):** [Vast.ai Template](https://cloud.vast.ai?ref_id=62897&template_id=6fa6bd5bdf5f0df63db80e40b086037d)
-*   **ComfyUI:** [Vast.ai Template](https://cloud.vast.ai?ref_id=62897&template_id=e6748878ba688e765e3e9fca29541938)
+```python
+from vastai import (
+    Worker,
+    WorkerConfig,
+    HandlerConfig,
+    BenchmarkConfig,
+    LogActionConfig,
+)

-Currently available workers:
-*   `openai`: A simple example worker for a basic vLLM server.
-*   `comfyui`: A worker for the ComfyUI image generation backend.
-*   `tgi`: A worker for the Text Generation Inference backend.
+worker_config = WorkerConfig(
+    model_server_url="http://127.0.0.1",
+    model_server_port=18000,
+    model_log_file="/var/log/model/server.log",
+    handlers=[
+        HandlerConfig(
+            route="/v1/completions",
+            allow_parallel_requests=True,
+            max_queue_time=60.0,
+            workload_calculator=lambda payload: float(payload.get("max_tokens", 0)),
+            benchmark_config=BenchmarkConfig(
+                generator=lambda: {"prompt": "hello", "max_tokens": 128},
+                runs=8,
+                concurrency=10,
+            ),
+        )
+    ],
+    log_action_config=LogActionConfig(
+        on_load=["Application startup complete."],
+        on_error=["Traceback (most recent call last):", "RuntimeError:"],
+        on_info=['"message":"Download'],
+    ),
+)

-### Implementing a New Worker
-
-To integrate PyWorker with a model server not already supported, you need to create a new worker implementation under the `workers/` directory. Follow these general steps:
-
-1.  **Create Worker Directory:** Add a new directory under `workers/` (e.g., `workers/my_model/`).
-2.  **Define Data Types (`data_types.py`):**
-    *   Create a class inheriting from `lib.data_types.ApiPayload`.
-    *   Implement methods like `for_test`, `generate_payload_json`, `count_workload`, and `from_json_msg` to handle request data, testing, and workload calculation specific to your model's API.
-3.  **Implement Endpoint Handlers (`server.py`):**
-    *   For each model API endpoint you want PyWorker to proxy, create a class inheriting from `lib.data_types.EndpointHandler`.
-    *   Implement methods like `endpoint`, `payload_cls`, `generate_payload_json`, `make_benchmark_payload` (for one handler), and `generate_client_response`.
-    *   Instantiate `lib.backend.Backend` with your model server details, log file path, benchmark handler, and log actions.
-    *   Define `aiohttp` routes, mapping paths to your handlers using `backend.create_handler()`.
-    *   Use `lib.server.start_server` to run the application.
-4.  **Add `__init__.py`:** Create an empty `__init__.py` file in your worker directory.
-5.  **(Optional) Add Load Testing (`test_load.py`):** Create a script using `lib.test_harness.run` to test your worker against a Vast.ai endpoint group.
-6.  **(Optional) Add Client Example (`client.py`):** Provide a script demonstrating how to call your worker's endpoints.
-
-**For a detailed walkthrough, refer to the `hello_world` example:** [workers/hello_world/README.md](workers/hello_world/README.md)
-
-
-**Type Hinting:** It is strongly recommended to use strict type hinting throughout your implementation. Use `pyright` to check for type errors.
-
-## Testing Your Worker
-
-If you implement a `test_load.py` script for your worker, you can use it to load test a Vast.ai endpoint group running your instance image.
-
-```bash
-# Example for hello_world worker
-python3 -m workers.hello_world.test_load -n 1000 -rps 0.5 -k "$API_KEY" -e "$ENDPOINT_GROUP_NAME"
+Worker(worker_config).run()
 ```

-Replace `workers.hello_world.test_load` with the path to your worker's test script and provide your Vast.ai API Key (`-k`) and the target Endpoint Group Name (`-e`). Adjust the number of requests (`-n`) and requests per second (`-rps`) as needed.
+## Included Examples
+
+This repository contains example PyWorkers corresponding to common Vast templates, including:
+
+- **vLLM**: OpenAI-compatible completions/chat endpoints with parallel request support
+- **TGI (Text Generation Inference)**: OpenAI-compatible endpoints and log-based readiness
+- **ComfyUI (Image / JSON workflows)**: `/generate/sync` for ComfyUI workflow execution
+- **ComfyUI Wan 2.2 (T2V)**: ComfyUI workflow execution producing video outputs
+- **ComfyUI ACE Step (Text-to-Music)**: ComfyUI workflow execution producing audio outputs
+
+Exact worker paths and naming may vary by template; use the `workers/` directory as the source of truth.
+
+## Getting Started (Local)
+
+1. Install Python dependencies for the examples you plan to run:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+2. Start your model server locally (vLLM, TGI, ComfyUI, etc.) and ensure:
+   - You know the model server URL/port
+   - You have a log file path you can tail for readiness/error detection
+
+3. Run the worker:
+   ```bash
+   python worker.py
+   ```
+   or, if running an example from a subfolder:
+   ```bash
+   python workers/<example>/worker.py
+   ```
+
+> Note: Many examples assume they are running inside Vast templates (ports, log paths, model locations). You may need to adjust `model_server_port` and `model_log_file` for local usage.
+
+## Deploying on Vast Serverless
+
+To use a custom PyWorker with Serverless:
+
+1. Create a public Git repository containing:
+   - `worker.py`
+   - `requirements.txt`
+
+2. In your Serverless template / endpoint configuration, set:
+   - `PYWORKER_REPO` to your Git repository URL
+   - (Optional) `PYWORKER_REF` to a git ref (branch, tag, or commit)
+
+3. The template startup script will clone/install and run your `worker.py`.
+
+## Guidance for Custom Workers
+
+When implementing your own worker:
+
+- Define one `HandlerConfig` per route you want to expose.
+- Choose a workload function that correlates with compute cost:
+  - LLMs: prompt tokens + max output tokens (or `max_tokens` as a simpler proxy)
+  - Non-LLMs: constant cost per request (e.g., `100.0`) is often sufficient
+- Set `allow_parallel_requests=False` for backends that cannot handle concurrency (e.g., many ComfyUI deployments).
+- Configure exactly **one** `BenchmarkConfig` across all handlers to enable capacity estimation.
+- Use `LogActionConfig` to reliably detect “model loaded” and “fatal error” log lines.
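
Reviewer sketch (not part of the commit): the two workload strategies listed above, written as plain functions that could be passed as `workload_calculator`. The function names and the whitespace-based token estimate are illustrative assumptions, not SDK behavior; a real worker would more likely count tokens with the model's own tokenizer.

```python
from typing import Any, Dict


def llm_workload(payload: Dict[str, Any]) -> float:
    """Approximate cost as prompt tokens plus requested output tokens."""
    prompt = payload.get("prompt") or ""
    # Assumption: whitespace-separated words stand in for real token counts.
    prompt_tokens = len(str(prompt).split())
    max_tokens = payload.get("max_tokens") or 0
    return float(prompt_tokens + max_tokens)


def constant_workload(payload: Dict[str, Any]) -> float:
    """Fixed cost per request, for backends where requests cost about the same."""
    return 100.0
```

Either function has the same shape as the lambda in the `worker.py` example above: it takes the request payload and returns a float.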

 ## Community & Support

-Join the conversation and get help:
-
-*   **Vast.ai Discord:** [https://discord.gg/Pa9M29FFye](https://discord.gg/Pa9M29FFye)
-*   **Vast.ai Subreddit:** [https://reddit.com/r/vastai/](https://reddit.com/r/vastai/)
+- Vast.ai Discord: https://discord.gg/Pa9M29FFye
+- Vast.ai Subreddit: https://reddit.com/r/vastai/
@@ -0,0 +1,168 @@
+# ComfyUI ACE Step PyWorker
+
+This is the PyWorker implementation for running **ACE Step v1 3.5B** text-to-music workflows in ComfyUI. It provides a unified interface for executing complete ComfyUI audio-generation workflows through a proxy-based architecture and returning generated audio assets.
+
+Each request has a static cost of `100`. ComfyUI does not support concurrent workloads, and there is no provision to run multiple ComfyUI instances per worker node.
+
+## Requirements
+
+This worker requires the following components:
+
+- ComfyUI (https://github.com/comfyanonymous/ComfyUI)
+- ComfyUI API Wrapper (https://github.com/ai-dock/comfyui-api-wrapper)
+- ACE Step v1 3.5B model and required custom nodes
+
+A Docker image is provided with the ACE Step model pre-installed, but any image may be used if the above requirements are met.
+
+## Endpoint
+
+The worker exposes a single synchronous endpoint:
+
+- `/generate/sync`: Processes a complete ComfyUI workflow JSON and generates audio output
+
+## Request Format
+
+The ACE Step worker **only supports custom workflow mode**. Modifier-based workflows are not supported.
+
+```json
+{
+  "input": {
+    "request_id": "uuid-string",
+    "workflow_json": {
+      // Complete ComfyUI ACE Step workflow JSON
+    },
+    "s3": { },
+    "webhook": { }
+  }
+}
+```
+
+## Request Fields
+
+### Required Fields
+
+- `input`: Container for all request parameters
+- `input.workflow_json`: Complete ComfyUI workflow graph for ACE Step audio generation
+
+### Optional Fields
+
+- `input.request_id`: Client-defined request identifier
+- `input.s3`: S3-compatible storage configuration
+- `input.webhook`: Webhook configuration for completion notifications
+
+The special string `"__RANDOM_INT__"` may be used in the workflow JSON and will be replaced with a random integer before submission to ComfyUI.
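
Reviewer sketch (not part of the commit): one way the `"__RANDOM_INT__"` substitution described above could work. The worker's actual implementation is not shown in this diff, so the function name, the recursive traversal, and the 32-bit integer range are all assumptions.

```python
import random
from typing import Any

PLACEHOLDER = "__RANDOM_INT__"


def replace_random_ints(node: Any, rng: random.Random) -> Any:
    """Return a copy of the workflow with each placeholder replaced by a random int."""
    if isinstance(node, dict):
        return {key: replace_random_ints(value, rng) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_random_ints(value, rng) for value in node]
    if node == PLACEHOLDER:
        # Assumption: an unsigned 32-bit seed range; the real range is not documented here.
        return rng.randint(0, 2**32 - 1)
    return node
```

Because the function builds new containers instead of mutating its input, the same workflow template can be reused across requests and receives a fresh seed each time.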
+
+## S3 Configuration
+
+Generated audio assets can be automatically uploaded to S3-compatible storage. Configuration can be supplied per request or via environment variables. Request-level values take precedence.
+
+### Via Request JSON
+
+```json
+"s3": {
+  "access_key_id": "your-s3-access-key",
+  "secret_access_key": "your-s3-secret-access-key",
+  "endpoint_url": "https://s3.amazonaws.com",
+  "bucket_name": "your-bucket",
+  "region": "us-east-1"
+}
+```
+
+### Via Environment Variables
+
+```bash
+S3_ACCESS_KEY_ID=your-key
+S3_SECRET_ACCESS_KEY=your-secret
+S3_BUCKET_NAME=your-bucket
+S3_ENDPOINT_URL=https://s3.amazonaws.com
+S3_REGION=us-east-1
+```
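
Reviewer sketch (not part of the commit): the precedence rule stated above, where request-level S3 values override environment variables. The field and variable names come from this README; the function name and the choice to treat empty strings as unset are assumptions.

```python
import os
from typing import Dict, Mapping, Optional

# Request field name -> environment variable name, as listed in this README.
S3_ENV_VARS = {
    "access_key_id": "S3_ACCESS_KEY_ID",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
    "endpoint_url": "S3_ENDPOINT_URL",
    "bucket_name": "S3_BUCKET_NAME",
    "region": "S3_REGION",
}


def resolve_s3_config(
    request_s3: Optional[Mapping[str, str]],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Merge S3 settings, preferring request values over environment values."""
    request_s3 = request_s3 or {}
    resolved: Dict[str, str] = {}
    for field, env_name in S3_ENV_VARS.items():
        # Assumption: an empty string in the request counts as "not provided".
        value = request_s3.get(field) or env.get(env_name)
        if value:
            resolved[field] = value
    return resolved
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dictionary.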
+
+## Webhook Configuration
+
+Webhooks are triggered on request completion or failure.
+
+### Via Request JSON
+
+```json
+"webhook": {
+  "url": "https://your-webhook-url",
+  "extra_params": {
+    "custom_field": "value"
+  }
+}
+```
+
+### Via Environment Variables
+
+```bash
+WEBHOOK_URL=https://your-webhook-url
+WEBHOOK_TIMEOUT=30
+```
+
+## Example Request
+
+### ACE Step Text-to-Music Workflow
+
+```json
+{
+  "input": {
+    "workflow_json": {
+      "14": {
+        "inputs": {
+          "tags": "funk, pop, upbeat, 105 BPM",
+          "lyrics": "Turn it up and let it flow",
+          "lyrics_strength": 0.99,
+          "clip": ["40", 1]
+        },
+        "class_type": "TextEncodeAceStepAudio"
+      },
+      "17": {
+        "inputs": {
+          "seconds": 180,
+          "batch_size": 1
+        },
+        "class_type": "EmptyAceStepLatentAudio"
+      },
+      "40": {
+        "inputs": {
+          "ckpt_name": "ace_step_v1_3.5b.safetensors"
+        },
+        "class_type": "CheckpointLoaderSimple"
+      }
+    }
+  }
+}
+```
+
+## Response Format
+
+A successful response includes execution metadata, ComfyUI output details, and generated audio assets.
+
+### Response Fields
+
+- `id`: Unique request identifier
+- `status`: `completed`, `failed`, `processing`, `generating`, or `queued`
+- `message`: Human-readable status message
+- `comfyui_response`: Raw response from ComfyUI, including execution status and progress
+- `output`: Array of generated outputs
+- `timings`: Timing information for the request
+
+### Output Object
+
+Each entry in `output` includes:
+
+- `filename`: Generated file name (e.g., a file ending in `.mp3`)
+- `local_path`: File path on the worker
+- `url`: Pre-signed download URL (if S3 is configured)
+- `type`: Output type (`output`)
+- `subfolder`: Output directory (e.g., `audio`)
+- `node_id`: ComfyUI node that produced the output
+- `output_type`: Output category (e.g., `audio`)
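
Reviewer sketch (not part of the commit): how a client might read the response and output fields listed above. It assumes the response has already been parsed into a dictionary with those keys; the function name and the fallback from `url` to `local_path` are illustrative choices.

```python
from typing import Any, Dict, List


def collect_output_locations(response: Dict[str, Any]) -> List[str]:
    """Return one location per generated asset, preferring the pre-signed URL."""
    status = response.get("status")
    if status != "completed":
        message = response.get("message", "")
        raise RuntimeError(f"request not completed (status={status!r}): {message}")

    locations: List[str] = []
    for entry in response.get("output", []):
        # `url` is only present when S3 is configured; otherwise fall back to the worker path.
        locations.append(entry.get("url") or entry["local_path"])
    return locations
```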
+
+## Notes and Limitations
+
+- Only full ComfyUI workflow JSONs are supported
+- Concurrent requests are not supported per worker
+- ACE Step model must be installed before processing requests
+- Audio generation duration and runtime depend on workflow configuration
@@ -0,0 +1,149 @@
+import asyncio
+from vastai import Serverless
+
+
+async def main():
+    async with Serverless() as client:
+        endpoint = await client.get_endpoint(name="my-ace-endpoint")
+
+        # ComfyUI API-compatible JSON workflow for ACE Step
+        workflow = {
+          "14": {
+            "inputs": {
+              "tags": "funk, pop, soul, rock, melodic, guitar, drums, bass, keyboard, percussion, 105 BPM, energetic, upbeat, groovy, vibrant, dynamic",
+              "lyrics": "[verse]\nNeon lights they flicker bright\nCity hums in dead of night\nRhythms pulse through concrete veins\nLost in echoes of refrains\n\n[verse]\nBassline groovin in my chest\nHeartbeats match the citys zest\nElectric whispers fill the air\nSynthesized dreams everywhere\n\n[chorus]\nTurn it up and let it flow\nFeel the fire let it grow\nIn this rhythm we belong\nHear the night sing out our song",
+              "lyrics_strength": 0.99,
+              "clip": ["40", 1]
+            },
+            "class_type": "TextEncodeAceStepAudio",
+            "_meta": {
+              "title": "TextEncodeAceStepAudio"
+            }
+          },
+          "17": {
+            "inputs": {
+              "seconds": 180,
+              "batch_size": 1
+            },
+            "class_type": "EmptyAceStepLatentAudio",
+            "_meta": {
+              "title": "EmptyAceStepLatentAudio"
+            }
+          },
+          "18": {
+            "inputs": {
+              "samples": ["52", 0],
+              "vae": ["40", 2]
+            },
+            "class_type": "VAEDecodeAudio",
+            "_meta": {
+              "title": "VAE Decode Audio"
+            }
+          },
+          "40": {
+            "inputs": {
+              "ckpt_name": "ace_step_v1_3.5b.safetensors"
+            },
+            "class_type": "CheckpointLoaderSimple",
+            "_meta": {
+              "title": "Load Checkpoint"
+            }
+          },
+          "44": {
+            "inputs": {
+              "conditioning": ["14", 0]
+            },
+            "class_type": "ConditioningZeroOut",
+            "_meta": {
+              "title": "ConditioningZeroOut"
+            }
+          },
+          "49": {
+            "inputs": {
+              "model": ["51", 0],
+              "operation": ["50", 0]
+            },
+            "class_type": "LatentApplyOperationCFG",
+            "_meta": {
+              "title": "LatentApplyOperationCFG"
+            }
+          },
+          "50": {
+            "inputs": {
+              "multiplier": 1.15
+            },
+            "class_type": "LatentOperationTonemapReinhard",
+            "_meta": {
+              "title": "LatentOperationTonemapReinhard"
+            }
+          },
+          "51": {
+            "inputs": {
+              "shift": 6,
+              "model": ["40", 0]
+            },
+            "class_type": "ModelSamplingSD3",
+            "_meta": {
+              "title": "ModelSamplingSD3"
+            }
+          },
+          "52": {
+            "inputs": {
+              "seed": "__RANDOM_INT__",
+              "steps": 65,
+              "cfg": 4,
+              "sampler_name": "er_sde",
+              "scheduler": "linear_quadratic",
+              "denoise": 1,
+              "model": ["49", 0],
+              "positive": ["14", 0],
+              "negative": ["44", 0],
+              "latent_image": ["17", 0]
+            },
+            "class_type": "KSampler",
+            "_meta": {
+              "title": "KSampler"
+            }
+          },
+          "59": {
+            "inputs": {
+              "filename_prefix": "audio/ComfyUI",
+              "quality": "V0",
+              "audioUI": "",
+              "audio": ["18", 0]
+            },
+            "class_type": "SaveAudioMP3",
+            "_meta": {
+              "title": "Save Audio (MP3)"
+            }
+          }
+        }
+
+        payload = {
+          "input": {
+            "request_id": "",
+            "workflow_json": workflow,
+            "s3": {
+              "access_key_id": "",
+              "secret_access_key": "",
+              "endpoint_url": "",
+              "bucket_name": "",
+              "region": ""
+            },
+            "webhook": {
+              "url": "",
+              "extra_params": {
+                "user_id": "12345",
+                "project_id": "abc-def"
+              }
+            }
+          }
+        }
+
+        response = await endpoint.request("/generate/sync", payload)
+
+        # Response contains status, output, and any errors
+        print(response["response"])
+
+if __name__ == "__main__":
+    asyncio.run(main())
@@ -2,7 +2,7 @@

 This is the base PyWorker for ComfyUI. It provides a unified interface for running any ComfyUI workflow through a proxy-based architecture.

-The cost for each request has a static value of `1`.  ComfyUI does not handle concurrent workloads and there is no current provision to load multiple instances of ComfyUI per worker node.
+Each request has a static cost of `100`. ComfyUI does not handle concurrent workloads, and there is currently no provision to load multiple instances of ComfyUI per worker node.

 ## Requirements

@@ -10,55 +10,6 @@ This worker requires both [ComfyUI](https://github.com/comfyanonymous/ComfyUI) a

 A docker image is provided but you may use any if the above requirements are met.

-## Benchmarking
-
-### Custom Benchmark Workflows
-
-You can provide a custom ComfyUI workflow for benchmarking by creating `workers/comfyui-json/misc/benchmark.json`. This allows you to test performance using your preferred models and workflow complexity.
-
-**Ways to provide the benchmark file:**
- Fork this repository and add your `benchmark.json` file
- Write the file during worker provisioning (onstart script or setup phase)
-
-An example file is provided in the repository. To ensure varied generations, use the placeholder `__RANDOM_INT__` in place of static seed values - it will be replaced with a random integer for each benchmark run.
-
-### Default Benchmark (Fallback)
-
-If `benchmark.json` is not available, a simple image generation benchmark runs when each worker initializes. This validates GPU performance and helps identify underperforming machines.
-
-The default benchmark uses Stable Diffusion v1.5 with ComfyUI's standard text-to-image workflow. Configure it using these environment variables:
-
-| Environment Variable | Default Value | Description |
-| -------------------- | ------------- | ----------- |
-| BENCHMARK_TEST_WIDTH | 512 | Image width (pixels) |
-| BENCHMARK_TEST_HEIGHT | 512 | Image height (pixels) |
-| BENCHMARK_TEST_STEPS | 20 | Number of denoising steps |
-
-Each benchmark run uses a random prompt from `misc/test_prompts.txt` and a random seed to ensure consistent GPU load patterns.
-
-#### Calibrating Fallback Benchmark Duration
-
-To screen for underperforming hardware, set `BENCHMARK_TEST_STEPS` to match your expected production workflow duration. This allows you to identify machines that won't meet performance requirements.
-
-**Example:** If your typical workflow should complete in 90 seconds on acceptable hardware:
-
-```bash
-# 1. Measure it/sec on your reference machine
-# RTX 4090 typically achieves ~43 it/sec with SD1.5
-
-# 2. Calculate required steps
-# 90 seconds × 43 it/sec = 3870 steps
-
-# 3. Configure benchmark
-export BENCHMARK_TEST_STEPS=3870
-
-# 4. Machines completing significantly slower than 90s indicate hardware issues
-```
-
-**Performance expectations:**
- Benchmark duration should remain consistent across identical GPU models
- Significant variation (>20%) may indicate thermal, power, or configuration issues
-
 ## Endpoint

 The worker provides a single endpoint:
@@ -215,7 +166,7 @@ WEBHOOK_TIMEOUT=30                   # Webhook timeout in seconds

 ## Client Libraries

-See the test client examples for implementation details on how to integrate with the ComfyUI worker.
+See the client example for implementation details on how to integrate with the ComfyUI worker.

 ---

@@ -1,77 +0,0 @@
-# <INFERENCE_SERVER> + <MODEL_NAME> (serverless)
-
-Run <INFERENCE_SERVER> with our serverless autoscaling infrastructure.
-
-See the [serverless documentation](https://docs.vast.ai/serverless) and the [Getting Started](https://docs.vast.ai/serverless/getting-started) guide for in-depth details about how to use these templates.
-
-## Configuration
-
-Two environment variables are provided to help you configure the <INFERENCE_SERVER> server:
-
-| Variable | Default Value | Used For |
-| --- | --- | --- |
-| `MODEL_NAME` | `<MODEL_NAME>` | The model to load.  Also accepts [hf.co/repo/model](#) links |
-| `<ARGS_VAR>` | `<ARGS_VAL>` | Arguments to pass to the `<ARGS_RECEIVER>` command |
-
-This template has been configured to work with <MIN_VRAM> VRAM. Setting alternative models and server arguments will change the VRAM requirements. Check model cards and <INFERENCE_SERVER_DOCS> for guidance.
-
-## Usage
-
-We have provided a demonstration client to help you implement this template into your own infrastructure
-
-### Client Setup
-
-Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.
-
-```bash
-git clone https://github.com/vast-ai/pyworker
-cd pyworker
-pip install uv
-uv venv -p 3.12
-source .venv/bin/activate
-uv pip install -r requirements.txt
-```
-
-### Completions
-
-Call to `/v1/completions` with json response
-
-```bash
-python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --completion --model <MODEL_NAME>
-```
-
-### Chat Completion (json)
-
-Call to `/v1/chat/completions` with json response
-
-```bash
-python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --chat --model <MODEL_NAME>
-```
-
-### Chat Completion (streaming)
-
-Call to `/v1/chat/completions` with streaming response
-
-```bash
-python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --chat-stream --model <MODEL_NAME>
-```
-
-### Tool Use (json)
-
-Call to `/v1/chat/completions` with tool and json response.
-
-This test defines a simple tool which will list the contents of the local pyworker directory.  The output is then analysed by the model.
-
-```bash
-python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --tools --model <MODEL_NAME>
-```
-
-### Interactive Chat (streaming)
-
-Interactive session with calls to `/v1/chat/completions`.
-
-Type `clear` to clear the chat history or `quit` to exit.
-
-```bash
-python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --interactive --model <MODEL_NAME>
-```
@@ -0,0 +1,170 @@
+# ComfyUI Wan 2.2 PyWorker
+
+This is the PyWorker implementation for running **Wan 2.2 T2V A14B** text-to-video workflows in ComfyUI. It provides a unified interface for executing complete ComfyUI video-generation workflows through a proxy-based architecture and returning generated video assets.
+
+Each request has a static cost of `100`. ComfyUI does not support concurrent workloads, and there is no provision to run multiple ComfyUI instances per worker node.
+
+## Requirements
+
+This worker requires the following components:
+
+- ComfyUI (https://github.com/comfyanonymous/ComfyUI)
+- ComfyUI API Wrapper (https://github.com/ai-dock/comfyui-api-wrapper)
+- Wan 2.2 T2V A14B models and required custom nodes
+
+A Docker image is provided with all required Wan 2.2 models pre-installed, but any image may be used if the above requirements are met.
+
+## Endpoint
+
+The worker exposes a single synchronous endpoint:
+
+- `/generate/sync`: Processes a complete ComfyUI workflow JSON and generates video output
+
+## Request Format
+
+The Wan 2.2 worker **only supports custom workflow mode**. Modifier-based workflows are not supported.
+
+```json
+{
+  "input": {
+    "request_id": "uuid-string",
+    "workflow_json": {
+      // Complete ComfyUI Wan 2.2 workflow JSON
+    },
+    "s3": { },
+    "webhook": { }
+  }
+}
+```
+
+## Request Fields
+
+### Required Fields
+
+- `input`: Container for all request parameters
+- `input.workflow_json`: Complete ComfyUI workflow graph for Wan 2.2 video generation
+
+### Optional Fields
+
+- `input.request_id`: Client-defined request identifier
+- `input.s3`: S3-compatible storage configuration
+- `input.webhook`: Webhook configuration for completion notifications
+
+The special string `"__RANDOM_INT__"` may be used in the workflow JSON and will be replaced with a random integer before submission to ComfyUI.
+
+## S3 Configuration
+
+Generated video assets can be automatically uploaded to S3-compatible storage. Configuration can be supplied per request or via environment variables. Request-level values take precedence.
+
+### Via Request JSON
+
+```json
+"s3": {
+  "access_key_id": "your-s3-access-key",
+  "secret_access_key": "your-s3-secret-access-key",
+  "endpoint_url": "https://s3.amazonaws.com",
+  "bucket_name": "your-bucket",
+  "region": "us-east-1"
+}
+```
+
+### Via Environment Variables
+
+```bash
+S3_ACCESS_KEY_ID=your-key
+S3_SECRET_ACCESS_KEY=your-secret
+S3_BUCKET_NAME=your-bucket
+S3_ENDPOINT_URL=https://s3.amazonaws.com
+S3_REGION=us-east-1
+```
+
+## Webhook Configuration
+
+Webhooks are triggered on request completion or failure.
+
+### Via Request JSON
+
+```json
+"webhook": {
+  "url": "https://your-webhook-url",
+  "extra_params": {
+    "custom_field": "value"
+  }
+}
+```
+
+### Via Environment Variables
+
+```bash
+WEBHOOK_URL=https://your-webhook-url
+WEBHOOK_TIMEOUT=30
+```
+
+## Example Request
+
+### Wan 2.2 Text-to-Video Workflow
+
+```json
+{
+  "input": {
+    "workflow_json": {
+      "90": {
+        "inputs": {
+          "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+          "type": "wan",
+          "device": "default"
+        },
+        "class_type": "CLIPLoader"
+      },
+      "99": {
+        "inputs": {
+          "text": "A cinematic slow-motion portrait of a woman turning her head",
+          "clip": ["90", 0]
+        },
+        "class_type": "CLIPTextEncode"
+      },
+      "104": {
+        "inputs": {
+          "width": 640,
+          "height": 640,
+          "length": 81,
+          "batch_size": 1
+        },
+        "class_type": "EmptyHunyuanLatentVideo"
+      }
+    }
+  }
+}
+```
+
+## Response Format
+
+A successful response includes execution metadata, ComfyUI output details, and generated video assets.
+
+### Response Fields
+
+- `id`: Unique request identifier
+- `status`: `completed`, `failed`, `processing`, `generating`, or `queued`
+- `message`: Human-readable status message
+- `comfyui_response`: Raw response from ComfyUI, including execution status and progress
+- `output`: Array of generated outputs
+- `timings`: Timing information for the request
+
+### Output Object
+
+Each entry in `output` includes:
+
+- `filename`: Generated file name (e.g., a file ending in `.mp4`)
+- `local_path`: File path on the worker
+- `url`: Pre-signed download URL (if S3 is configured)
+- `type`: Output type (`output`)
+- `subfolder`: Output directory (e.g., `video`)
+- `node_id`: ComfyUI node that produced the output
+- `output_type`: Output category (e.g., `images`)
+
+## Notes and Limitations
+
+- Only full ComfyUI workflow JSONs are supported
+- Concurrent requests are not supported per worker
+- Wan 2.2 models must be installed before processing requests
+- Video generation workflows may take several minutes depending on resolution, length, and GPU performance
@@ -0,0 +1,205 @@
+import asyncio
+from vastai import Serverless
+
+async def main():
+    async with Serverless() as client:
+        endpoint = await client.get_endpoint(name="my-wan-endpoint")
+
+        # ComfyUI API-compatible JSON workflow for Wan 2.2 T2V
+        workflow = {
+          "90": {
+            "inputs": {
+              "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+              "type": "wan",
+              "device": "default"
+            },
+            "class_type": "CLIPLoader",
+            "_meta": {
+              "title": "Load CLIP"
+            }
+          },
+          "91": {
+            "inputs": {
+              "text": "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走，裸露，NSFW",
+              "clip": ["90", 0]
+            },
+            "class_type": "CLIPTextEncode",
+            "_meta": {
+              "title": "CLIP Text Encode (Negative Prompt)"
+            }
+          },
+          "92": {
+            "inputs": {
+              "vae_name": "wan_2.1_vae.safetensors"
+            },
+            "class_type": "VAELoader",
+            "_meta": {
+              "title": "Load VAE"
+            }
+          },
+          "93": {
+            "inputs": {
+              "shift": 8.000000000000002,
+              "model": ["101", 0]
+            },
+            "class_type": "ModelSamplingSD3",
+            "_meta": {
+              "title": "ModelSamplingSD3"
+            }
+          },
+          "94": {
+            "inputs": {
+              "shift": 8,
+              "model": ["102", 0]
+            },
+            "class_type": "ModelSamplingSD3",
+            "_meta": {
+              "title": "ModelSamplingSD3"
+            }
+          },
+          "95": {
+            "inputs": {
+              "add_noise": "disable",
+              "noise_seed": 0,
+              "steps": 20,
+              "cfg": 3.5,
+              "sampler_name": "euler",
+              "scheduler": "simple",
+              "start_at_step": 10,
+              "end_at_step": 10000,
+              "return_with_leftover_noise": "disable",
+              "model": ["94", 0],
+              "positive": ["99", 0],
+              "negative": ["91", 0],
+              "latent_image": ["96", 0]
+            },
+            "class_type": "KSamplerAdvanced",
+            "_meta": {
+              "title": "KSampler (Advanced)"
+            }
+          },
+          "96": {
+            "inputs": {
+              "add_noise": "enable",
+              "noise_seed": "__RANDOM_INT__",
+              "steps": 20,
+              "cfg": 3.5,
+              "sampler_name": "euler",
+              "scheduler": "simple",
+              "start_at_step": 0,
+              "end_at_step": 10,
+              "return_with_leftover_noise": "enable",
+              "model": ["93", 0],
+              "positive": ["99", 0],
+              "negative": ["91", 0],
+              "latent_image": ["104", 0]
+            },
+            "class_type": "KSamplerAdvanced",
+            "_meta": {
+              "title": "KSampler (Advanced)"
+            }
+          },
+          "97": {
+            "inputs": {
+              "samples": ["95", 0],
+              "vae": ["92", 0]
+            },
+            "class_type": "VAEDecode",
+            "_meta": {
+              "title": "VAE Decode"
+            }
+          },
+          "98": {
+            "inputs": {
+              "filename_prefix": "video/ComfyUI",
+              "format": "auto",
+              "codec": "auto",
+              "video": ["100", 0]
+            },
+            "class_type": "SaveVideo",
+            "_meta": {
+              "title": "Save Video"
+            }
+          },
+          "99": {
+            "inputs": {
+              "text": "Beautiful young European woman with honey blonde hair gracefully turning her head back over shoulder, gentle smile, bright eyes looking at camera. Hair flowing in slow motion as she turns. Soft natural lighting, clean background, cinematic portrait.",
+              "clip": ["90", 0]
+            },
+            "class_type": "CLIPTextEncode",
+            "_meta": {
+              "title": "CLIP Text Encode (Positive Prompt)"
+            }
+          },
+          "100": {
+            "inputs": {
+              "fps": 16,
+              "images": ["97", 0]
+            },
+            "class_type": "CreateVideo",
+            "_meta": {
+              "title": "Create Video"
+            }
+          },
+          "101": {
+            "inputs": {
+              "unet_name": "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors",
+              "weight_dtype": "default"
+            },
+            "class_type": "UNETLoader",
+            "_meta": {
+              "title": "Load Diffusion Model"
+            }
+          },
+          "102": {
+            "inputs": {
+              "unet_name": "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors",
+              "weight_dtype": "default"
+            },
+            "class_type": "UNETLoader",
+            "_meta": {
+              "title": "Load Diffusion Model"
+            }
+          },
+          "104": {
+            "inputs": {
+              "width": 640,
+              "height": 640,
+              "length": 81,
+              "batch_size": 1
+            },
+            "class_type": "EmptyHunyuanLatentVideo",
+            "_meta": {
+              "title": "EmptyHunyuanLatentVideo"
+            }
+          }
+        }
+
+        # Request payload: the workflow plus S3 and webhook settings (left blank in this example)
+        payload = {
+          "input": {
+            "request_id": "",
+            "workflow_json": workflow,
+            "s3": {
+              "access_key_id": "",
+              "secret_access_key": "",
+              "endpoint_url": "",
+              "bucket_name": "",
+              "region": ""
+            },
+            "webhook": {
+              "url": "",
+              "extra_params": {
+                "user_id": "12345",
+                "project_id": "abc-def"
+              }
+            }
+          }
+        }
+
+        response = await endpoint.request("/generate/sync", payload)
+
+        # The worker's reply is under the "response" key; it contains status, output, and any errors
+        print(response["response"])
+
+if __name__ == "__main__":
+    asyncio.run(main())