Use PyWorker SDK (#67)

* Change PyWorker to Worker SDK
* Moved /lib to vast-sdk (https://github.com/vast-ai/vast-sdk)
2025-12-15 22:33:03 -05:00
parent 2ce741a8b7
commit 4380d98c01
54 changed files with 1622 additions and 4626 deletions
@@ -0,0 +1,170 @@
+# ComfyUI Wan 2.2 PyWorker
+
+This is the PyWorker implementation for running **Wan 2.2 T2V A14B** text-to-video workflows in ComfyUI. It provides a unified interface for executing complete ComfyUI video-generation workflows through a proxy-based architecture and returning generated video assets.
+
+Each request has a static cost of `10000`. ComfyUI does not support concurrent workloads, and there is no provision to run multiple ComfyUI instances per worker node.
+
+## Requirements
+
+This worker requires the following components:
+
+- ComfyUI (https://github.com/comfyanonymous/ComfyUI)
+- ComfyUI API Wrapper (https://github.com/ai-dock/comfyui-api-wrapper)
+- Wan 2.2 T2V A14B models and required custom nodes
+
+A Docker image is provided with all required Wan 2.2 models pre-installed, but any image may be used if the above requirements are met.
+
+## Endpoint
+
+The worker exposes a single synchronous endpoint:
+
+- `/generate/sync`: Processes a complete ComfyUI workflow JSON and generates video output
+
+## Request Format
+
+The Wan 2.2 worker **only supports custom workflow mode**. Modifier-based workflows are not supported.
+
+```json
+{
+  "input": {
+    "request_id": "uuid-string",
+    "workflow_json": {
+      // Complete ComfyUI Wan 2.2 workflow JSON
+    },
+    "s3": { },
+    "webhook": { }
+  }
+}
+```
+
+## Request Fields
+
+### Required Fields
+
+- `input`: Container for all request parameters
+- `input.workflow_json`: Complete ComfyUI workflow graph for Wan 2.2 video generation
+
+### Optional Fields
+
+- `input.request_id`: Client-defined request identifier
+- `input.s3`: S3-compatible storage configuration
+- `input.webhook`: Webhook configuration for completion notifications
+
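As an illustration of the required and optional fields listed above, here is a minimal client-side check that could run before a request is sent. It is a sketch only: `validate_request` is a hypothetical helper, not part of the worker or the SDK, and the worker's own validation rules may differ.

```python
def validate_request(payload):
    """Return a list of problems found in a request payload (empty if none).

    Hypothetical helper: it only checks the two required fields named in
    the README, `input` and `input.workflow_json`.
    """
    request_input = payload.get("input")
    if not isinstance(request_input, dict):
        return ["'input' is required and must be an object"]

    errors = []
    workflow = request_input.get("workflow_json")
    if not isinstance(workflow, dict) or not workflow:
        errors.append("'input.workflow_json' is required and must be a non-empty object")
    return errors


ok_payload = {"input": {"workflow_json": {"90": {"class_type": "CLIPLoader", "inputs": {}}}}}
bad_payload = {"input": {"request_id": "abc"}}

ok_errors = validate_request(ok_payload)
bad_errors = validate_request(bad_payload)
```

The optional fields (`request_id`, `s3`, `webhook`) are deliberately not checked, since a request is valid without them.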
+The special string `"__RANDOM_INT__"` may be used in the workflow JSON and will be replaced with a random integer before submission to ComfyUI.
+
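The `"__RANDOM_INT__"` substitution described above can be pictured as a recursive walk over the workflow. The sketch below is not the worker's actual implementation, which is not shown in this commit. The function name `replace_random_ints` and the integer range are assumptions made for illustration.

```python
import random

RANDOM_INT_PLACEHOLDER = "__RANDOM_INT__"


def replace_random_ints(node, rng=random):
    """Return a copy of `node` with each placeholder string replaced by a random integer."""
    if isinstance(node, dict):
        return {key: replace_random_ints(value, rng) for key, value in node.items()}
    if isinstance(node, list):
        return [replace_random_ints(item, rng) for item in node]
    if node == RANDOM_INT_PLACEHOLDER:
        # Assumed range; the range used by the real worker is not documented here.
        return rng.randint(0, 2**32 - 1)
    return node


workflow = {"96": {"inputs": {"noise_seed": "__RANDOM_INT__", "steps": 20}}}
resolved = replace_random_ints(workflow)
```

The input workflow is left unmodified, so the same template can be resubmitted and receive a fresh seed each time.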
+## S3 Configuration
+
+Generated video assets can be automatically uploaded to S3-compatible storage. Configuration can be supplied per request or via environment variables. Request-level values take precedence.
+
+### Via Request JSON
+
+```json
+"s3": {
+  "access_key_id": "your-s3-access-key",
+  "secret_access_key": "your-s3-secret-access-key",
+  "endpoint_url": "https://s3.amazonaws.com",
+  "bucket_name": "your-bucket",
+  "region": "us-east-1"
+}
+```
+
+### Via Environment Variables
+
+```bash
+S3_ACCESS_KEY_ID=your-key
+S3_SECRET_ACCESS_KEY=your-secret
+S3_BUCKET_NAME=your-bucket
+S3_ENDPOINT_URL=https://s3.amazonaws.com
+S3_REGION=us-east-1
+```
+
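The precedence rule (request-level values over environment variables) might be resolved along the following lines. This is a sketch under stated assumptions: `resolve_s3_config` is a hypothetical helper, and treating empty strings as unset is an assumption rather than documented behavior.

```python
import os

# Request field name -> environment variable name, as listed in this README.
S3_FIELDS = {
    "access_key_id": "S3_ACCESS_KEY_ID",
    "secret_access_key": "S3_SECRET_ACCESS_KEY",
    "endpoint_url": "S3_ENDPOINT_URL",
    "bucket_name": "S3_BUCKET_NAME",
    "region": "S3_REGION",
}


def resolve_s3_config(request_s3=None, environ=None):
    """Merge S3 settings, preferring request values over environment values."""
    environ = os.environ if environ is None else environ
    request_s3 = request_s3 or {}
    resolved = {}
    for field, env_name in S3_FIELDS.items():
        # Assumption: an empty string in the request counts as "not provided".
        value = request_s3.get(field) or environ.get(env_name)
        if value:
            resolved[field] = value
    return resolved


example_env = {"S3_BUCKET_NAME": "env-bucket", "S3_REGION": "us-east-1"}
example_config = resolve_s3_config({"bucket_name": "request-bucket", "region": ""}, environ=example_env)
```

Here `bucket_name` comes from the request, while `region` falls back to the environment because the request left it empty.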
+## Webhook Configuration
+
+Webhooks are triggered on request completion or failure.
+
+### Via Request JSON
+
+```json
+"webhook": {
+  "url": "https://your-webhook-url",
+  "extra_params": {
+    "custom_field": "value"
+  }
+}
+```
+
+### Via Environment Variables
+
+```bash
+WEBHOOK_URL=https://your-webhook-url
+WEBHOOK_TIMEOUT=30
+```
+
+## Example Request
+
+### Wan 2.2 Text-to-Video Workflow
+
+```json
+{
+  "input": {
+    "workflow_json": {
+      "90": {
+        "inputs": {
+          "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+          "type": "wan",
+          "device": "default"
+        },
+        "class_type": "CLIPLoader"
+      },
+      "99": {
+        "inputs": {
+          "text": "A cinematic slow-motion portrait of a woman turning her head",
+          "clip": ["90", 0]
+        },
+        "class_type": "CLIPTextEncode"
+      },
+      "104": {
+        "inputs": {
+          "width": 640,
+          "height": 640,
+          "length": 81,
+          "batch_size": 1
+        },
+        "class_type": "EmptyHunyuanLatentVideo"
+      }
+    }
+  }
+}
+```
+
+## Response Format
+
+A successful response includes execution metadata, ComfyUI output details, and generated video assets.
+
+### Response Fields
+
+- `id`: Unique request identifier
+- `status`: `completed`, `failed`, `processing`, `generating`, or `queued`
+- `message`: Human-readable status message
+- `comfyui_response`: Raw response from ComfyUI, including execution status and progress
+- `output`: Array of generated outputs
+- `timings`: Timing information for the request
+
+### Output Object
+
+Each entry in `output` includes:
+
+- `filename`: Generated file name (e.g., a name ending in `.mp4`)
+- `local_path`: File path on the worker
+- `url`: Pre-signed download URL (if S3 is configured)
+- `type`: Output type (`output`)
+- `subfolder`: Output directory (e.g., `video`)
+- `node_id`: ComfyUI node that produced the output
+- `output_type`: Output category (e.g., `images`)
+
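To show how the response and output fields above could be consumed, the sketch below picks a download location for each generated file, preferring the pre-signed `url` and falling back to `local_path`. The helper `collect_outputs` and the sample response values are hypothetical; only the field names come from this README.

```python
def collect_outputs(response):
    """Return one {filename, location, node_id} record per generated output."""
    status = response.get("status")
    if status != "completed":
        message = response.get("message")
        raise RuntimeError(f"request not completed: status={status!r}, message={message!r}")

    results = []
    for item in response.get("output") or []:
        # `url` is only present when S3 is configured; otherwise use the worker path.
        location = item.get("url") or item.get("local_path")
        results.append({
            "filename": item.get("filename"),
            "location": location,
            "node_id": item.get("node_id"),
        })
    return results


# Made-up sample data shaped like the documented response fields.
sample_response = {
    "id": "example-id",
    "status": "completed",
    "message": "done",
    "output": [
        {"filename": "a.mp4", "local_path": "/tmp/video/a.mp4",
         "url": "https://example.com/signed/a.mp4", "type": "output",
         "subfolder": "video", "node_id": "98", "output_type": "images"},
        {"filename": "b.mp4", "local_path": "/tmp/video/b.mp4", "type": "output",
         "subfolder": "video", "node_id": "98", "output_type": "images"},
    ],
}
outputs = collect_outputs(sample_response)
```

The client script in this commit reads the worker's reply from `response["response"]`, so that nested object is what a helper like this would receive.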
+## Notes and Limitations
+
+- Only full ComfyUI workflow JSONs are supported
+- Concurrent requests are not supported per worker
+- Wan 2.2 models must be installed before processing requests
+- Video generation workflows may take several minutes depending on resolution, length, and GPU performance
@@ -0,0 +1,205 @@
+from vastai import Serverless
+import asyncio
+
+async def main():
+    async with Serverless() as client:
+        endpoint = await client.get_endpoint(name="my-wan-endpoint")
+
+        # ComfyUI API-compatible JSON workflow for Wan 2.2 T2V
+        workflow = {
+          "90": {
+            "inputs": {
+              "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+              "type": "wan",
+              "device": "default"
+            },
+            "class_type": "CLIPLoader",
+            "_meta": {
+              "title": "Load CLIP"
+            }
+          },
+          "91": {
+            "inputs": {
+              "text": "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走，裸露，NSFW",
+              "clip": ["90", 0]
+            },
+            "class_type": "CLIPTextEncode",
+            "_meta": {
+              "title": "CLIP Text Encode (Negative Prompt)"
+            }
+          },
+          "92": {
+            "inputs": {
+              "vae_name": "wan_2.1_vae.safetensors"
+            },
+            "class_type": "VAELoader",
+            "_meta": {
+              "title": "Load VAE"
+            }
+          },
+          "93": {
+            "inputs": {
+              "shift": 8.000000000000002,
+              "model": ["101", 0]
+            },
+            "class_type": "ModelSamplingSD3",
+            "_meta": {
+              "title": "ModelSamplingSD3"
+            }
+          },
+          "94": {
+            "inputs": {
+              "shift": 8,
+              "model": ["102", 0]
+            },
+            "class_type": "ModelSamplingSD3",
+            "_meta": {
+              "title": "ModelSamplingSD3"
+            }
+          },
+          "95": {
+            "inputs": {
+              "add_noise": "disable",
+              "noise_seed": 0,
+              "steps": 20,
+              "cfg": 3.5,
+              "sampler_name": "euler",
+              "scheduler": "simple",
+              "start_at_step": 10,
+              "end_at_step": 10000,
+              "return_with_leftover_noise": "disable",
+              "model": ["94", 0],
+              "positive": ["99", 0],
+              "negative": ["91", 0],
+              "latent_image": ["96", 0]
+            },
+            "class_type": "KSamplerAdvanced",
+            "_meta": {
+              "title": "KSampler (Advanced)"
+            }
+          },
+          "96": {
+            "inputs": {
+              "add_noise": "enable",
+              "noise_seed": "__RANDOM_INT__",
+              "steps": 20,
+              "cfg": 3.5,
+              "sampler_name": "euler",
+              "scheduler": "simple",
+              "start_at_step": 0,
+              "end_at_step": 10,
+              "return_with_leftover_noise": "enable",
+              "model": ["93", 0],
+              "positive": ["99", 0],
+              "negative": ["91", 0],
+              "latent_image": ["104", 0]
+            },
+            "class_type": "KSamplerAdvanced",
+            "_meta": {
+              "title": "KSampler (Advanced)"
+            }
+          },
+          "97": {
+            "inputs": {
+              "samples": ["95", 0],
+              "vae": ["92", 0]
+            },
+            "class_type": "VAEDecode",
+            "_meta": {
+              "title": "VAE Decode"
+            }
+          },
+          "98": {
+            "inputs": {
+              "filename_prefix": "video/ComfyUI",
+              "format": "auto",
+              "codec": "auto",
+              "video": ["100", 0]
+            },
+            "class_type": "SaveVideo",
+            "_meta": {
+              "title": "Save Video"
+            }
+          },
+          "99": {
+            "inputs": {
+              "text": "Beautiful young European woman with honey blonde hair gracefully turning her head back over shoulder, gentle smile, bright eyes looking at camera. Hair flowing in slow motion as she turns. Soft natural lighting, clean background, cinematic portrait.",
+              "clip": ["90", 0]
+            },
+            "class_type": "CLIPTextEncode",
+            "_meta": {
+              "title": "CLIP Text Encode (Positive Prompt)"
+            }
+          },
+          "100": {
+            "inputs": {
+              "fps": 16,
+              "images": ["97", 0]
+            },
+            "class_type": "CreateVideo",
+            "_meta": {
+              "title": "Create Video"
+            }
+          },
+          "101": {
+            "inputs": {
+              "unet_name": "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors",
+              "weight_dtype": "default"
+            },
+            "class_type": "UNETLoader",
+            "_meta": {
+              "title": "Load Diffusion Model"
+            }
+          },
+          "102": {
+            "inputs": {
+              "unet_name": "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors",
+              "weight_dtype": "default"
+            },
+            "class_type": "UNETLoader",
+            "_meta": {
+              "title": "Load Diffusion Model"
+            }
+          },
+          "104": {
+            "inputs": {
+              "width": 640,
+              "height": 640,
+              "length": 81,
+              "batch_size": 1
+            },
+            "class_type": "EmptyHunyuanLatentVideo",
+            "_meta": {
+              "title": "EmptyHunyuanLatentVideo"
+            }
+          }
+        }
+
+        payload = {
+          "input": {
+            "request_id": "",
+            "workflow_json": workflow,
+            "s3": {
+              "access_key_id": "",
+              "secret_access_key": "",
+              "endpoint_url": "",
+              "bucket_name": "",
+              "region": ""
+            },
+            "webhook": {
+              "url": "",
+              "extra_params": {
+                "user_id": "12345",
+                "project_id": "abc-def"
+              }
+            }
+          }
+        }
+
+        response = await endpoint.request("/generate/sync", payload)
+
+        # Response contains status, output, and any errors
+        print(response["response"])
+
+if __name__ == "__main__":
+    asyncio.run(main())
@@ -0,0 +1,288 @@
+import random
+import sys
+
+from vastai import Worker, WorkerConfig, HandlerConfig, LogActionConfig, BenchmarkConfig
+
+# ComfyUI model configuration
+MODEL_SERVER_URL           = 'http://127.0.0.1'
+MODEL_SERVER_PORT          = 18288
+MODEL_LOG_FILE             = '/var/log/portal/comfyui.log'
+MODEL_HEALTHCHECK_ENDPOINT = "/health"
+
+# ComfyUI-specific log messages
+MODEL_LOAD_LOG_MSG = [
+    "To see the GUI go to: "
+]
+
+MODEL_ERROR_LOG_MSGS = [
+    "MetadataIncompleteBuffer",
+    "Value not in list: ",
+    "[ERROR] Provisioning Script failed"
+]
+
+MODEL_INFO_LOG_MSGS = [
+    '"message":"Downloading'
+]
+
+benchmark_prompts = [
+    "Cartoon hoodie hero; orc, anime cat, bunny; black goo; buff; vector on white.",
+    "Cozy farming-game scene with fine details.",
+    "2D vector child with soccer ball; airbrush chrome; swagger; antique copper.",
+    "Realistic futuristic downtown of low buildings at sunset.",
+    "Perfect wave front view; sunny seascape; ultra-detailed water; artful feel.",
+    "Clear cup with ice, fruit, mint; creamy swirls; fluid-sim CGI; warm glow.",
+    "Male biker with backpack on motorcycle; oilpunk; award-worthy magazine cover.",
+    "Collage for textile; surreal cartoon cat in cap/jeans before poster; crisp.",
+    "Medieval village inside glass sphere; volumetric light; macro focus.",
+    "Iron Man with glowing axe; mecha sci-fi; jungle scene; dynamic light.",
+    "Pope Francis DJ in leather jacket, mixing on giant console; dramatic.",
+]
+
+benchmark_dataset = [
+    {
+        "input": {
+            "workflow_json": {
+                "90": {
+                    "inputs": {
+                    "clip_name": "umt5_xxl_fp8_e4m3fn_scaled.safetensors",
+                    "type": "wan",
+                    "device": "default"
+                    },
+                    "class_type": "CLIPLoader",
+                    "_meta": {
+                    "title": "Load CLIP"
+                    }
+                },
+                "91": {
+                    "inputs": {
+                    "text": "色调艳丽，过曝，静态，细节模糊不清，字幕，风格，作品，画作，画面，静止，整体发灰，最差质量，低质量，JPEG压缩残留，丑陋的，残缺的，多余的手指，画得不好的手部，画得不好的脸部，畸形的，毁容的，形态畸形的肢体，手指融合，静止不动的画面，杂乱的背景，三条腿，背景人很多，倒着走，裸露，NSFW",
+                    "clip": [
+                        "90",
+                        0
+                    ]
+                    },
+                    "class_type": "CLIPTextEncode",
+                    "_meta": {
+                    "title": "CLIP Text Encode (Negative Prompt)"
+                    }
+                },
+                "92": {
+                    "inputs": {
+                    "vae_name": "wan_2.1_vae.safetensors"
+                    },
+                    "class_type": "VAELoader",
+                    "_meta": {
+                    "title": "Load VAE"
+                    }
+                },
+                "93": {
+                    "inputs": {
+                    "shift": 8.000000000000002,
+                    "model": [
+                        "101",
+                        0
+                    ]
+                    },
+                    "class_type": "ModelSamplingSD3",
+                    "_meta": {
+                    "title": "ModelSamplingSD3"
+                    }
+                },
+                "94": {
+                    "inputs": {
+                    "shift": 8,
+                    "model": [
+                        "102",
+                        0
+                    ]
+                    },
+                    "class_type": "ModelSamplingSD3",
+                    "_meta": {
+                    "title": "ModelSamplingSD3"
+                    }
+                },
+                "95": {
+                    "inputs": {
+                    "add_noise": "disable",
+                    "noise_seed": 0,
+                    "steps": 20,
+                    "cfg": 3.5,
+                    "sampler_name": "euler",
+                    "scheduler": "simple",
+                    "start_at_step": 10,
+                    "end_at_step": 10000,
+                    "return_with_leftover_noise": "disable",
+                    "model": [
+                        "94",
+                        0
+                    ],
+                    "positive": [
+                        "99",
+                        0
+                    ],
+                    "negative": [
+                        "91",
+                        0
+                    ],
+                    "latent_image": [
+                        "96",
+                        0
+                    ]
+                    },
+                    "class_type": "KSamplerAdvanced",
+                    "_meta": {
+                    "title": "KSampler (Advanced)"
+                    }
+                },
+                "96": {
+                    "inputs": {
+                    "add_noise": "enable",
+                    "noise_seed": "__RANDOM_INT__",
+                    "steps": 20,
+                    "cfg": 3.5,
+                    "sampler_name": "euler",
+                    "scheduler": "simple",
+                    "start_at_step": 0,
+                    "end_at_step": 10,
+                    "return_with_leftover_noise": "enable",
+                    "model": [
+                        "93",
+                        0
+                    ],
+                    "positive": [
+                        "99",
+                        0
+                    ],
+                    "negative": [
+                        "91",
+                        0
+                    ],
+                    "latent_image": [
+                        "104",
+                        0
+                    ]
+                    },
+                    "class_type": "KSamplerAdvanced",
+                    "_meta": {
+                    "title": "KSampler (Advanced)"
+                    }
+                },
+                "97": {
+                    "inputs": {
+                    "samples": [
+                        "95",
+                        0
+                    ],
+                    "vae": [
+                        "92",
+                        0
+                    ]
+                    },
+                    "class_type": "VAEDecode",
+                    "_meta": {
+                    "title": "VAE Decode"
+                    }
+                },
+                "98": {
+                    "inputs": {
+                    "filename_prefix": "video/ComfyUI",
+                    "format": "auto",
+                    "codec": "auto",
+                    "video": [
+                        "100",
+                        0
+                    ]
+                    },
+                    "class_type": "SaveVideo",
+                    "_meta": {
+                    "title": "Save Video"
+                    }
+                },
+                "99": {
+                    "inputs": {
+                    "text": prompt,
+                    "clip": [
+                        "90",
+                        0
+                    ]
+                    },
+                    "class_type": "CLIPTextEncode",
+                    "_meta": {
+                    "title": "CLIP Text Encode (Positive Prompt)"
+                    }
+                },
+                "100": {
+                    "inputs": {
+                    "fps": 16,
+                    "images": [
+                        "97",
+                        0
+                    ]
+                    },
+                    "class_type": "CreateVideo",
+                    "_meta": {
+                    "title": "Create Video"
+                    }
+                },
+                "101": {
+                    "inputs": {
+                    "unet_name": "wan2.2_t2v_high_noise_14B_fp8_scaled.safetensors",
+                    "weight_dtype": "default"
+                    },
+                    "class_type": "UNETLoader",
+                    "_meta": {
+                    "title": "Load Diffusion Model"
+                    }
+                },
+                "102": {
+                    "inputs": {
+                    "unet_name": "wan2.2_t2v_low_noise_14B_fp8_scaled.safetensors",
+                    "weight_dtype": "default"
+                    },
+                    "class_type": "UNETLoader",
+                    "_meta": {
+                    "title": "Load Diffusion Model"
+                    }
+                },
+                "104": {
+                    "inputs": {
+                    "width": 640,
+                    "height": 640,
+                    "length": 81,
+                    "batch_size": 1
+                    },
+                    "class_type": "EmptyHunyuanLatentVideo",
+                    "_meta": {
+                    "title": "EmptyHunyuanLatentVideo"
+                    }
+                }
+            }
+        }
+    } for prompt in benchmark_prompts
+]
+
+worker_config = WorkerConfig(
+    model_server_url=MODEL_SERVER_URL,
+    model_server_port=MODEL_SERVER_PORT,
+    model_log_file=MODEL_LOG_FILE,
+    model_healthcheck_url=MODEL_HEALTHCHECK_ENDPOINT,
+    handlers=[
+        HandlerConfig(
+            route="/generate/sync",
+            allow_parallel_requests=False,
+            max_queue_time=10.0,
+            benchmark_config=BenchmarkConfig(
+                dataset=benchmark_dataset,
+                runs=1
+            ),
+            workload_calculator=lambda _: 10000.0
+        )
+    ],
+    log_action_config=LogActionConfig(
+        on_load=MODEL_LOAD_LOG_MSG,
+        on_error=MODEL_ERROR_LOG_MSGS,
+        on_info=MODEL_INFO_LOG_MSGS
+    )
+)
+
+Worker(worker_config).run()