Merge branch 'main' of github.com:kristopolous/pyworker

2025-03-27 11:04:51 -07:00
parent f21607d2d4 8d58b16e3a
commit 334baf60f7
3 changed files with 111 additions and 14 deletions
@@ -7,8 +7,83 @@ same instance. Additionally, it monitors performance metrics and estimates curre
 such as the number of tokens processed for LLMs or image resolution and steps for image generation models,
 reporting these metrics to the autoscaler.
 ## Project Structure
 *   `lib/`: Contains the core PyWorker framework code (server logic, data types, metrics).
 *   `workers/`: Contains specific implementations (PyWorkers) for different model servers. Each subdirectory represents a worker for a particular model type.
 ## Getting Started
 1.  **Install Dependencies:**
    ```bash
    pip install -r requirements.txt
    ```
    You may also need `pyright` for type checking:
    ```bash
    sudo npm install -g pyright
    # or use your preferred method to install pyright
    ```
 2.  **Configure Environment:** Set any necessary environment variables (e.g., `MODEL_LOG` path, API keys if needed by your worker).
 3.  **Run the Server:** Use the provided `start_server.sh` script, passing the module of the worker you want to run.
    ```bash
    # Example for hello_world worker (assuming MODEL_LOG is set)
    ./start_server.sh workers.hello_world.server
    ```
    Replace `workers.hello_world.server` with the path to the `server.py` module of the worker you want to run.
 ## How to Use
-If you want to use autoscaler, you just need to use one of Vast's autoscaler templates. If you'd like to
-implement PyWorker for a template that is not marked as autoscaler compatible on Vast, refer to
-`workers/hello_world/README.md`
+### Using Existing Workers
+
+If you are using a Vast.ai template that includes PyWorker integration (marked as autoscaler compatible), it should work out of the box. The template will typically start the appropriate PyWorker server automatically. Here are a few:
 *   **TGI (Text Generation Inference):** [Vast.ai Template](https://cloud.vast.ai?ref_id=140778&template_id=72d8dcb41ea3a58e06c741e2c725bc00)
 *   **ComfyUI:** [Vast.ai Template](https://cloud.vast.ai?ref_id=140778&template_id=ad72c8bf7cf695c3c9ddf0eaf6da0447)
 Currently available workers:
 *   `hello_world`: A simple example worker for a basic LLM server.
 *   `comfyui`: A worker for the ComfyUI image generation backend.
 *   `tgi`: A worker for the Text Generation Inference backend.
 ### Implementing a New Worker
 To integrate PyWorker with a model server not already supported, you need to create a new worker implementation under the `workers/` directory. Follow these general steps:
 1.  **Create Worker Directory:** Add a new directory under `workers/` (e.g., `workers/my_model/`).
 2.  **Define Data Types (`data_types.py`):**
    *   Create a class inheriting from `lib.data_types.ApiPayload`.
    *   Implement methods like `for_test`, `generate_payload_json`, `count_workload`, and `from_json_msg` to handle request data, testing, and workload calculation specific to your model's API.
 3.  **Implement Endpoint Handlers (`server.py`):**
    *   For each model API endpoint you want PyWorker to proxy, create a class inheriting from `lib.data_types.EndpointHandler`.
    *   Implement methods like `endpoint`, `payload_cls`, `generate_payload_json`, `make_benchmark_payload` (needed only on the handler you pass to `Backend` as the benchmark handler), and `generate_client_response`.
    *   Instantiate `lib.backend.Backend` with your model server details, log file path, benchmark handler, and log actions.
    *   Define `aiohttp` routes, mapping paths to your handlers using `backend.create_handler()`.
    *   Use `lib.server.start_server` to run the application.
 4.  **Add `__init__.py`:** Create an empty `__init__.py` file in your worker directory.
 5.  **(Optional) Add Load Testing (`test_load.py`):** Create a script using `lib.test_harness.run` to test your worker against a Vast.ai endpoint group.
 6.  **(Optional) Add Client Example (`client.py`):** Provide a script demonstrating how to call your worker's endpoints.
 **For a detailed walkthrough, refer to the `hello_world` example:** [workers/hello_world/README.md](workers/hello_world/README.md)
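A minimal sketch of step 2, for orientation only. `PayloadSketch` is a hypothetical stand-in for `lib.data_types.ApiPayload`: the four method names come from the list above, but the signatures, the `prompt`/`max_new_tokens` fields, and the choice to measure workload as requested output tokens are assumptions. Check the real base class and the `hello_world` worker before reusing this shape.

```python
import dataclasses
import random
from abc import ABC, abstractmethod
from typing import Any, Dict


class PayloadSketch(ABC):
    """Hypothetical stand-in for lib.data_types.ApiPayload (signatures are assumptions)."""

    @classmethod
    @abstractmethod
    def for_test(cls) -> "PayloadSketch":
        """Build a synthetic request for benchmarking."""

    @abstractmethod
    def generate_payload_json(self) -> Dict[str, Any]:
        """Serialize the request into the JSON the model server expects."""

    @abstractmethod
    def count_workload(self) -> float:
        """Estimate the request's cost for the autoscaler."""

    @classmethod
    @abstractmethod
    def from_json_msg(cls, json_msg: Dict[str, Any]) -> "PayloadSketch":
        """Parse a client's JSON body into a payload object."""


@dataclasses.dataclass
class MyModelPayload(PayloadSketch):
    prompt: str
    max_new_tokens: int

    @classmethod
    def for_test(cls) -> "MyModelPayload":
        # Random 16-word prompt so benchmark requests are not all identical
        words = ["alpha", "beta", "gamma", "delta"]
        return cls(prompt=" ".join(random.choices(words, k=16)), max_new_tokens=64)

    def generate_payload_json(self) -> Dict[str, Any]:
        # Field names mirror the (hypothetical) model server's JSON body
        return dataclasses.asdict(self)

    def count_workload(self) -> float:
        # Assumption: cost is proportional to the number of tokens requested
        return float(self.max_new_tokens)

    @classmethod
    def from_json_msg(cls, json_msg: Dict[str, Any]) -> "MyModelPayload":
        # Coerce types so malformed input fails early with a clear error
        return cls(prompt=str(json_msg["prompt"]), max_new_tokens=int(json_msg["max_new_tokens"]))


payload = MyModelPayload.from_json_msg({"prompt": "Hello", "max_new_tokens": 32})
print(payload.count_workload())  # 32.0
```

Using requested output tokens as the workload is one plausible proxy for an LLM, in line with the token-based metrics PyWorker reports; an image model would more likely weigh resolution and step count.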
 **Type Hinting:** It is strongly recommended to use strict type hinting throughout your implementation. Use `pyright` to check for type errors.
 ## Testing Your Worker
 If you implement a `test_load.py` script for your worker, you can use it to load test a Vast.ai endpoint group running your instance image.
 ```bash
 # Example for hello_world worker
 python3 -m workers.hello_world.test_load -n 1000 -rps 0.5 -k "$API_KEY" -e "$ENDPOINT_GROUP_NAME"
 ```
 Replace `workers.hello_world.test_load` with the path to your worker's test script and provide your Vast.ai API Key (`-k`) and the target Endpoint Group Name (`-e`). Adjust the number of requests (`-n`) and requests per second (`-rps`) as needed.
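As a rough planning aid, a load test's duration is about the request count divided by the request rate. This assumes the harness issues requests at a steady rate, which is an assumption here (check `lib.test_harness` for its actual scheduling). The helper below is hypothetical; for the example flags it gives about 2000 seconds:

```python
def estimated_duration_seconds(num_requests: int, requests_per_second: float) -> float:
    """Approximate time to issue all requests at a steady rate (ignores response latency)."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return num_requests / requests_per_second


# Values from the example command: -n 1000 -rps 0.5
seconds = estimated_duration_seconds(1000, 0.5)
print(f"{seconds:.0f} s (~{seconds / 60:.1f} min)")  # 2000 s (~33.3 min)
```

Lowering `-n` is the quickest way to shorten a smoke test without changing the request rate you want to observe.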
 ## Community & Support
 Join the conversation and get help:
 *   **Vast.ai Discord:** [https://discord.gg/Pa9M29FFye](https://discord.gg/Pa9M29FFye)
 *   **Vast.ai Subreddit:** [https://reddit.com/r/vastai/](https://reddit.com/r/vastai/)
@@ -1,6 +1,6 @@
 aiofiles==24.1.0
 aiohappyeyeballs==2.3.4
-aiohttp==3.10.0
+aiohttp==3.11.0b0
 aiojobs==1.2.1
 aiosignal==1.3.1
 anyio==4.4.0
@@ -40,7 +40,7 @@ tiktoken==0.7.0
 token-count==0.2.1
 tokenizers==0.19.1
 tqdm==4.66.4
-transformers==4.43.2
+transformers==4.48.0
 typing_extensions==4.12.2
 urllib3==2.2.2
 wheel==0.43.0
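This commit pins `aiohttp` to `3.11.0b0` (a beta pre-release) and `transformers` to `4.48.0`. After reinstalling, one way to confirm the environment matches the exact `==` pins is to compare them with `importlib.metadata`. The helpers below are hypothetical and deliberately simple: they only read plain `name==version` lines and compare version strings literally, without PEP 440 normalization.

```python
from importlib import metadata
from typing import Dict, List


def parse_exact_pins(lines: List[str]) -> Dict[str, str]:
    """Collect `name==version` pins; comments, blanks, and other specifiers are skipped."""
    pins: Dict[str, str] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop inline comments
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.split(";", 1)[0].strip()  # drop env markers
    return pins


def find_mismatches(pins: Dict[str, str]) -> Dict[str, str]:
    """Map each package whose installed version differs from its pin to what was found."""
    mismatches: Dict[str, str] = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = "not installed"
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


def check_requirements(path: str = "requirements.txt") -> Dict[str, str]:
    """Read a requirements file and return any pins the current environment violates."""
    with open(path) as f:
        return find_mismatches(parse_exact_pins(f.readlines()))
```

Calling `check_requirements()` from the repository root returns an empty dict when every exact pin is satisfied.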
@@ -2,7 +2,7 @@
 ## Hello_world example
-There is a hello_world PyWorker implantation under `workers/hello_world`. This PyWorker is
+There is a hello_world PyWorker implementation under `workers/hello_world`. This PyWorker is
 created for an LLM model server that runs on port 5001 and has two API endpoints:
 1. `/generate`: generates a full response to the prompt and sends a JSON response
@@ -40,10 +40,17 @@ This will allow your IDE or VSCode with `pyright` plugin to find any type errors
 You can also install `pyright` with `sudo npm install -g pyright` and run `pyright` in the root of the project to find
 any type errors.
-#### data_Types.py
+#### data_types.py: Contains data types representing model API endpoints
 This file defines the structure of the data your model server expects (its API contract) and, critically, how PyWorker *interprets* that data for autoscaling purposes. You define Python data classes that mirror the JSON payloads your model's API uses.
 These classes **must** inherit from `lib.data_types.ApiPayload`. This inheritance is not just for structure; it's how PyWorker knows how to:
 *   **Parse Incoming Requests:** Convert JSON from clients into usable Python objects.
 *   **Calculate Workload:** Determine the computational cost of a request.
 *   **Generate Test Data:** Create realistic inputs for benchmarking.
 *   **Format Requests for the Model Server:** Prepare data for the underlying model.
 `ApiPayload` is an abstract class, so your subclass must implement several functions:
 ```python
 import dataclasses
@@ -105,12 +112,27 @@ class InputData(ApiPayload):
 ```
-#### server.py
+#### server.py: Creating Your Model's API Endpoints
-For every model API endpoint you want to use, you must implement an `EndpointHandler`. This class handles incoming
-requests, processes them, sends them to the model API server, and finally returns an HTTP response.
-`EndpointHandler` has several abstract functions that must be implemented. Here, we implement two, one
-for `/generate`, and one for `/generate_stream`:
+This section guides you through creating the core of your custom model API: the `EndpointHandler`.  Think of `EndpointHandler` as the bridge between incoming requests from users and your underlying model.  It's the key to making your model accessible and scalable.
+
+**Why use an `EndpointHandler`?**
+
 *   **Organized Request Handling:** It provides a structured way to handle different types of requests (like generating text, generating images, or performing other model-specific tasks).
 *   **Scalability:** By separating request handling from the model itself, you can easily scale your API to handle many concurrent users.
 *   **Flexibility:** You can customize how requests are processed, validated, and transformed before being sent to your model.
 *   **Standard Interface:** It provides a consistent interface for interacting with your model, regardless of the underlying implementation.
 For every model API endpoint you want to expose (e.g., `/generate`, `/generate_stream`), you'll implement an `EndpointHandler`. This class receives incoming requests, processes them, sends them to the model API server, and returns an HTTP response to the client. It does this through several key methods:
 *   **Receiving and validating incoming requests (`get_data_from_request`):** This method ensures the request contains the necessary data (authentication and payload) and is in the correct format. It's the entry point for all requests.
 *   **Defining the endpoint (`endpoint`):**  This method specifies the URL endpoint on the model API server where requests will be sent (e.g., `/generate`).
 *   **Specifying the payload type (`payload_cls`):** This method indicates the specific `ApiPayload` class used for this endpoint, defining the structure of the request data.
 *   **Creating benchmark payloads (`make_benchmark_payload`):**  This method creates payloads specifically for benchmarking the model's performance.
 *   **Handling the model's response (`generate_client_response`):** This method takes the response from the model API server and transforms it into the format expected by the client making the request to your PyWorker.  This allows you to customize the output as needed.
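To make the division of labor concrete, here is a framework-free sketch of the order in which these methods might be called for a single request. Everything in it is hypothetical: the `auth_data`/`payload` request keys, the `handle` function, the `fake_model_server` stub standing in for the HTTP call, and the simplified signatures. `make_benchmark_payload` is omitted. It is not PyWorker's actual dispatch code, which runs on `aiohttp`.

```python
import dataclasses
from typing import Any, Callable, Dict, Tuple, Type


@dataclasses.dataclass
class GeneratePayload:
    """Hypothetical request body for a /generate-style endpoint."""
    prompt: str
    max_new_tokens: int

    @classmethod
    def from_json_msg(cls, msg: Dict[str, Any]) -> "GeneratePayload":
        return cls(prompt=str(msg["prompt"]), max_new_tokens=int(msg["max_new_tokens"]))


class GenerateHandlerSketch:
    """Mirrors the roles of endpoint, payload_cls, get_data_from_request, generate_client_response."""

    @property
    def endpoint(self) -> str:
        # Path on the model API server that requests are forwarded to
        return "/generate"

    @property
    def payload_cls(self) -> Type[GeneratePayload]:
        return GeneratePayload

    def get_data_from_request(self, req: Dict[str, Any]) -> Tuple[Dict[str, Any], GeneratePayload]:
        # Reject requests missing either the auth block or the payload
        if "auth_data" not in req or "payload" not in req:
            raise ValueError("request must contain 'auth_data' and 'payload'")
        return req["auth_data"], self.payload_cls.from_json_msg(req["payload"])

    def generate_client_response(self, model_response: Dict[str, Any]) -> Dict[str, Any]:
        # Reshape the model server's reply into what the client expects
        return {"text": model_response.get("generated_text", "")}


def handle(handler: GenerateHandlerSketch, req: Dict[str, Any],
           send: Callable[[str, Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    # validate -> forward to the model server -> transform the reply
    _auth, payload = handler.get_data_from_request(req)
    model_response = send(handler.endpoint, dataclasses.asdict(payload))
    return handler.generate_client_response(model_response)


def fake_model_server(path: str, body: Dict[str, Any]) -> Dict[str, Any]:
    # Stub that echoes the prompt instead of running a model
    return {"generated_text": f"{path}: {body['prompt']}"}


request = {"auth_data": {}, "payload": {"prompt": "hi", "max_new_tokens": 8}}
print(handle(GenerateHandlerSketch(), request, fake_model_server))  # {'text': '/generate: hi'}
```

Injecting `send` as a parameter keeps the handler logic testable without a running model server.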
 The `EndpointHandler` class has several abstract functions that you *must* implement to define the behavior of your specific endpoints.  Here, we'll implement two common endpoints: `/generate` (for synchronous requests) and `/generate_stream` (for streaming responses):
 ```python