- Update route_payload to use endpoint_id instead of endpoint name - Update AuthData to expect endpoint_id (int) instead of endpoint (str) - Update ClientState to track endpoint_id - Update comfyui client functions to use endpoint_id - Fetch endpoint info (id + api_key) instead of just api_key This aligns with the autoscaler changes in AUTO-848 that switched to ID-based endpoint lookups for improved security and consistency.
OpenAI Compatible PyWorker
This is the base PyWorker for OpenAI compatible inference servers. See the Serverless documentation for guides and how-to's.
Instance Setup
- Pick a template
This worker is compatible with any backend API that properly implements the /v1/completions and /v1/chat/completions endpoints. We currently have three templates you can choose from but you can also create your own without having to modify the PyWorker.
All of these templates can be configured via the template interface. You may want to change the model or startup arguments, depending on the template you selected.
- Follow the getting started guide for help with configuring your serverless setup. For testing, we recommend that you use the default options presented by the web interface.
Client Setup (Demo)
- Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.
git clone https://github.com/vast-ai/pyworker
cd pyworker
pip install uv
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt
Using the Test Client
Several examples have been provided in the client to help you get started with your own implementation.
First, set your API key as an environment variable:
export VAST_API_KEY=<your_api_key>
The --model and --endpoint flags are optional. If not provided, they default to Qwen/Qwen3-8B and my-vllm-endpoint respectively.
Chat Completion (streaming)
Call to /v1/chat/completions with streaming response
python -m workers.openai.client --chat-stream --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
Interactive Chat (streaming)
Interactive session with calls to /v1/chat/completions.
Type clear to clear the chat history or quit to exit.
python -m workers.openai.client --interactive --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
Chat Completion (json)
Call to /v1/chat/completions with json response
python -m workers.openai.client --chat --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
Tool Use (json)
Call to /v1/chat/completions with tool and json response.
This test defines a simple tool which will list the contents of the local pyworker directory. The output is then analysed by the model.
python -m workers.openai.client --tools --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
Completions
Call to /v1/completions with json response
python -m workers.openai.client --completion --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>