Files

T

Colter Downing 68d8ce4bfd refactor: use endpoint_id instead of endpoint name for routing

- Update route_payload to use endpoint_id instead of endpoint name
- Update AuthData to expect endpoint_id (int) instead of endpoint (str)
- Update ClientState to track endpoint_id
- Update comfyui client functions to use endpoint_id
- Fetch endpoint info (id + api_key) instead of just api_key

This aligns with the autoscaler changes in AUTO-848 that switched
to ID-based endpoint lookups for improved security and consistency.

2025-12-06 14:46:41 -08:00

data_types

Suppress matplot debug logs

2025-10-25 16:18:02 -07:00

__init__.py

OpenAI compatible worker (#19 )

2025-07-16 09:46:26 +01:00

client.py

defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first if present

2025-12-03 16:57:28 -08:00

README.md

update tgi client

2025-12-03 18:38:42 -08:00

README.templates.md

OpenAI compatible worker (#19 )

2025-07-16 09:46:26 +01:00

server.py

Increase model wait time for vLLM

2025-12-03 12:38:52 -08:00

test_load.py

refactor: use endpoint_id instead of endpoint name for routing

2025-12-06 14:46:41 -08:00

README.md

OpenAI Compatible PyWorker

This is the base PyWorker for OpenAI compatible inference servers. See the Serverless documentation for guides and how-to's.

Instance Setup

Pick a template

This worker is compatible with any backend API that properly implements the /v1/completions and /v1/chat/completions endpoints. We currently have three templates you can choose from but you can also create your own without having to modify the PyWorker.

vLLM (recommended)
Ollama

All of these templates can be configured via the template interface. You may want to change the model or startup arguments, depending on the template you selected.

Follow the getting started guide for help with configuring your serverless setup. For testing, we recommend that you use the default options presented by the web interface.

Client Setup (Demo)

Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.

git clone https://github.com/vast-ai/pyworker
cd pyworker
pip install uv
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

Using the Test Client

Several examples have been provided in the client to help you get started with your own implementation.

First, set your API key as an environment variable:

export VAST_API_KEY=<your_api_key>

The --model and --endpoint flags are optional. If not provided, they default to Qwen/Qwen3-8B and my-vllm-endpoint respectively.

Chat Completion (streaming)

Call to /v1/chat/completions with streaming response

python -m workers.openai.client --chat-stream --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Interactive Chat (streaming)

Interactive session with calls to /v1/chat/completions.

Type clear to clear the chat history or quit to exit.

python -m workers.openai.client --interactive --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Chat Completion (json)

Call to /v1/chat/completions with json response

python -m workers.openai.client --chat --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Tool Use (json)

Call to /v1/chat/completions with tool and json response.

This test defines a simple tool which will list the contents of the local pyworker directory. The output is then analysed by the model.

python -m workers.openai.client --tools --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Completions

Call to /v1/completions with json response

python -m workers.openai.client --completion --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>