Files

T

Rob Ballantyne 69d9b7455f OpenAI compatible worker (#19 )

Adds initial support for OpenAI compatible inference servers

Available endpoints:

- `/v1/completions`
- `/v1/chat/completions`

2025-07-16 09:46:26 +01:00

2.8 KiB

Raw Blame History

OpenAI Compatible PyWorker

This is the base PyWorker for OpenAI compatible inference servers. See the Serverless documentation for guides and how-to's.

Instance Setup

Pick a template

This worker is compatible with any backend API that properly implements the /v1/completions and /v1/chat/completions endpoints. We currently have three templates you can choose from but you can also create your own without having to modify the PyWorker.

vLLM (recommended)
Ollama
HuggingFace TGI

All of these templates can be configured via the template interface. You may want to change the model or startup arguments, depending on the template you selected.

Follow the getting started guide for help with configuring your serverless setup. For testing, we recommend that you use the default options presented by the web interface.

Client Setup (Demo)

Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.

git clone https://github.com/vast-ai/pyworker
cd pyworker
pip install uv
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

Using the Test Client

Several examples have been provided in the client to help you get started with your own implementation.

Completions

Call to /v1/completions with json response

python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --completion --model <MODEL_NAME>

Chat Completion (json)

Call to /v1/chat/completions with json response

python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --chat --model <MODEL_NAME>

Chat Completion (streaming)

Call to /v1/chat/completions with streaming response

python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --chat-stream --model <MODEL_NAME>

Tool Use (json)

Call to /v1/chat/completions with tool and json response.

This test defines a simple tool which will list the contents of the local pyworker directory. The output is then analysed by the model.

python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --tools --model <MODEL_NAME>

Interactive Chat (streaming)

Interactive session with calls to /v1/chat/completions.

Type clear to clear the chat history or quit to exit.

python -m workers.openai.client -k <API_KEY> -e <ENDPOINT_NAME> --interactive --model <MODEL_NAME>

2.8 KiB Raw Blame History