Files
pyworker/workers/openai
2025-12-17 11:55:33 -08:00
..
2025-07-16 09:46:26 +01:00
2025-12-15 19:33:03 -08:00
2025-12-03 18:38:42 -08:00

OpenAI Compatible PyWorker

This is the base PyWorker for OpenAI compatible inference servers. See the Serverless documentation for guides and how-to's.

Instance Setup

  1. Pick a template

This worker is compatible with any backend API that properly implements the /v1/completions and /v1/chat/completions endpoints. We currently have three templates you can choose from but you can also create your own without having to modify the PyWorker.

All of these templates can be configured via the template interface. You may want to change the model or startup arguments, depending on the template you selected.

  1. Follow the getting started guide for help with configuring your serverless setup. For testing, we recommend that you use the default options presented by the web interface.

Client Setup (Demo)

  1. Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.
git clone https://github.com/vast-ai/pyworker
cd pyworker
pip install uv
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt

Using the Test Client

Several examples have been provided in the client to help you get started with your own implementation.

First, set your API key as an environment variable:

export VAST_API_KEY=<your_api_key>

The --model and --endpoint flags are optional. If not provided, they default to Qwen/Qwen3-8B and my-vllm-endpoint respectively.

Chat Completion (streaming)

Call to /v1/chat/completions with streaming response

python -m workers.openai.client --chat-stream --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Interactive Chat (streaming)

Interactive session with calls to /v1/chat/completions.

Type clear to clear the chat history or quit to exit.

python -m workers.openai.client --interactive --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Chat Completion (json)

Call to /v1/chat/completions with json response

python -m workers.openai.client --chat --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Tool Use (json)

Call to /v1/chat/completions with tool and json response.

This test defines a simple tool which will list the contents of the local pyworker directory. The output is then analysed by the model.

python -m workers.openai.client --tools --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>

Completions

Call to /v1/completions with json response

python -m workers.openai.client --completion --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>