Files
pyworker/workers/openai/README.md
T

89 lines
3.0 KiB
Markdown
Raw Normal View History

2025-07-16 09:46:26 +01:00
# OpenAI Compatible PyWorker
This is the base PyWorker for OpenAI compatible inference servers. See the [Serverless documentation](https://docs.vast.ai/serverless) for guides and how-to's.
## Instance Setup
1. Pick a template
This worker is compatible with any backend API that properly implements the `/v1/completions` and `/v1/chat/completions` endpoints. We currently have three templates you can choose from but you can also create your own without having to modify the PyWorker.
- [vLLM](https://cloud.vast.ai/?ref_id=62897&creator_id=62897&name=vLLM%20%2B%20Qwen%2FQwen3-8B%20(Serverless)) (recommended)
- [Ollama](https://cloud.vast.ai/?ref_id=62897&creator_id=62897&name=Ollama%20%2B%20Qwen3%3A32b%20(Serverless))
- [HuggingFace TGI](https://cloud.vast.ai/?ref_id=62897&creator_id=62897&name=TGI%20%2B%20Qwen3-8B%20(Serverless))
All of these templates can be configured via the template interface. You may want to change the model or startup arguments, depending on the template you selected.
2. Follow the [getting started guide](https://docs.vast.ai/serverless/getting-started) for help with configuring your serverless setup. For testing, we recommend that you use the default options presented by the web interface.
## Client Setup (Demo)
1. Clone the PyWorker repository to your local machine and install the necessary requirements for running the test client.
```bash
git clone https://github.com/vast-ai/pyworker
cd pyworker
pip install uv
uv venv -p 3.12
source .venv/bin/activate
uv pip install -r requirements.txt
```
## Using the Test Client
Several examples have been provided in the client to help you get started with your own implementation.
First, set your API key as an environment variable:
2025-07-16 09:46:26 +01:00
```bash
export VAST_API_KEY=<your_api_key>
```
The `--model` and `--endpoint` flags are optional. If not provided, they default to `Qwen/Qwen3-8B` and `my-vllm-endpoint` respectively.
### Chat Completion (streaming)
Call to `/v1/chat/completions` with streaming response
2025-07-16 09:46:26 +01:00
```bash
python -m workers.openai.client --chat-stream --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
2025-07-16 09:46:26 +01:00
```
### Interactive Chat (streaming)
2025-07-16 09:46:26 +01:00
Interactive session with calls to `/v1/chat/completions`.
Type `clear` to clear the chat history or `quit` to exit.
2025-07-16 09:46:26 +01:00
```bash
python -m workers.openai.client --interactive --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
2025-07-16 09:46:26 +01:00
```
### Chat Completion (json)
2025-07-16 09:46:26 +01:00
Call to `/v1/chat/completions` with json response
2025-07-16 09:46:26 +01:00
```bash
python -m workers.openai.client --chat --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
2025-07-16 09:46:26 +01:00
```
### Tool Use (json)
Call to `/v1/chat/completions` with tool and json response.
This test defines a simple tool which will list the contents of the local pyworker directory. The output is then analysed by the model.
```bash
python -m workers.openai.client --tools --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
2025-07-16 09:46:26 +01:00
```
### Completions
2025-07-16 09:46:26 +01:00
Call to `/v1/completions` with json response
2025-07-16 09:46:26 +01:00
```bash
python -m workers.openai.client --completion --endpoint <ENDPOINT_NAME> --model <MODEL_NAME>
2025-07-16 09:46:26 +01:00
```