Rob Ballantyne
3668d948be
Simplify null pyworker README intro to serverless terminology
...
Drop the "autoscaler provisions a worker if none is free" phrasing in
favor of the simpler "request comes in and you get a worker; release and
it scales back down."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 17:02:41 +01:00
Rob Ballantyne
254ccdf181
Add /release control endpoint to null pyworker
...
The held /reserve now waits on an asyncio.Event and resolves when the local
queue consumer POSTs /release on the internal control port (127.0.0.1:18999
by default). This produces a 200 success in metrics instead of the 499
cancellation you got from disconnecting the client. The duration cap stays
as a safety net for stuck consumers.
The internal aiohttp server is now unconditional and hosts /release always;
the stub /health route is added only when BACKEND_HEALTH_URL is unset.
NULL_STUB_HEALTH_PORT is renamed to NULL_CONTROL_PORT to reflect the
broader role.
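
The hold-until-release behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the worker's actual handler: the names hold_until_released and max_hold_seconds and the returned strings are hypothetical, and the real /reserve and /release routes are aiohttp handlers that are not reproduced here.

```python
import asyncio
from typing import Tuple


async def hold_until_released(release_event: asyncio.Event,
                              max_hold_seconds: float) -> str:
    """Wait for a release signal, giving up once the duration cap elapses.

    Hypothetical helper: the real handler would translate the outcome into
    an HTTP response (the commit reports a 200 on release).
    """
    try:
        # Block until something calls release_event.set(), or the cap is hit.
        await asyncio.wait_for(release_event.wait(), timeout=max_hold_seconds)
        return "released"
    except asyncio.TimeoutError:
        # Safety net for a consumer that never signals release.
        return "cap_elapsed"


async def demo() -> Tuple[str, str]:
    # Case 1: the release signal arrives well before the cap.
    event = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, event.set)
    first = await hold_until_released(event, max_hold_seconds=1.0)

    # Case 2: nobody releases, so the cap ends the hold.
    second = await hold_until_released(asyncio.Event(), max_hold_seconds=0.05)
    return first, second


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch the /release handler would only need to call set() on the event that the held /reserve request is awaiting.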
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:59:46 +01:00
Rob Ballantyne
89761b378a
Wire null pyworker healthcheck to a stub (and optional user URL)
...
Adds an in-process aiohttp stub on 127.0.0.1:18999/health so the framework's
periodic healthcheck has something live to talk to. Operators can override
with BACKEND_HEALTH_URL to point at their queue consumer's /health
endpoint, so the autoscaler marks the worker errored if the consumer dies.
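
The override rule above amounts to a small lookup. The following is a sketch under stated assumptions: resolve_health_url is a hypothetical helper, and reading the stub port from NULL_STUB_HEALTH_PORT with a default of 18999 is inferred from the commit messages in this log rather than taken from the code.

```python
import os
from typing import Mapping


def resolve_health_url(env: Mapping[str, str] = os.environ) -> str:
    """Pick the URL the periodic healthcheck should poll.

    Assumption: an unset or empty BACKEND_HEALTH_URL means "use the
    in-process stub on the loopback interface".
    """
    override = env.get("BACKEND_HEALTH_URL", "").strip()
    if override:
        # Operator-supplied URL, e.g. the queue consumer's own /health route.
        return override
    # Fall back to the local stub; the port variable name is an assumption.
    port = env.get("NULL_STUB_HEALTH_PORT", "18999")
    return f"http://127.0.0.1:{port}/health"


if __name__ == "__main__":
    print(resolve_health_url({}))
    print(resolve_health_url({"BACKEND_HEALTH_URL": "http://127.0.0.1:9000/health"}))
```

Pointing the override at the consumer's endpoint is what lets a dead consumer surface as a failed healthcheck.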
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:53:26 +01:00
Rob Ballantyne
18974873e5
Add null pyworker for queue-driven autoscaling
...
A PyWorker that does not forward to any model server. POST /reserve holds
the worker busy until the client disconnects (or the duration cap elapses),
so users with their own job queue can drive Vast autoscaling without
exposing inbound model traffic on the instance.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:48:52 +01:00
Lucas Armand
9bc9ba11c5
Increase TGI benchmark tokens to 500
2026-04-30 14:04:39 -07:00
Lucas Armand
e02f4bc943
Lowered concurrency of vLLM and TGI benchmarks
2025-12-17 11:55:33 -08:00
Lucas Armand
bcb04b9a32
add missing comma
2025-12-17 11:40:40 -08:00
Lucas Armand
9daf171487
Increase queue limits for vLLM and TGI
2025-12-17 11:38:55 -08:00
LucasArmandVast
29f836eb1a
Backwards compatible vLLM payload (#75)
...
* Support old vLLM payloads
2025-12-15 19:58:02 -08:00
LucasArmandVast
4380d98c01
Use PyWorker SDK (#67)
...
* Change PyWorker to Worker SDK
* Moved /lib to vast-sdk (https://github.com/vast-ai/vast-sdk)
2025-12-15 19:33:03 -08:00
Colter Downing
222ac2a0dd
default endpoint name
2025-12-04 10:54:55 -08:00
Colter Downing
40aed9b5f8
adding s3 as an option
2025-12-04 10:52:57 -08:00
Colter Downing
d4d36bf86e
done with comfy updates
2025-12-03 20:45:55 -08:00
Colter Downing
e839cfc6e8
include view in API wrapper
2025-12-03 20:22:45 -08:00
Colter Downing
f04138e13b
update to be able to get images
2025-12-03 20:16:25 -08:00
Colter Downing
6b5b1341a7
update tgi client
2025-12-03 18:38:42 -08:00
Colter-Downing
8be92c03de
Merge pull request #69 from vast-ai/AUTO-874--fix-openai-worker-client
...
defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first
2025-12-03 16:59:56 -08:00
Colter Downing
adedb8ba90
defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first if present
2025-12-03 16:57:28 -08:00
Lucas Armand
0bcd2219ea
Increase model wait time for vLLM
2025-12-03 12:38:52 -08:00
Lucas Armand
e0449cb3c7
add llama log
2025-11-21 10:22:16 -08:00
Lucas Armand
3adec1826d
minor changes
2025-11-11 17:11:38 -08:00
Lucas Armand
b55bfa9611
Updated clients, include vastai-sdk, handle non-UTF-8
2025-11-11 17:09:28 -08:00
Abiola Akinnubi
2cde573c56
Merge pull request #48 from vast-ai/comfy-request-idx
...
Added request_idx to comfy auth_data
2025-10-30 11:27:35 -07:00
Abiola Akinnubi
944f83fc03
Removed extra spaces from operator assignment
2025-10-28 21:03:52 +00:00
LucasArmandVast
d6a6e34c6b
Merge branch 'main' into new-pyworker
2025-10-27 12:43:49 -07:00
Abiola Akinnubi
f56bbc0ebe
Added request_idx to comfy auth_data
2025-10-27 03:17:06 +00:00
Colter Downing
bcecd6df40
Suppress matplot debug logs
2025-10-25 16:18:02 -07:00
Rob Ballantyne
f4f7080df1
Re-add comment
2025-10-23 17:00:28 +01:00
Rob Ballantyne
d51a338e8f
log when benchmark file not used
2025-10-23 16:41:02 +01:00
Rob Ballantyne
92a04bd7af
No silent fail if benchmark file is missing
2025-10-23 13:41:03 +01:00
Rob Ballantyne
ec25dda3ad
Merge branch 'vast-ai:main' into feat/comfyui-json-benchmark-workflow-from-file
2025-10-08 14:49:32 +01:00
Rob Ballantyne
4fdc314fd9
Fix healthcheck endpoint URL
2025-10-06 22:16:09 +01:00
Rob Ballantyne
3786cf978d
Add awareness of errors thrown by the provisioning script
2025-10-05 23:14:59 +01:00
Rob Ballantyne
a86d4bcf9c
Import json
2025-10-05 23:05:33 +01:00
Rob Ballantyne
e9b6a14a5e
Import Path
2025-10-05 22:59:19 +01:00
Rob Ballantyne
cadac033e1
Enables use of custom workflow for benchmarking
...
Retains existing method if misc/benchmark.json is not present
2025-10-05 22:53:22 +01:00
abiola-vastai
38782d89bc
undo the fix for comfy yesterday.
2025-09-03 17:12:35 +00:00
abiola-vastai
b20d9e714c
Blind hotfix to see if the ComfyUI default is needed. If it does work, we will revert.
2025-09-03 01:20:09 +00:00
Rob Ballantyne
b8377c4081
Set cost to 100
2025-08-28 16:13:17 +01:00
Rob Ballantyne
703435d10e
Improve MODEL_SERVER_START_* messages
2025-08-26 12:42:04 +01:00
Rob Ballantyne
947fc5eea4
Improve benchmarking explanation
2025-08-26 12:41:30 +01:00
Rob Ballantyne
7c1a544b19
Improve error reporting when no ready workers
2025-08-26 12:41:05 +01:00
Rob Ballantyne
16b414676e
Use count_workload() function for cost
2025-08-25 18:31:10 +01:00
Rob Ballantyne
ba74ac8136
Use cost value 1 for all jobs
2025-08-25 17:58:22 +01:00
Rob Ballantyne
92ff412679
Use MODEL_SERVER_URL environment variable
2025-08-25 17:57:32 +01:00
Rob Ballantyne
fc75a64684
Use MODEL_SERVER_URL environment variable
2025-08-25 17:56:27 +01:00
Rob Ballantyne
3f4acb29fa
Improved client exception handling
2025-08-22 15:20:15 +01:00
Rob Ballantyne
58b078f908
Fix modifier class
2025-08-20 18:06:02 +01:00
Rob Ballantyne
f9fdf04884
Fix signature
2025-08-20 13:27:29 +01:00
Rob Ballantyne
636f17d27f
Fix workflow modifier class
2025-08-20 09:57:07 +01:00