pyworker

Author	SHA1	Message	Date
Rob Ballantyne	6c2f194b28	Add perf heartbeat to keep null pyworker reporting peak throughput While a /reserve is held, no requests complete so workload_served stays at 0 each metrics tick. The autoscaler sees cur_perf=0 against max_perf=150, concludes the worker can't deliver claimed throughput, downgrades it, and gets cautious about scaling up — so additional /reserve requests pile up behind the held one instead of triggering a new worker. Add a 1Hz heartbeat coroutine that, while anything is in flight, sets workload_served back to TARGET_PERF (150) and flags update_pending. The metrics tick reads 150 and resets to 0; the heartbeat re-pins it before the next tick. Net effect: the autoscaler sees a saturated worker delivering at peak rate, which is the signal it needs to scale a new worker up rather than queue. The heartbeat needs the backend instance, which is only created inside Worker(...) — stash a reference in a module-level dict between Worker() and .run() so the lifecycle coroutine can reach it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:35:18 +01:00
Rob Ballantyne	2aada7b210	Add --plateau to null pyworker demo (default 5min) Previously the first release fired only 30s after the third reservation started, so the autoscaler often hadn't even finished provisioning the third worker yet. Default plateau to 300s so all three workers are visibly running before scale-down begins; configurable via --plateau. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:26:31 +01:00
Rob Ballantyne	8df562e243	Standardize null pyworker load/perf on 150 Bump workload_calculator, benchmark cache value, and client cost from 100 to 150. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:17:57 +01:00
Rob Ballantyne	4eef5e22af	Pin null pyworker max_throughput to exactly 100 asyncio.sleep(1.0) takes slightly more than 1s due to event loop scheduling, so workload/time landed at ~99.x instead of 100. Pre-populate the framework's .has_benchmark cache file with "100" before the benchmark runs — __run_benchmark short-circuits to the cached value and skips the time-based calculation entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:13:16 +01:00
Rob Ballantyne	9d969e376e	Standardize null pyworker load/perf on 100 Using 1 confused the serverless capacity math. Set workload_calculator, benchmark target throughput, and client cost all to 100 — the conventional default the rest of the system expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:09:16 +01:00
Rob Ballantyne	ef3f34a515	Restructure null pyworker --demo as a clean trapezoid Three reservations 30s apart, each with a 90s duration. They end one at a time, also 30s apart, then the client exits. Each reservation ends via its duration cap (200 success) rather than the previous "cancel one, leave two open" pattern that left two 499s pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:00:46 +01:00
Rob Ballantyne	147bf2597a	Set null pyworker client cost to 1 Match the server-side workload_calculator (1.0) so the autoscaler routing hint is consistent with what the worker reports. A null reservation is a unitless slot — no reason for client cost to be 100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:47:19 +01:00
Rob Ballantyne	dc423e2999	Pin null pyworker benchmark to ~1.0 throughput The startup benchmark previously returned instantly, producing max_throughput around 339895. A null worker has no real throughput concept (each reservation is a unitless slot), so sleep 1s during the benchmark with workload=1 to record max_throughput ~= 1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:22:45 +01:00
Rob Ballantyne	463f3de8ea	Add staggered --demo mode to null pyworker client Three concurrent /reserve calls 30s apart, then cancel the first to show the early-release path. The remaining two run until their duration cap. Useful for watching scale-up/scale-down behaviour in the autoscaler dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:08:44 +01:00
Rob Ballantyne	ed0db198c3	Reject queued /reserve immediately on busy null workers A held reservation runs for up to MAX_RESERVATION_SECONDS (default 1h), so queueing a second /reserve behind it makes no sense — the wait would dwarf any sane timeout. Set max_queue_time=0.0 so the framework rejects 429 as soon as another reservation is in flight, and serverless routes the request to a free worker or scales a new one up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:05:02 +01:00
Rob Ballantyne	3668d948be	Simplify null pyworker README intro to serverless terminology Drop the "autoscaler provisions a worker if none is free" phrasing in favor of the simpler "request comes in and you get a worker; release and it scales back down." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:02:41 +01:00
Rob Ballantyne	254ccdf181	Add /release control endpoint to null pyworker The held /reserve now waits on an asyncio.Event and resolves when the local queue consumer POSTs /release on the internal control port (127.0.0.1:18999 by default). This produces a 200 success in metrics instead of the 499 cancellation you got from disconnecting the client. The duration cap stays as a safety net for stuck consumers. The internal aiohttp server is now unconditional and hosts /release always; the stub /health route is added only when BACKEND_HEALTH_URL is unset. NULL_STUB_HEALTH_PORT is renamed to NULL_CONTROL_PORT to reflect the broader role. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:59:46 +01:00
Rob Ballantyne	89761b378a	Wire null pyworker healthcheck to a stub (and optional user URL) Adds an in-process aiohttp stub on 127.0.0.1:18999/health so the framework's periodic healthcheck has something live to talk to. Operators can override with BACKEND_HEALTH_URL to point at their queue consumer's /health endpoint, so the autoscaler marks the worker errored if the consumer dies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:53:26 +01:00
Rob Ballantyne	18974873e5	Add null pyworker for queue-driven autoscaling A PyWorker that does not forward to any model server. POST /reserve holds the worker busy until the client disconnects (or the duration cap elapses), so users with their own job queue can drive Vast autoscaling without exposing inbound model traffic on the instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:48:52 +01:00
Lucas Armand	9bc9ba11c5	Increase TGI benchmark tokens to 500	2026-04-30 14:04:39 -07:00
Lucas Armand	e02f4bc943	Lowered concurrency of vLLM and TGI benchmarks	2025-12-17 11:55:33 -08:00
Lucas Armand	bcb04b9a32	add missing comma	2025-12-17 11:40:40 -08:00
Lucas Armand	9daf171487	Increase queue limits for vLLM and TGI	2025-12-17 11:38:55 -08:00
LucasArmandVast	29f836eb1a	Backwards compatible vLLM payload (#75) * Support old vLLM payloads	2025-12-15 19:58:02 -08:00
LucasArmandVast	4380d98c01	Use PyWorker SDK (#67) * Change PyWorker to Worker SDK * Moved /lib to vast-sdk (https://github.com/vast-ai/vast-sdk)	2025-12-15 19:33:03 -08:00
Colter Downing	222ac2a0dd	default endpoint name	2025-12-04 10:54:55 -08:00
Colter Downing	40aed9b5f8	adding s3 as an option	2025-12-04 10:52:57 -08:00
Colter Downing	d4d36bf86e	done with comfy updates	2025-12-03 20:45:55 -08:00
Colter Downing	e839cfc6e8	include view in API wrapper	2025-12-03 20:22:45 -08:00
Colter Downing	f04138e13b	update to be able to get images	2025-12-03 20:16:25 -08:00
Colter Downing	6b5b1341a7	update tgi client	2025-12-03 18:38:42 -08:00
Colter-Downing	8be92c03de	Merge pull request #69 from vast-ai/AUTO-874--fix-openai-worker-client defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first	2025-12-03 16:59:56 -08:00
Colter Downing	adedb8ba90	defaults to ENDPOINT_NAME and DEFAULT_MODEL but uses the flag first if present	2025-12-03 16:57:28 -08:00
Lucas Armand	0bcd2219ea	Increase model wait time for vLLM	2025-12-03 12:38:52 -08:00
Lucas Armand	e0449cb3c7	add llama log	2025-11-21 10:22:16 -08:00
Lucas Armand	3adec1826d	minor changes	2025-11-11 17:11:38 -08:00
Lucas Armand	b55bfa9611	Updated clients, include vastai-sdk, handle non-UTF-8	2025-11-11 17:09:28 -08:00
Abiola Akinnubi	2cde573c56	Merge pull request #48 from vast-ai/comfy-request-idx Added request_idx to comfy auth_data	2025-10-30 11:27:35 -07:00
Abiola Akinnubi	944f83fc03	Removed extra spaces from operator assignment	2025-10-28 21:03:52 +00:00
LucasArmandVast	d6a6e34c6b	Merge branch 'main' into new-pyworker	2025-10-27 12:43:49 -07:00
Abiola Akinnubi	f56bbc0ebe	Added request_idx to comfy auth_data	2025-10-27 03:17:06 +00:00
Colter Downing	bcecd6df40	Suppress matplot debug logs	2025-10-25 16:18:02 -07:00
Rob Ballantyne	f4f7080df1	Re-add comment	2025-10-23 17:00:28 +01:00
Rob Ballantyne	d51a338e8f	log when benchmark file not used	2025-10-23 16:41:02 +01:00
Rob Ballantyne	92a04bd7af	No silent fail if benchmark file is missing	2025-10-23 13:41:03 +01:00
Rob Ballantyne	ec25dda3ad	Merge branch 'vast-ai:main' into feat/comfyui-json-benchmark-workflow-from-file	2025-10-08 14:49:32 +01:00
Rob Ballantyne	4fdc314fd9	Fix healthcheck endpoint URL	2025-10-06 22:16:09 +01:00
Rob Ballantyne	3786cf978d	Add awareness of errors thrown by the provisioning script	2025-10-05 23:14:59 +01:00
Rob Ballantyne	a86d4bcf9c	Import json	2025-10-05 23:05:33 +01:00
Rob Ballantyne	e9b6a14a5e	Import Path	2025-10-05 22:59:19 +01:00
Rob Ballantyne	cadac033e1	Enables use of custom workflow for benchmarking. Retains the existing method if misc/benchmark.json is not present	2025-10-05 22:53:22 +01:00
abiola-vastai	38782d89bc	Undo yesterday's fix for comfy.	2025-09-03 17:12:35 +00:00
abiola-vastai	b20d9e714c	Blind hotfix to see if the ComfyUI default is needed. If it does work, we would revert.	2025-09-03 01:20:09 +00:00
Rob Ballantyne	b8377c4081	Set cost to 100	2025-08-28 16:13:17 +01:00
Rob Ballantyne	703435d10e	Improve MODEL_SERVER_START_* messages	2025-08-26 12:42:04 +01:00
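
The perf heartbeat described in commit 6c2f194b28 can be sketched as follows. This is a minimal, hedged illustration under stated assumptions, not the pyworker's actual code: `Metrics`, `in_flight`, `perf_heartbeat`, and `metrics_tick` are hypothetical stand-ins for the framework's metrics state and reporting loop. Only `TARGET_PERF` (150), `workload_served`, and `update_pending` come from the commit message. The intervals are shortened from 1 s so the demo finishes quickly.

```python
import asyncio
from dataclasses import dataclass

TARGET_PERF = 150.0  # peak throughput the null worker advertises (value from the commit log)


@dataclass
class Metrics:
    """Hypothetical stand-in for the framework's metrics state (not the vast-sdk API)."""
    workload_served: float = 0.0
    update_pending: bool = False
    in_flight: int = 0  # number of held /reserve requests (assumed counter)


async def perf_heartbeat(metrics: Metrics, interval: float = 1.0) -> None:
    """While anything is in flight, re-pin workload_served to TARGET_PERF every interval."""
    while True:
        if metrics.in_flight > 0:
            metrics.workload_served = TARGET_PERF
            metrics.update_pending = True
        await asyncio.sleep(interval)


def metrics_tick(metrics: Metrics) -> float:
    """Simulated metrics report: read the current value, then reset it to 0."""
    served = metrics.workload_served
    metrics.workload_served = 0.0
    metrics.update_pending = False
    return served


async def demo() -> list[float]:
    metrics = Metrics(in_flight=1)  # one reservation is being held
    hb = asyncio.create_task(perf_heartbeat(metrics, interval=0.01))
    reports = []
    for _ in range(3):
        await asyncio.sleep(0.025)  # the tick runs less often than the heartbeat
        reports.append(metrics_tick(metrics))
    metrics.in_flight = 0  # reservation released
    await asyncio.sleep(0.025)
    metrics_tick(metrics)  # drain a value the heartbeat may have set just before release
    await asyncio.sleep(0.025)
    reports.append(metrics_tick(metrics))  # nothing in flight, so nothing re-pinned
    hb.cancel()
    return reports


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the heartbeat fires more often than the metrics tick, each tick taken while a reservation is held reads 150 instead of 0. Once the reservation is released, the reported value falls back to 0 after one drain tick.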

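Commit 254ccdf181 describes the held /reserve waiting on an asyncio.Event, with the duration cap kept as a safety net for stuck consumers. A minimal sketch of that pattern follows. It assumes plain asyncio and omits the aiohttp control server on 127.0.0.1:18999. The `Reservation` class and its `release`/`hold` methods are hypothetical names, and `max_seconds` stands in for MAX_RESERVATION_SECONDS.

```python
import asyncio


class Reservation:
    """Hypothetical sketch: a held /reserve that ends on release or on its duration cap."""

    def __init__(self, max_seconds: float) -> None:
        self.max_seconds = max_seconds  # safety-net cap (stands in for MAX_RESERVATION_SECONDS)
        self._released = asyncio.Event()

    def release(self) -> None:
        """What a /release handler would call once the queue consumer is done."""
        self._released.set()

    async def hold(self) -> str:
        """Block until released or until the cap elapses; both paths return normally."""
        try:
            await asyncio.wait_for(self._released.wait(), timeout=self.max_seconds)
            return "released"
        except asyncio.TimeoutError:
            return "duration cap"


async def demo() -> tuple[str, str]:
    # Create Events inside the running loop so this also works on older Pythons.
    early = Reservation(max_seconds=1.0)
    capped = Reservation(max_seconds=0.05)
    early_task = asyncio.create_task(early.hold())
    capped_task = asyncio.create_task(capped.hold())
    await asyncio.sleep(0.01)
    early.release()  # simulates the consumer POSTing /release
    return await early_task, await capped_task


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both paths complete without cancelling the request, which matches the commit's aim of recording a 200 instead of the 499 that a client disconnect produced.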

82 Commits