pyworker

Author	SHA1	Message	Date
Rob Ballantyne	1d2caaf554	Default null pyworker session cost to 2x max_perf Reporting cost == max_perf puts an occupied worker at exactly 100% utilization, which the autoscaler reads as "at target, no action." The 3rd session_create then 429s on both active workers and stalls in the global queue instead of triggering a cold-worker activation (observed: 1→2 active scales fine, 2→3 does not). Bumping cost to 2 * max_perf makes each session look like more than one worker's work, so the autoscaler always keeps an extra active worker hot. Slight over-provisioning, but the 3rd reservation lands directly on a free worker rather than queueing. Expose --session-cost on the client so the value can be swept without edits. README documents the trade-off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:31:26 +01:00
Rob Ballantyne	01eff874d8	Correct queue-time guidance for null pyworker endpoints Earlier note claimed max_queue_time / target_queue_time were no-ops because the worker's internal wait_time property filters sessions out. That filter only affects per-worker rejection on a given handler — the autoscaler doesn't see the property and computes its own queue-time estimate from cur_load / max_perf, which does include sessions. With defaults around 30s, an occupied null worker (cur_load=100, max_perf=100, implied queue=1s) still looks "available" to the autoscaler, so a third reservation gets queued on an existing worker via repeated 429-retries instead of triggering scale-up. Fix: set max_queue_time = 0 and target_queue_time = 0 on the endpoint. Any in-flight load marks the worker "full" for routing, and any observed queue time triggers immediate scale-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:14:20 +01:00
Rob Ballantyne	d51f04a176	Await endpoint.session() in null pyworker client endpoint.session() forwards to start_endpoint_session, which is async def — so the call returns a coroutine, not a Session, despite the SDK's return-type annotation. Use 'async with await endpoint.session(...)' so the coroutine resolves to a Session before entering the context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:07:32 +01:00
Rob Ballantyne	ef248ef695	Document endpoint scaling parameters for null pyworker Add a scaling-parameters section to the README covering target_util=1.0 (the critical one — the default 0.9 silently rounds up to one extra worker), min_load math, and why max_queue_time / target_queue_time don't matter here (sessions are filtered from wait_time so both signals stay at zero). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:06:04 +01:00
Rob Ballantyne	6a562a1376	Rewrite null pyworker on the framework session model Drop the held-/reserve approach in favour of the framework's session primitive (max_sessions=1 + /session/create). Sessions are excluded from the autoscaler's queue-wait math and don't suffer the cur_perf=0 degradation that a long-held request did, so this naturally produces the "one request comes in and you get a worker; release and it scales back down" model we were hand-rolling. Server side: - max_sessions=1; framework auto-registers /session/* routes - Drop custom /reserve handler, _active_reservation event, max_queue_time=0.0, MAX_RESERVATION_SECONDS, _perf_heartbeat - Trivial /ping handler exists only to satisfy the framework's "at least one handler with BenchmarkConfig" requirement (and to give clients an extension/keepalive route) - /release on the internal control port is kept as a convenience for queue consumers that don't carry session_auth; it calls the framework's __close_session via name-mangling, which bypasses the session_auth check but is fine for a localhost-only endpoint - Workload/perf back to 100 (conventional) Client side: - Uses endpoint.session(cost, lifetime) instead of POST /reserve - async with the SDK Session; close on exit posts /session/end with proper auth → 200 success in metrics - Demo and single modes both ride the same reserve() helper Sessions landed in vastai-sdk 0.4.2 (commit ec9ef59, 2026-01-20). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:51:24 +01:00
Rob Ballantyne	6c2f194b28	Add perf heartbeat to keep null pyworker reporting peak throughput While a /reserve is held, no requests complete so workload_served stays at 0 each metrics tick. The autoscaler sees cur_perf=0 against max_perf=150, concludes the worker can't deliver claimed throughput, downgrades it, and gets cautious about scaling up — so additional /reserve requests pile up behind the held one instead of triggering a new worker. Add a 1Hz heartbeat coroutine that, while anything is in flight, sets workload_served back to TARGET_PERF (150) and flags update_pending. The metrics tick reads 150 and resets to 0; the heartbeat re-pins it before the next tick. Net effect: the autoscaler sees a saturated worker delivering at peak rate, which is the signal it needs to scale a new worker up rather than queue. The heartbeat needs the backend instance, which is only created inside Worker(...) — stash a reference in a module-level dict between Worker() and .run() so the lifecycle coroutine can reach it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:35:18 +01:00
Rob Ballantyne	2aada7b210	Add --plateau to null pyworker demo (default 5min) Previously the first release fired only 30s after the third reservation started, so the autoscaler often hadn't even finished provisioning the third worker yet. Default plateau to 300s so all three workers are visibly running before scale-down begins; configurable via --plateau. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:26:31 +01:00
Rob Ballantyne	8df562e243	Standardize null pyworker load/perf on 150 Bump workload_calculator, benchmark cache value, and client cost from 100 to 150. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:17:57 +01:00
Rob Ballantyne	4eef5e22af	Pin null pyworker max_throughput to exactly 100 asyncio.sleep(1.0) takes slightly more than 1s due to event loop scheduling, so workload/time landed at ~99.x instead of 100. Pre-populate the framework's .has_benchmark cache file with "100" before the benchmark runs — __run_benchmark short-circuits to the cached value and skips the time-based calculation entirely. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:13:16 +01:00
Rob Ballantyne	9d969e376e	Standardize null pyworker load/perf on 100 Using 1 confused the serverless capacity math. Set workload_calculator, benchmark target throughput, and client cost all to 100 — the conventional default the rest of the system expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:09:16 +01:00
Rob Ballantyne	ef3f34a515	Restructure null pyworker --demo as a clean trapezoid Three reservations 30s apart, each with a 90s duration. They end one at a time, also 30s apart, then the client exits. Each reservation ends via its duration cap (200 success) rather than the previous "cancel one, leave two open" pattern that left two 499s pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:00:46 +01:00
Rob Ballantyne	147bf2597a	Set null pyworker client cost to 1 Match the server-side workload_calculator (1.0) so the autoscaler routing hint is consistent with what the worker reports. A null reservation is a unitless slot — no reason for client cost to be 100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:47:19 +01:00
Rob Ballantyne	dc423e2999	Pin null pyworker benchmark to ~1.0 throughput The startup benchmark previously returned instantly, producing max_throughput around 339895. A null worker has no real throughput concept (each reservation is a unitless slot), so sleep 1s during the benchmark with workload=1 to record max_throughput ~= 1.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:22:45 +01:00
Rob Ballantyne	463f3de8ea	Add staggered --demo mode to null pyworker client Three concurrent /reserve calls 30s apart, then cancel the first to show the early-release path. The remaining two run until their duration cap. Useful for watching scale-up/scale-down behaviour in the autoscaler dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:08:44 +01:00
Rob Ballantyne	ed0db198c3	Reject queued /reserve immediately on busy null workers A held reservation runs for up to MAX_RESERVATION_SECONDS (default 1h), so queueing a second /reserve behind it makes no sense — the wait would dwarf any sane timeout. Set max_queue_time=0.0 so the framework rejects 429 as soon as another reservation is in flight, and serverless routes the request to a free worker or scales a new one up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:05:02 +01:00
Rob Ballantyne	3668d948be	Simplify null pyworker README intro to serverless terminology Drop the "autoscaler provisions a worker if none is free" phrasing in favor of the simpler "request comes in and you get a worker; release and it scales back down." Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:02:41 +01:00
Rob Ballantyne	254ccdf181	Add /release control endpoint to null pyworker The held /reserve now waits on an asyncio.Event and resolves when the local queue consumer POSTs /release on the internal control port (127.0.0.1:18999 by default). This produces a 200 success in metrics instead of the 499 cancellation you got from disconnecting the client. The duration cap stays as a safety net for stuck consumers. The internal aiohttp server is now unconditional and hosts /release always; the stub /health route is added only when BACKEND_HEALTH_URL is unset. NULL_STUB_HEALTH_PORT is renamed to NULL_CONTROL_PORT to reflect the broader role. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:59:46 +01:00
Rob Ballantyne	89761b378a	Wire null pyworker healthcheck to a stub (and optional user URL) Adds an in-process aiohttp stub on 127.0.0.1:18999/health so the framework's periodic healthcheck has something live to talk to. Operators can override with BACKEND_HEALTH_URL to point at their queue consumer's /health endpoint, so the autoscaler marks the worker errored if the consumer dies. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:53:26 +01:00
Rob Ballantyne	18974873e5	Add null pyworker for queue-driven autoscaling A PyWorker that does not forward to any model server. POST /reserve holds the worker busy until the client disconnects (or the duration cap elapses), so users with their own job queue can drive Vast autoscaling without exposing inbound model traffic on the instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:48:52 +01:00
Lucas Armand	9bc9ba11c5	Increase TGI benchmark tokens to 500	2026-04-30 14:04:39 -07:00
LucasArmandVast	48fdc65e3d	Update to vastai package (#84)	2026-04-14 10:41:31 -07:00
LucasArmandVast	2cd97315cd	Add nltk requirement for openai worker (#83) * Add nltk requirement for openai worker * pin version	2026-04-13 11:30:06 -07:00
Lucas Armand	83c31e25a9	Add force update detection	2026-03-31 13:46:22 -07:00
Lucas Armand	fbe1dca6fa	more env_path fixes	2026-03-30 16:28:51 -07:00
Lucas Armand	4c3120dbc5	allow override env_path	2026-03-30 16:25:01 -07:00
Lucas Armand	d7d9b915f6	allow break system packages	2026-03-30 16:09:17 -07:00
Lucas Armand	4660b337fb	Check for USE_SYSTEM_PYTHON	2026-03-30 14:46:38 -07:00
edgaratvast	7506ecb6b5	directly invoke one stop shop setup executable exported by vastai pip package for deployments (#82)	2026-03-26 10:59:49 -07:00
LucasArmandVast	50633c5003	Update deployments script with retries. (#81)	2026-03-23 14:58:32 -07:00
LucasArmandVast	2e8f18276f	Add beta deployments script (#80)	2026-03-23 14:14:06 -07:00
Scott Darden	eba9c480eb	Merge pull request #79 from vast-ai/update-requirements Updated requirements to only require vastai-sdk	2026-01-14 12:07:33 -08:00
Lucas Armand	aaca1c9645	Updated requirements to only require vastai-sdk	2026-01-14 10:47:07 -08:00
LucasArmandVast	f319db6bd5	flag for model log rotate (#78)	2026-01-12 17:03:18 -08:00
LucasArmandVast	4d786b4d17	SDK Versioning Improvements (#77) * Add SDK_BRANCH	2026-01-02 10:23:07 -08:00
LucasArmandVast	bd3e0032a1	Add SDK version checking (#76)	2025-12-17 21:01:52 -08:00
Lucas Armand	e02f4bc943	Lowered concurrency of vLLM and TGI benchmarks	2025-12-17 11:55:33 -08:00
Lucas Armand	bcb04b9a32	add missing comma	2025-12-17 11:40:40 -08:00
Lucas Armand	9daf171487	Increase queue limits for vLLM and TGI	2025-12-17 11:38:55 -08:00
LucasArmandVast	29f836eb1a	Backwards compatible vLLM payload (#75) * Support old vLLM payloads	2025-12-15 19:58:02 -08:00
LucasArmandVast	4380d98c01	Use PyWorker SDK (#67) * Change PyWorker to Worker SDK * Moved /lib to vast-sdk (https://github.com/vast-ai/vast-sdk)	2025-12-15 19:33:03 -08:00
Abiola Akinnubi	2ce741a8b7	Merge pull request #74 from vast-ai/AUTO-912 Mark pyworkers as "Error" if startup script fails. to avoid silent fail that waits for autoscaler.	2025-12-11 17:05:13 -08:00
Abiola Akinnubi	4ecc07032f	Mark pyworkers as "Error" if startup script fails. to avoid silent fail that waits for autoscaler.	2025-12-11 12:51:56 -08:00
edgaratvast	df61e6e946	correct version pin for aiohttp (#73) Co-authored-by: Edgar Lin <edgarlin2000@gmail.com>	2025-12-10 19:34:52 -08:00
LucasArmandVast	70f8a8f534	Merge pull request #72 from vast-ai/hotfix-pin-pycares Hotfix: pin pycares	2025-12-10 20:41:44 -05:00
Lucas Armand	7be8aa6397	pin pycares	2025-12-10 17:38:03 -08:00
Colter-Downing	138fc3ac47	Merge pull request #71 from vast-ai/AUTO-comfyui-updates Auto comfyui updates	2025-12-04 10:55:12 -08:00
Colter Downing	222ac2a0dd	default endpoint name	2025-12-04 10:54:55 -08:00
Colter Downing	40aed9b5f8	adding s3 as an option	2025-12-04 10:52:57 -08:00
Colter Downing	d4d36bf86e	done with comfy updates	2025-12-03 20:45:55 -08:00
Colter Downing	e839cfc6e8	include view in API wrapper	2025-12-03 20:22:45 -08:00
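
The capacity arithmetic described in commits 1d2caaf554 and 01eff874d8 can be sketched in a few lines. This is only an illustration, assuming utilization is cur_load / max_perf and that the implied queue time is the same ratio in seconds, as those commit messages describe. The function names and the `<=` comparison against the queue-time limit are hypothetical and are not taken from the vastai SDK or the autoscaler.

```python
def utilization(cur_load: float, max_perf: float) -> float:
    """Fraction of a worker's claimed throughput currently in use (assumed model)."""
    return cur_load / max_perf


def implied_queue_seconds(cur_load: float, max_perf: float) -> float:
    """Queue-time estimate as described in 01eff874d8: load divided by throughput."""
    return cur_load / max_perf


MAX_PERF = 100.0  # conventional load/perf value from 9d969e376e and 6a562a1376

# One session costing exactly max_perf: the worker sits at 100% utilization,
# which 1d2caaf554 says the autoscaler reads as "at target, no action".
at_target = utilization(cur_load=MAX_PERF, max_perf=MAX_PERF)

# One session costing 2 * max_perf: the same worker now looks over-committed.
over_target = utilization(cur_load=2 * MAX_PERF, max_perf=MAX_PERF)

# Assumed comparison: a worker looks "available" while its implied queue time
# does not exceed the configured limit.
queue_s = implied_queue_seconds(cur_load=MAX_PERF, max_perf=MAX_PERF)
looks_available_at_30s = queue_s <= 30.0  # limit around the 30s defaults
looks_available_at_0s = queue_s <= 0.0    # limit set to 0, as in 01eff874d8

print(at_target, over_target, queue_s, looks_available_at_30s, looks_available_at_0s)
```

Under these assumptions, either raising the session cost or zeroing the queue-time limits makes an occupied worker stop looking like spare capacity, which is the effect both commits aim for.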


215 Commits
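
The `async with await endpoint.session(...)` pattern from commit d51f04a176 can be reproduced without the SDK. The sketch below uses a hypothetical `FakeSession` class and `start_session` coroutine as stand-ins. Neither is the vastai-sdk implementation; they only mirror the shape the commit message describes, an `async def` factory whose result is an async context manager.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for an SDK session; not the vastai-sdk class."""

    def __init__(self) -> None:
        self.events: list[str] = []

    async def __aenter__(self) -> "FakeSession":
        self.events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        self.events.append("exit")


async def start_session() -> FakeSession:
    """An async def factory: calling it yields a coroutine, not a FakeSession."""
    await asyncio.sleep(0)
    return FakeSession()


async def main() -> list[str]:
    # 'async with start_session()' would fail, because a coroutine object is
    # not an async context manager. Awaiting first resolves it to the session.
    async with await start_session() as session:
        session.events.append("body")
    return session.events


events = asyncio.run(main())
print(events)
```

The `await` runs first and produces the session object, and only then does `async with` call its `__aenter__` and `__aexit__` methods.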