pyworker

Author	SHA1	Message	Date
Rob Ballantyne	a81d3febe7	Collapse null pyworker client to a single mode parameterized by --count Now that the session model means no HTTP connection is held during the reservation, the dichotomy between "single reserve" and "trapezoid demo" collapses — both are "open N sessions, each held for H seconds, started I seconds apart, close." Replace --reserve/--demo/--duration/--plateau with --count/--hold/--interval. --session-cost becomes --cost. Client is now 64 lines (down from 120). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 12:18:33 +01:00
Rob Ballantyne	913e3a8782	Simplify null pyworker code and docs Pass over all three files to drop verbose expository commentary that duplicated either the code or the README. Net: -284 lines. README now reads top-to-bottom in roughly the order someone would need the info: use case → how it works → endpoint params → API → healthcheck → deploy → demo. Endpoint params table uses the values actually tested on alpha (min_load=0, target_util=1, max_queue_time=1, target_queue_time=0.5, inactivity_timeout=10). Dropped the "known autoscaler quirk" section now that alpha addresses it; kept the --session-cost flag as a debugging knob. worker.py and client.py keep the same behavior but trim long block comments and multi-line docstrings the code didn't need. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:50:03 +01:00
Rob Ballantyne	47ad0ebe0a	Add --instance flag to null pyworker client Lets the demo target run-alpha.vast.ai (or candidate/local) without editing code. Defaults to prod; respects VAST_INSTANCE env var. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:40:51 +01:00
Rob Ballantyne	34fd21e76a	Revert default session cost to 100; document the over-provision as a workaround cost = max_perf = 100 is the intended steady-state semantics: one session = one worker, scaling elastically from zero. Reverting the default so the design reads correctly even where current autoscaler bugs make it misbehave (2→3 scale-up not firing reliably, scale-to-zero issues — fixes pending on the Vast side). README now describes the intended model first (clean unit occupancy, scale-to-zero via inactivity_timeout + min_load=0), then flags the known autoscaler quirk and presents --session-cost 200 as a temporary band-aid until the Vast fixes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:34:52 +01:00
Rob Ballantyne	1d2caaf554	Default null pyworker session cost to 2x max_perf Reporting cost == max_perf puts an occupied worker at exactly 100% utilization, which the autoscaler reads as "at target, no action." The 3rd session_create then 429s on both active workers and stalls in the global queue instead of triggering a cold-worker activation (observed: 1→2 active scales fine, 2→3 does not). Bumping cost to 2 * max_perf makes each session look like more than one worker's work, so the autoscaler always keeps an extra active worker hot. Slight over-provisioning, but the 3rd reservation lands directly on a free worker rather than queueing. Expose --session-cost on the client so the value can be swept without edits. README documents the trade-off. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:31:26 +01:00
Rob Ballantyne	d51f04a176	Await endpoint.session() in null pyworker client endpoint.session() forwards to start_endpoint_session, which is async def — so the call returns a coroutine, not a Session, despite the SDK's return-type annotation. Use 'async with await endpoint.session(...)' so the coroutine resolves to a Session before entering the context. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 11:07:32 +01:00
Rob Ballantyne	6a562a1376	Rewrite null pyworker on the framework session model Drop the held-/reserve approach in favour of the framework's session primitive (max_sessions=1 + /session/create). Sessions are excluded from the autoscaler's queue-wait math and don't suffer the cur_perf=0 degradation that a long-held request did, so this naturally produces the "one request comes in and you get a worker; release and it scales back down" model we were hand-rolling. Server side: - max_sessions=1; framework auto-registers /session/* routes - Drop custom /reserve handler, _active_reservation event, max_queue_time=0.0, MAX_RESERVATION_SECONDS, _perf_heartbeat - Trivial /ping handler exists only to satisfy the framework's "at least one handler with BenchmarkConfig" requirement (and to give clients an extension/keepalive route) - /release on the internal control port is kept as a convenience for queue consumers that don't carry session_auth; it calls the framework's __close_session via name-mangling, which bypasses the session_auth check but is fine for a localhost-only endpoint - Workload/perf back to 100 (conventional) Client side: - Uses endpoint.session(cost, lifetime) instead of POST /reserve - async with the SDK Session; close on exit posts /session/end with proper auth → 200 success in metrics - Demo and single modes both ride the same reserve() helper Sessions landed in vastai-sdk 0.4.2 (commit ec9ef59, 2026-01-20). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-12 10:51:24 +01:00
Rob Ballantyne	2aada7b210	Add --plateau to null pyworker demo (default 5min) Previously the first release fired only 30s after the third reservation started, so the autoscaler often hadn't even finished provisioning the third worker yet. Default plateau to 300s so all three workers are visibly running before scale-down begins; configurable via --plateau. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:26:31 +01:00
Rob Ballantyne	8df562e243	Standardize null pyworker load/perf on 150 Bump workload_calculator, benchmark cache value, and client cost from 100 to 150. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:17:57 +01:00
Rob Ballantyne	9d969e376e	Standardize null pyworker load/perf on 100 Using 1 confused the serverless capacity math. Set workload_calculator, benchmark target throughput, and client cost all to 100 — the conventional default the rest of the system expects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:09:16 +01:00
Rob Ballantyne	ef3f34a515	Restructure null pyworker --demo as a clean trapezoid Three reservations 30s apart, each with a 90s duration. They end one at a time, also 30s apart, then the client exits. Each reservation ends via its duration cap (200 success) rather than the previous "cancel one, leave two open" pattern that left two 499s pending. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 18:00:46 +01:00
Rob Ballantyne	147bf2597a	Set null pyworker client cost to 1 Match the server-side workload_calculator (1.0) so the autoscaler routing hint is consistent with what the worker reports. A null reservation is a unitless slot — no reason for client cost to be 100. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:47:19 +01:00
Rob Ballantyne	463f3de8ea	Add staggered --demo mode to null pyworker client Three concurrent /reserve calls 30s apart, then cancel the first to show the early-release path. The remaining two run until their duration cap. Useful for watching scale-up/scale-down behaviour in the autoscaler dashboard. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 17:08:44 +01:00
Rob Ballantyne	18974873e5	Add null pyworker for queue-driven autoscaling A PyWorker that does not forward to any model server. POST /reserve holds the worker busy until the client disconnects (or the duration cap elapses), so users with their own job queue can drive Vast autoscaling without exposing inbound model traffic on the instance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-11 16:48:52 +01:00
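
Commit d51f04a176 describes a method that is annotated as returning a Session but actually returns a coroutine, fixed by writing 'async with await endpoint.session(...)'. The sketch below reproduces that pattern in isolation. FakeEndpoint, FakeSession and _start_session are hypothetical stand-ins invented for illustration; they are not the vastai-sdk classes, and the real SDK's signatures may differ.

```python
import asyncio
import inspect


class FakeSession:
    """Hypothetical stand-in for an SDK session (async context manager)."""

    def __init__(self):
        self.open = False

    async def __aenter__(self):
        self.open = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.open = False
        return False  # do not swallow exceptions


class FakeEndpoint:
    """Hypothetical stand-in: session() forwards to an async def helper."""

    async def _start_session(self, cost):
        return FakeSession()

    def session(self, cost):
        # Plain def that returns the helper's coroutine, so callers
        # receive a coroutine object rather than a FakeSession.
        return self._start_session(cost)


async def demo():
    endpoint = FakeEndpoint()

    # Without await, the call yields a coroutine, which has no __aenter__.
    pending = endpoint.session(cost=100)
    was_coroutine = inspect.iscoroutine(pending)
    pending.close()  # discard it cleanly to avoid a "never awaited" warning

    # Awaiting first resolves the coroutine to the session object.
    async with await endpoint.session(cost=100) as session:
        open_inside = session.open

    return was_coroutine, open_inside, session.open
```

Running asyncio.run(demo()) shows that the un-awaited call is a coroutine, that the session is open inside the block, and that it is closed again on exit.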

14 Commits
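
Commit a81d3febe7 reduces the client to "open N sessions, each held for H seconds, started I seconds apart, close". A minimal sketch of that schedule follows, assuming nothing about the real client beyond that sentence. The names session_windows, peak_concurrency, fake_session and run_sessions are hypothetical, and fake_session only records open and close events instead of talking to any endpoint.

```python
import asyncio
from contextlib import asynccontextmanager


def session_windows(count, hold, interval):
    """Return (start, end) offsets in seconds for each of `count` sessions."""
    return [(i * interval, i * interval + hold) for i in range(count)]


def peak_concurrency(windows):
    """Largest number of sessions open at the same moment."""
    # A close at time t sorts before an open at time t, so back-to-back
    # sessions are not counted as overlapping.
    events = sorted([(s, 1) for s, _ in windows] + [(e, -1) for _, e in windows])
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


@asynccontextmanager
async def fake_session(log, idx):
    """Hypothetical session: records open/close instead of calling an API."""
    log.append(("open", idx))
    try:
        yield idx
    finally:
        log.append(("close", idx))


async def _hold_one(log, idx, delay, hold):
    await asyncio.sleep(delay)       # stagger the start
    async with fake_session(log, idx):
        await asyncio.sleep(hold)    # keep the session open


async def run_sessions(count, hold, interval):
    """Open `count` sessions `interval` seconds apart, each held `hold` seconds."""
    log = []
    await asyncio.gather(
        *(_hold_one(log, i, i * interval, hold) for i in range(count))
    )
    return log
```

With count=3, hold=90 and interval=30 (the shape of the earlier trapezoid demo), session_windows gives (0, 90), (30, 120), (60, 150) and the peak concurrency is 3; shortening hold to 60 lowers the peak to 2, since the first session closes as the third opens.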