9 Commits

Author SHA1 Message Date
Rob Ballantyne d51f04a176 Await endpoint.session() in null pyworker client
endpoint.session() forwards to start_endpoint_session, which is an
async def, so the call returns a coroutine rather than a Session, despite
the SDK's return-type annotation. Use 'async with await endpoint.session(...)'
so the coroutine resolves to a Session before entering the context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 11:07:32 +01:00
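
The fix described in this commit can be illustrated with a minimal, self-contained sketch. FakeEndpoint and FakeSession are hypothetical stand-ins, not the SDK's classes; the only behaviour borrowed from the commit message is that session() is a coroutine function that resolves to an async context manager.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for the SDK Session (an async context manager)."""

    def __init__(self):
        self.open = False

    async def __aenter__(self):
        self.open = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.open = False
        return False


class FakeEndpoint:
    """Hypothetical endpoint whose session() is an async def."""

    async def session(self, cost, lifetime):
        return FakeSession()


async def broken(endpoint):
    # Without 'await', this is a coroutine object, which has no __aenter__.
    coro = endpoint.session(cost=100, lifetime=60)
    try:
        async with coro:
            return "entered"
    except (TypeError, AttributeError):
        # TypeError on Python 3.11+, AttributeError on older versions.
        coro.close()  # suppress the "never awaited" warning
        return "failed"


async def fixed(endpoint):
    # Await first so the coroutine resolves to a Session, then enter it.
    async with await endpoint.session(cost=100, lifetime=60) as session:
        return session.open


print(asyncio.run(broken(FakeEndpoint())))
print(asyncio.run(fixed(FakeEndpoint())))
```

The cost and lifetime values are placeholders chosen for the sketch.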
Rob Ballantyne 6a562a1376 Rewrite null pyworker on the framework session model
Drop the held-/reserve approach in favour of the framework's session
primitive (max_sessions=1 + /session/create). Sessions are excluded from
the autoscaler's queue-wait math and don't suffer the cur_perf=0
degradation that a long-held request did, so this naturally produces the
"one request comes in and you get a worker; release and it scales back
down" model we were hand-rolling.

Server side:
  - max_sessions=1; framework auto-registers /session/* routes
  - Drop custom /reserve handler, _active_reservation event,
    max_queue_time=0.0, MAX_RESERVATION_SECONDS, _perf_heartbeat
  - Trivial /ping handler exists only to satisfy the framework's
    "at least one handler with BenchmarkConfig" requirement (and to give
    clients an extension/keepalive route)
  - /release on the internal control port is kept as a convenience for
    queue consumers that don't carry session_auth — calls the framework's
    __close_session via name-mangling, which bypasses the session_auth
    check but is fine for a localhost-only endpoint
  - Workload/perf back to 100 (conventional)

Client side:
  - Uses endpoint.session(cost, lifetime) instead of POST /reserve
  - async with the SDK Session; close on exit posts /session/end with
    proper auth → 200 success in metrics
  - Demo and single modes both ride the same reserve() helper

Sessions landed in vastai-sdk 0.4.2 (commit ec9ef59, 2026-01-20).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 10:51:24 +01:00
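
The name-mangling route mentioned for /release can be sketched in plain Python. Backend is a hypothetical class invented for this illustration; the commit does not name the framework class that owns __close_session, and the session store here is a simple dict.

```python
class Backend:
    """Hypothetical stand-in for the framework object that owns sessions."""

    def __init__(self):
        self.sessions = {"abc": "open"}

    def __close_session(self, session_id):
        # Double-underscore methods are stored as _Backend__close_session.
        return self.sessions.pop(session_id, None) is not None


def release(backend, session_id):
    # Outside the class body the method is only reachable by its mangled
    # name. Calling it this way skips any auth check a public wrapper does.
    return backend._Backend__close_session(session_id)


backend = Backend()
print(release(backend, "abc"))  # first call closes the session
print(release(backend, "abc"))  # second call finds nothing to close
```

Reaching into a mangled name couples the caller to the framework's internals, which is why the commit limits it to a localhost-only control port.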
Rob Ballantyne 2aada7b210 Add --plateau to null pyworker demo (default 5min)
Previously the first release fired only 30s after the third reservation
started, so the autoscaler often hadn't even finished provisioning the
third worker yet. Default plateau to 300s so all three workers are
visibly running before scale-down begins; configurable via --plateau.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:26:31 +01:00
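
A rough sketch of how a --plateau option could shift the release schedule. The 300s default comes from the commit message; the helper names, the float type, and the assumption that releases stay 30s apart after the plateau are illustrative only.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="null pyworker demo (sketch)")
    parser.add_argument(
        "--plateau",
        type=float,
        default=300.0,
        help="seconds between the last reservation starting and the first release",
    )
    return parser


def release_times(plateau, count=3, stagger=30.0):
    """Release offsets in seconds from the first reservation's start.

    Assumes reservations start 'stagger' seconds apart and releases are
    spaced the same way once the plateau has elapsed.
    """
    last_start = (count - 1) * stagger
    return [last_start + plateau + i * stagger for i in range(count)]


args = build_parser().parse_args([])
print(release_times(args.plateau))  # [360.0, 390.0, 420.0]
print(release_times(30.0))          # [90.0, 120.0, 150.0], the earlier timing
```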
Rob Ballantyne 8df562e243 Standardize null pyworker load/perf on 150
Bump workload_calculator, benchmark cache value, and client cost from 100
to 150.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:17:57 +01:00
Rob Ballantyne 9d969e376e Standardize null pyworker load/perf on 100
Using 1 confused the serverless capacity math. Set workload_calculator,
benchmark target throughput, and client cost all to 100 — the conventional
default the rest of the system expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:09:16 +01:00
Rob Ballantyne ef3f34a515 Restructure null pyworker --demo as a clean trapezoid
Three reservations 30s apart, each with a 90s duration. They end one at
a time, also 30s apart, then the client exits. Each reservation ends
via its duration cap (200 success) rather than the previous "cancel one,
leave two open" pattern that left two 499s pending.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 18:00:46 +01:00
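
The trapezoid shape follows directly from the timings in this commit (three reservations, 30s apart, 90s each). A small sketch that counts active reservations at 30s intervals; active_at is a hypothetical helper, not part of the client.

```python
def active_at(t, starts=(0, 30, 60), duration=90):
    """Number of reservations active at time t (seconds from the first start)."""
    return sum(1 for s in starts if s <= t < s + duration)


profile = [active_at(t) for t in range(0, 151, 30)]
print(profile)  # [1, 2, 3, 2, 1, 0]
```

The count ramps up by one every 30s, peaks at three, and ramps down at the same rate as each reservation reaches its duration cap.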
Rob Ballantyne 147bf2597a Set null pyworker client cost to 1
Match the server-side workload_calculator (1.0) so the autoscaler routing
hint is consistent with what the worker reports. A null reservation is a
unitless slot — no reason for client cost to be 100.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 17:47:19 +01:00
Rob Ballantyne 463f3de8ea Add staggered --demo mode to null pyworker client
Three concurrent /reserve calls 30s apart, then cancel the first to show
the early-release path. The remaining two run until their duration cap.
Useful for watching scale-up/scale-down behaviour in the autoscaler
dashboard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 17:08:44 +01:00
Rob Ballantyne 18974873e5 Add null pyworker for queue-driven autoscaling
A PyWorker that does not forward to any model server. POST /reserve holds
the worker busy until the client disconnects (or the duration cap elapses),
so users with their own job queue can drive Vast autoscaling without
exposing inbound model traffic on the instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 16:48:52 +01:00
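
The "hold until the client disconnects or the duration cap elapses" behaviour can be sketched with asyncio. This is not the worker's actual handler: hold() and the Event standing in for a client disconnect are assumptions made for illustration.

```python
import asyncio


async def hold(released, cap_seconds):
    """Stay busy until 'released' is set or the cap elapses, whichever is first."""
    try:
        await asyncio.wait_for(released.wait(), timeout=cap_seconds)
        return "released"
    except asyncio.TimeoutError:
        return "cap"


async def demo():
    # Case 1: nobody releases, so the duration cap ends the hold.
    never = asyncio.Event()
    capped = await hold(never, 0.01)

    # Case 2: the event (standing in for a disconnect) fires before the cap.
    soon = asyncio.Event()
    asyncio.get_running_loop().call_later(0.01, soon.set)
    early = await hold(soon, 1.0)
    return capped, early


print(asyncio.run(demo()))  # ('cap', 'released')
```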