Revert default session cost to 100; document the over-provision as a workaround

cost = max_perf = 100 is the intended steady-state semantics: one
session = one worker, scaling elastically from zero. Reverting the
default so the design reads correctly even where current autoscaler
bugs make it misbehave (2→3 scale-up not firing reliably,
scale-to-zero issues — fixes pending on the Vast side).

README now describes the intended model first (clean unit occupancy,
scale-to-zero via inactivity_timeout + min_load=0), then flags the
known autoscaler quirk and presents --session-cost 200 as a temporary
band-aid until the Vast fixes land.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Rob Ballantyne
2026-05-12 11:34:52 +01:00
parent 1d2caaf554
commit 34fd21e76a
2 changed files with 47 additions and 41 deletions
+6 -6
View File
@@ -15,12 +15,12 @@ logging.basicConfig(
log = logging.getLogger(__file__)
ENDPOINT_NAME = "null-prod"
# Default cost passed to /session/create. Bumping this above the worker's
# max_perf (100) is how you tell the autoscaler "each session is more than
# one worker of work" — keeps an extra active worker warm and ready, so
# the next /session/create lands on a free worker instead of queueing.
# See README "Endpoint scaling parameters" for the math.
DEFAULT_SESSION_COST = 200
# Default cost passed to /session/create. 100 matches the worker's
# max_perf for clean unit-occupancy semantics: one session = one worker.
# If you hit autoscaler scale-up issues (queueing past the 2nd active
# worker), --session-cost 200 is a temporary over-provisioning workaround
# until the known autoscaler fixes land.
DEFAULT_SESSION_COST = 100
async def reserve(