Revert default session cost to 100; document the over-provision as a workaround

cost = max_perf = 100 is the intended steady-state semantics: one session = one worker, scaling elastically from zero. Reverting the default so the design reads correctly even where current autoscaler bugs make it misbehave (2→3 scale-up not firing reliably, scale-to-zero issues — fixes pending on the Vast side). README now describes the intended model first (clean unit occupancy, scale-to-zero via inactivity_timeout + min_load=0), then flags the known autoscaler quirk and presents --session-cost 200 as a temporary band-aid until the Vast fixes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 11:34:52 +01:00
parent 1d2caaf554
commit 34fd21e76a
2 changed files with 47 additions and 41 deletions
@@ -15,12 +15,12 @@ logging.basicConfig(
 log = logging.getLogger(__file__)

 ENDPOINT_NAME = "null-prod"
-# Default cost passed to /session/create. Bumping this above the worker's
-# max_perf (100) is how you tell the autoscaler "each session is more than
-# one worker of work" — keeps an extra active worker warm and ready, so
-# the next /session/create lands on a free worker instead of queueing.
-# See README "Endpoint scaling parameters" for the math.
-DEFAULT_SESSION_COST = 200
+# Default cost passed to /session/create. 100 matches the worker's
+# max_perf for clean unit-occupancy semantics: one session = one worker.
+# If you hit autoscaler scale-up issues (queueing past the 2nd active
+# worker), --session-cost 200 is a temporary over-provisioning workaround
+# until the known autoscaler fixes land.
+DEFAULT_SESSION_COST = 100


 async def reserve(