Revert default session cost to 100; document the over-provision as a workaround
cost = max_perf = 100 is the intended steady-state semantics: one session = one worker, scaling elastically from zero. Reverting the default so the design reads correctly even where current autoscaler bugs make it misbehave (2→3 scale-up not firing reliably, scale-to-zero issues — fixes pending on the Vast side). README now describes the intended model first (clean unit occupancy, scale-to-zero via inactivity_timeout + min_load=0), then flags the known autoscaler quirk and presents --session-cost 200 as a temporary band-aid until the Vast fixes land. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -15,12 +15,12 @@ logging.basicConfig(
|
||||
log = logging.getLogger(__file__)
|
||||
|
||||
ENDPOINT_NAME = "null-prod"
|
||||
# Default cost passed to /session/create. Bumping this above the worker's
|
||||
# max_perf (100) is how you tell the autoscaler "each session is more than
|
||||
# one worker of work" — keeps an extra active worker warm and ready, so
|
||||
# the next /session/create lands on a free worker instead of queueing.
|
||||
# See README "Endpoint scaling parameters" for the math.
|
||||
DEFAULT_SESSION_COST = 200
|
||||
# Default cost passed to /session/create. 100 matches the worker's
|
||||
# max_perf for clean unit-occupancy semantics: one session = one worker.
|
||||
# If you hit autoscaler scale-up issues (queueing past the 2nd active
|
||||
# worker), --session-cost 200 is a temporary over-provisioning workaround
|
||||
# until the known autoscaler fixes land.
|
||||
DEFAULT_SESSION_COST = 100
|
||||
|
||||
|
||||
async def reserve(
|
||||
|
||||
Reference in New Issue
Block a user