comfyui-json: key readiness off api-wrapper's BACKENDS_READY token

Rather than tailing for "Uvicorn running on", which only confirms the
api-wrapper's own HTTP listener is bound, watch for the api-wrapper's
new structured tokens that reflect actual end-to-end reachability:

  MODEL_LOAD_LOG_MSG = ["BACKENDS_READY"]

  MODEL_ERROR_LOG_MSGS includes:
    - "BACKENDS_READY_TIMEOUT" (backends never came up)
    - "BACKEND_UNRECOVERABLE" (CUDA fault latched on a backend)
    - "Application startup failed" (kept; uvicorn's own ASGI failure)

Closes the race observed on a live test where the pyworker fired
benchmark the moment uvicorn bound, every request inside the
api-wrapper hit Cannot-connect-to-host on ComfyUI, and the SDK counted
the resulting fast 502s as a fast worker (perf=200).

Tokens are emitted by ai-dock/comfyui-api-wrapper#11 and onward;
earlier wrapper versions won't emit BACKENDS_READY so warm-up stalls
indefinitely. Pin to a wrapper that includes that change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
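
Reviewer note: "BACKENDS_READY" is a substring of "BACKENDS_READY_TIMEOUT". If the SDK matches patterns by plain substring containment (an assumption; the matcher is not shown in this commit), a timeout line could also satisfy the load pattern. The sketch below illustrates one way to disambiguate by checking error patterns first. `classify_log_line` is a hypothetical helper written for illustration, not part of the vastai SDK.

```python
from typing import Optional

# Patterns copied from this commit.
MODEL_LOAD_LOG_MSG = ["BACKENDS_READY"]
MODEL_ERROR_LOG_MSGS = [
    "BACKENDS_READY_TIMEOUT",
    "BACKEND_UNRECOVERABLE",
    "Application startup failed",
]


def classify_log_line(line: str) -> Optional[str]:
    """Classify one api-wrapper log line as "error", "loaded", or None.

    Hypothetical helper; assumes substring matching. Error patterns are
    tested before load patterns so that a BACKENDS_READY_TIMEOUT line is
    never mistaken for a successful BACKENDS_READY.
    """
    if any(msg in line for msg in MODEL_ERROR_LOG_MSGS):
        return "error"
    if any(msg in line for msg in MODEL_LOAD_LOG_MSG):
        return "loaded"
    return None
```

With this ordering, a line containing only BACKENDS_READY classifies as "loaded", a line containing BACKENDS_READY_TIMEOUT classifies as "error", and the old "Uvicorn running on" line no longer triggers anything. Whether the SDK already orders its checks this way is worth confirming before relying on it.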
2026-05-08 09:46:45 +01:00
parent a5bcc3de5e
commit b52c654f09
1 changed files with 38 additions and 22 deletions
@@ -33,44 +33,60 @@ from pathlib import Path
 from vastai import Worker, WorkerConfig, HandlerConfig, LogActionConfig, BenchmarkConfig
-# ComfyUI model configuration. The model server here is the ai-dock
+# ComfyUI model configuration. The model server is ai-dock's
 # comfyui-api-wrapper sitting in front of ComfyUI itself, not ComfyUI's
-# own port (18188). We watch the api-wrapper's log rather than ComfyUI's
+# own port (18188). We tail the api-wrapper's log rather than ComfyUI's
-# because the api-wrapper runs convert-workflows.sh before launching
+# and key off the api-wrapper's own structured readiness/fault signals:
-# uvicorn — by the time uvicorn logs "Uvicorn running on ...", the
+#
-# benchmark workflows are converted, the pyworker_benchmark.json symlink
+#   BACKENDS_READY            — api-wrapper has confirmed every ComfyUI
-# exists, and :18288 is accepting connections. Watching ComfyUI's log
+#                               backend passes HTTP+WS probes. Until
-# fires the benchmark too early (before the api-wrapper is reachable),
+#                               this fires, posting to /generate/sync
-# which the SDK can't recover from since __call_backend doesn't retry
+#                               can hit "Cannot connect to host" inside
-# connection-refused.
+#                               the api-wrapper, which the SDK can't
+#                               recover from since __call_backend
+#                               doesn't retry connection-refused.
+#   BACKENDS_READY_TIMEOUT    — backends never reachable within
+#                               api-wrapper's deadline. Worker is
+#                               unrecoverable; mark errored.
+#   BACKEND_UNRECOVERABLE     — CUDA fault / illegal memory access on a
+#                               backend's GPU. Same fate.
+#   Application startup failed — uvicorn's own ASGI lifespan failed.
+#
+# These tokens are emitted by ai-dock/comfyui-api-wrapper >= the
+# "feat/backend-readiness-log-signals" change. Older wrappers won't
+# emit BACKENDS_READY, so warm-up will stall — pin the wrapper version
+# accordingly.
 MODEL_SERVER_URL           = 'http://127.0.0.1'
 MODEL_SERVER_PORT          = 18288
 MODEL_LOG_FILE             = '/var/log/portal/api-wrapper.log'
 MODEL_HEALTHCHECK_ENDPOINT = "/health"
-# api-wrapper log messages
+# Trigger benchmark only after the full stack (api-wrapper + ComfyUI
+# backends) is reachable. See BACKENDS_READY in the comment above.
 MODEL_LOAD_LOG_MSG = [
-    "Uvicorn running on"
+    "BACKENDS_READY",
 ]
-# LogAction.ModelError is fatal: the SDK calls backend_errored() and the
+# LogAction.ModelError is fatal: the SDK calls backend_errored() and
-# worker is locked into a permanent error state. Patterns must therefore
+# locks the worker into a permanent error state. Patterns must
-# only match conditions where the api-wrapper genuinely cannot serve any
+# therefore only match conditions where the api-wrapper genuinely
-# request — supervisord restarts on uvicorn exit, so a real failure
+# cannot serve any request — supervisord restarts on uvicorn exit, so
-# self-heals rather than dragging the worker down.
+# a real failure self-heals rather than dragging the worker down.
 #
 # Notably *not* matched here:
 #   - per-request errors (PreprocessWorker failures, ComfyUI workflow
 #     validation, "Value not in list:") — one malformed client payload
 #     would otherwise kill the worker
-#   - "CUDA out of memory" — surfaces both as misconfigured GPU (which
+#   - "CUDA out of memory" — surfaces both as a misconfigured GPU
-#     the benchmark-failure path already catches via backend_errored)
+#     (which the benchmark-failure path already catches via
-#     and as a too-greedy client request, which is indistinguishable
+#     backend_errored) and as a too-greedy client request, which is
-#     from a substring match
+#     indistinguishable from a substring match
 #   - convert-workflows.sh warnings — that script is not load-bearing
-#     for serving (uvicorn starts even if conversion partially failed)
+#     for serving
 MODEL_ERROR_LOG_MSGS = [
-    "Application startup failed",  # uvicorn ASGI lifespan startup failed -> uvicorn exits
+    "BACKENDS_READY_TIMEOUT",       # backends never reachable
+    "BACKEND_UNRECOVERABLE",        # CUDA fault latched per backend
+    "Application startup failed",   # uvicorn ASGI lifespan startup failed
 ]
 # LogAction.Info is purely informational (echoes log lines into the vast