a5bcc3de5e
Five issues raised by Copilot's review:
1. _resolve_benchmark_path's docstring and the README claim that a
set-but-broken BENCHMARK_JSON_PATH falls through to the well-known
tier, but the implementation only handled "file missing". A path
pointing at a directory or holding malformed JSON dropped
straight to the SD1.5 fallback without consulting tier 3.
Replaced with a true tiered try-and-load: walk
(misc, env, well-known), attempt to load each, and fall through
to the next on any failure (missing, not a regular file,
unreadable, invalid JSON). The env-var case still surfaces a
warning so a typo doesn't fail silently.
2. int(os.getenv("BENCHMARK_TEST_WIDTH", ...)) raised an uncaught
ValueError on non-integer values. Added an _env_int helper that
warns + returns the default on ValueError. An empty string is also
handled.
3. random.choice([]) on an empty test_prompts.txt raised IndexError.
_load_prompts now warns + uses a built-in _FALLBACK_PROMPT when
the file is missing or yields no non-blank lines.
4. README already claimed "missing or unreadable" fall-through; the
refactor in (1) makes the code match. No README change needed.
5. test_prompts.txt, restored verbatim from the pre-rewrite tree,
carried real-person and IP-laden prompts (Pope Francis, Iron Man,
Luke Skywalker, "Disney socialite"). Used automatically during
warm-up, they're a reputational/safety-filter risk for the worker.
Replaced with generic equivalents that exercise the same workload
characteristics (1 elderly figure on motorcycle, 1 armoured hero
with axe, etc.).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>