test: kill surviving mutants in errors, timeparse, follow by alexkroman · Pull Request #31 · AssemblyAI/cli

alexkroman · 2026-06-07T01:12:05Z

Strengthen assertions so the mutation gate's structural mutants die:

- errors: cover a structural HTTP 401 (not just 403) and pin CLIError defaults
- timeparse: reject a truthy non-string so the guard's or/and can't be swapped
- follow: capture Live's screen/auto_refresh kwargs and per-update refresh flag
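
To illustrate the timeparse item, here is a minimal sketch of why a truthy non-string input matters. The function names and the exact guard are assumptions made for illustration; the repository's real guard is not shown in this PR text.

```python
def parse_duration(value):
    """Hypothetical stand-in for the timeparse guard (not the repository's code)."""
    # Reject anything that is not a non-empty string.
    if not isinstance(value, str) or not value:
        raise ValueError("expected a non-empty string")
    return value.strip()


def parse_duration_mutant(value):
    """The same function with the guard's `or` mutated to `and`."""
    if not isinstance(value, str) and not value:
        raise ValueError("expected a non-empty string")
    return value.strip()


def rejects(func, value):
    """Return True only if func raises ValueError for value."""
    try:
        func(value)
    except ValueError:
        return True
    except AttributeError:
        # The guard let a non-string through to .strip().
        return False
    return False
```

Under these assumptions, `None` is rejected by both versions, so a test using only `None` cannot tell them apart. A truthy non-string such as `5` is rejected by the original and slips past the mutant's guard, which is what kills the mutant.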

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC

The streaming render tests drove real Rich, so the Live construction kwargs and per-update refresh/flush flags weren't asserted and survived mutation. Inject a fake Live to pin screen/auto_refresh/transient/redirect_* and the forced refresh, and a flush-recording stream to pin status-notice flushing.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
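
The fake-Live approach can be sketched as follows. `FakeLive` and `render_stream` are hypothetical names, and the kwarg values below are placeholders rather than the values the CLI actually passes; only the shape (keyword-only construction, `update(renderable, refresh=...)`) follows Rich's `Live`.

```python
class FakeLive:
    """Records constructor kwargs and update() calls instead of drawing."""

    instances = []

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.updates = []
        FakeLive.instances.append(self)

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        return False  # never swallow exceptions

    def update(self, renderable, *, refresh=False):
        self.updates.append((renderable, refresh))


def render_stream(lines, live_cls):
    """Hypothetical renderer that takes the Live class as a dependency."""
    # Placeholder kwargs: the real values are not stated in this PR text.
    with live_cls(screen=False, auto_refresh=False, transient=True) as live:
        for line in lines:
            live.update(line, refresh=True)


render_stream(["partial", "final"], FakeLive)
recorded = FakeLive.instances[-1]
```

Because the fake stores everything it receives, a test can compare `recorded.kwargs` and `recorded.updates` against exact expected values, so flipping any boolean kwarg or the `refresh` flag fails the test.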

- output: pin data_table pad_edge / detail_table padding, and assert emit_ndjson writes one flushed newline-terminated record
- context: assert resolve_session raises when only one of session/account_id is present (pins the `or` guard) and that a non-rejection NotAuthenticated still auto-logs-in with an env key set (pins the `and` in _should_auto_login)
- transcribe_render: assert exact sentiment percentages, mm:ss formatting across a 60s boundary, and most-relevant-first ordering for topics/content-safety

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
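
As an illustration of the 60s-boundary point, here is a small sketch of an mm:ss formatter. `format_mmss` is a hypothetical helper, and its zero-padded output format is an assumption, not the repository's confirmed format.

```python
def format_mmss(seconds: float) -> str:
    """Hypothetical mm:ss formatter; truncates fractional seconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

Inputs below 60 seconds never exercise the minutes carry, so a test needs values on both sides of the boundary (for example 59, 60 and 61) to pin the divisor and the remainder.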

The transcribe --show-code tests parsed/ran the output but never asserted the 4-space indent of the rendered config kwargs, so the indent literal survived mutation. Assert the exact indented config block.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC

_survives wrote each mutated module via ast.unparse and ran the covering tests in a subprocess. Consecutive mutants unparse to files differing by a single token, so they're typically the same byte length and can be written within one mtime-second. CPython validates a cached .pyc by exact (mtime, size) match, so the subprocess could load the previous mutant's (or the original's) bytecode and execute unmutated code, reporting a false "survivor" and failing the gate spuriously (and flakily). Drop the module's cached .pyc after writing the source so the subprocess always recompiles the mutant under test.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
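
A minimal sketch of the fix described above, assuming the default `__pycache__` layout. `write_mutant` is a hypothetical helper name; the PR text does not show how `_survives` implements this.

```python
import importlib.util
from pathlib import Path


def write_mutant(source_path: Path, mutated_source: str) -> None:
    """Write the mutated source, then drop any cached bytecode for it.

    Removing the .pyc forces the next import to recompile, even when the
    new file has the same size and mtime-second as the previous one.
    """
    source_path.write_text(mutated_source, encoding="utf-8")
    cached = Path(importlib.util.cache_from_source(str(source_path)))
    cached.unlink(missing_ok=True)  # fine if no bytecode was cached yet
```

`importlib.util.cache_from_source` resolves the bytecode path for the running interpreter, so the helper does not have to hard-code the `__pycache__/<name>.<tag>.pyc` naming.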

- sources: pin _is_streamable_wav's full mono/16-bit/16k `and` chain, the file-not-found/ffmpeg-missing/empty-audio exit codes, that ffmpeg isn't terminated on a clean EOF, and the exit-code fallback when stderr is empty
- microphone: pin RawInputStream channels/dtype, the ~100ms blocksize floor, the `rate > 0` boundary, and that resample treats audio as 16-bit mono PCM
- client: pin the single-row validation probe limit, the verbatim-vs-fallback transcribe error message, and that a provided on_begin is wired to Begin

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
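
Pinning a full `and` chain means violating one condition at a time. The sketch below uses the standard-library `wave` module; `is_streamable_wav` and `make_wav` are hypothetical reimplementations for illustration, not the repository's `_is_streamable_wav`.

```python
import io
import wave


def is_streamable_wav(data: bytes) -> bool:
    """Hypothetical check: mono, 16-bit, 16 kHz PCM WAV."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return (
                wav.getnchannels() == 1
                and wav.getsampwidth() == 2  # bytes per sample, so 16-bit
                and wav.getframerate() == 16000
            )
    except (wave.Error, EOFError):
        return False  # not a readable WAV at all


def make_wav(channels: int, sample_width: int, rate: int) -> bytes:
    """Build a tiny silent WAV with the given format, for use in tests."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * channels * sample_width * 10)
    return buf.getvalue()
```

Each negative case changes exactly one property (stereo, 8-bit, or 8 kHz), so swapping any single `and` for `or` makes at least one case return the wrong answer.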

- config_builder: pin _derive_kind's Optional unwrap + bare-scalar/dict origin classification, and that KEY=VALUE / NAME:VALUE split only on the first separator (values may contain '=' / ':')
- youtube: assert yt-dlp is driven quietly and actually downloads, and pin the no-file-produced exit code
- ams: assert the error "detail" field is extracted cleanly rather than leaking the raw JSON body (the fallback happened to contain the same substring)

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
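
The first-separator rule from the config_builder item can be sketched with `str.partition`. `split_key_value` is a hypothetical helper; the repository may use a different mechanism such as `split(sep, 1)`.

```python
def split_key_value(item: str, sep: str = "=") -> tuple[str, str]:
    """Split on the first separator only, so the value may contain it."""
    key, found, value = item.partition(sep)
    if not found:
        raise ValueError(f"expected KEY{sep}VALUE, got {item!r}")
    return key, value
```

A test value that itself contains the separator, such as `prompt=a=b` or a URL after `NAME:`, is what distinguishes first-separator splitting from splitting on every or the last occurrence.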

- agent/audio: pin the device-rate blocksize, exact pending() sample count, the callback's exact zero-fill remainder, the audio-open exit code, and that close() lets the stream reopen on a later start()
- agent/session: assert the server error message wins over code/fallback, that a transcript without "interrupted" defaults to False, and that a player which failed to open is never closed
- auth/loopback: assert the callback answers 200 and unknown paths answer 404

Remaining survivors here are equivalent/threading-internal (daemon flags, join/wait timeout values, the sub-10Hz blocksize floor) and aren't observable.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC

The human-table test asserted only the id/model columns, so blanking a present created_at / audio_duration_sec value (the `value or ""` -> `and`) survived. Assert those values appear in the rendered table.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
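
The `or` to `and` mutant mentioned above behaves as follows. `cell` and `cell_mutant` are hypothetical stand-ins for the table-cell expression, shown only to make the difference concrete.

```python
def cell(value):
    """Original expression: show the value, or an empty cell if missing."""
    return value or ""


def cell_mutant(value):
    """Mutant: `or` replaced by `and`, which blanks any present value."""
    return value and ""
```

With a present value the original returns it unchanged while the mutant returns an empty string, so only a test that asserts the value appears in the rendered output can tell the two apart.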

- account: assert the one-missing-bound window label, the one-day-window collapse, and that the default usage range spans exactly the last 30 days
- llm: assert `-o json` forces JSON output even for a non-agentic human
- login: assert the authenticated/logged_out success flags in --json output
- samples: assert the stream sample requests format_turns and that human-mode `samples list` renders its bullet list (pins the string concatenation)

(samples mkdir parents=True is equivalent here: the dir is one level under cwd.)

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC

- assert the "Listening…" notice latches to exactly one emission
- assert begin/turn/termination are all forwarded to the renderer in non-follow mode (pins the follow-vs-None handler wiring)
- assert a turn event with no end_of_turn flag is treated as non-final in --llm follow mode (pins the getattr default)
- assert the renderer is closed even when streaming raises mid-run

Remaining survivors (worker-thread daemon flag, 0.1s join poll interval) are equivalent/threading-internal and not observable.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
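
Two of the properties above, the one-shot notice and closing the renderer on failure, can be sketched with a recording fake. `RecordingRenderer` and `run_stream` are hypothetical names, and the loop is a simplified stand-in for the real streaming code.

```python
class RecordingRenderer:
    """Test double that records notices and whether close() was called."""

    def __init__(self):
        self.notices = []
        self.closed = False

    def notice(self, text):
        self.notices.append(text)

    def close(self):
        self.closed = True


def run_stream(events, renderer):
    """Hypothetical loop: announce once, always close the renderer."""
    announced = False
    try:
        for event in events:
            if not announced:
                renderer.notice("Listening…")
                announced = True  # latch: never announce again
            if isinstance(event, Exception):
                raise event  # simulate streaming failing mid-run
    finally:
        renderer.close()
```

Asserting `notices == ["Listening…"]` after several events pins the latch, and asserting `closed` after a raised exception pins the `finally` cleanup.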

The new 200/404 status assertions need the response code. Use http.client instead of urllib.request.urlopen so that a 404 is a normal response status (not a raised HTTPError), the status types as int (no mypy no-any-return), and no urllib audit (S310) suppression is needed, hence no new noqa escape hatch.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
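
A self-contained sketch of reading status codes with `http.client` against a throwaway local server. `CallbackHandler`, `get_status`, `probe` and the `/callback` path are assumptions for illustration; the real loopback server lives in the repository's auth module.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CallbackHandler(BaseHTTPRequestHandler):
    """Hypothetical loopback handler: 200 for /callback, 404 otherwise."""

    def do_GET(self):
        status = 200 if self.path.startswith("/callback") else 404
        self.send_response(status)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep test output quiet


def get_status(port: int, path: str) -> int:
    """Return the HTTP status; a 404 is returned as a value, not raised."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def probe(paths):
    """Start the server on a free port, query each path, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), CallbackHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return [get_status(server.server_port, path) for path in paths]
    finally:
        server.shutdown()
        server.server_close()
```

With `urllib.request.urlopen` the 404 case would surface as an `HTTPError` exception, so the test would need a separate `try`/`except` branch just to read the code.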

The blocksize floor (max(1, rate//10)) and the rate>0 guard only differ for sub-10Hz / 1Hz rates that no real device reports, so they are near-equivalent mutants, like the daemon/timeout ones left elsewhere. Their fakes needed a `fake_sd: Any` module (pyright can't assign attributes to a bare ModuleType), which tripped the "no net-new Any" gate. The valuable resample-params and channels/dtype assertions remain and add no Any.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
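
The near-equivalence claim for the floor can be checked by brute force. `blocksize` mirrors the `max(1, rate//10)` expression quoted above, and `blocksize_no_floor` is a hypothetical mutant with the floor removed.

```python
def blocksize(rate: int) -> int:
    """About 100 ms of samples, with a floor of one sample."""
    return max(1, rate // 10)


def blocksize_no_floor(rate: int) -> int:
    """Mutant: the max(1, ...) floor removed."""
    return rate // 10


# Rates at which the floor changes the result, up to 48 kHz.
differing = [r for r in range(1, 48001) if blocksize(r) != blocksize_no_floor(r)]
```

The two functions disagree only for rates from 1 to 9 Hz, which supports treating the floor mutant as unobservable with realistic device rates.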

DEFAULT_ENV is already sandbox000, so the existing env tests couldn't tell the `sandbox and env is None` override apart from the default. Bind a profile to production and assert that a bare invocation keeps production while --sandbox forces sandbox000, killing the and/or and is/is-not mutants on that line. (The two err=True echoes route to stderr; CliRunner mixes streams in this version, so that routing isn't separately assertable here.)

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC
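
A sketch of why the production-bound profile is needed. `resolve_env` and its resolution order (explicit env, then profile, then default) are assumptions for illustration; only `DEFAULT_ENV`, `sandbox000` and the `sandbox and env is None` condition come from the text above.

```python
DEFAULT_ENV = "sandbox000"


def resolve_env(sandbox: bool, env, profile_env):
    """Hypothetical resolution: explicit env, then profile, then default."""
    if sandbox and env is None:
        env = "sandbox000"  # --sandbox overrides the profile
    return env or profile_env or DEFAULT_ENV
```

Without a profile, both the bare and `--sandbox` invocations resolve to `sandbox000`, so the override is invisible. With the profile bound to production, the bare call must return `production` and the `--sandbox` call must return `sandbox000`, which fails if `and` becomes `or` or `is` becomes `is not`.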

- init: assert a resolved key omits the skipped-key row, and that the launch-skipped row (with the manual uvicorn command) appears only when deps were installed AND no key is present, pinning both guards
- setup: a direct _proc_detail test pinning stderr-then-stdout preference

Remaining survivors in these modules (subprocess flag kwargs, timeout/exit-code constants, the no-TTY picker guard) are low-value infra/near-equivalent.

https://claude.ai/code/session_014MYaBvEeWEJXJt36CCSHtC

claude added 15 commits June 6, 2026 22:52

alexkroman merged commit 35d497c into main Jun 7, 2026
10 checks passed

alexkroman deleted the claude/mutant-testing-improvements-mGkhF branch June 7, 2026 01:23

alexkroman merged 15 commits into main from claude/mutant-testing-improvements-mGkhF
