Add record/replay fixture harness for end-to-end CLI tests by alexkroman · Pull Request #60 · AssemblyAI/cli

alexkroman · 2026-06-10T16:21:29Z

What

Captures real AssemblyAI API responses once and replays CLI commands through them offline, so the transcribe / transcripts / llm / account command families are exercised end-to-end (command + parsing + rendering) without touching the network.

Harness

scripts/record_fixtures.py — a manual recorder (deliberately outside the gate; it hits the network). Drives the same client.* / llm.* / ams.* functions the CLI uses, scrubs every credential on the way out, and writes tests/fixtures/api/*.json. Refresh with:
```
ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py
```
(The API key comes from the env; the AMS session JWT from the keyring of whoever ran aai login. Neither is ever written to a fixture.)
tests/replay_fixtures.py — rebuilds the boundary objects from recorded JSON: a real aai.Transcript via from_response, and an OpenAI ChatCompletion via model_construct (which mirrors the SDK's own lenient wire parsing — the gateway returns Anthropic-flavored fields like finish_reason: "end_turn" that strict validation rejects).
tests/test_replay_e2e.py — 7 replay tests, one per command family, fully offline (pytest-socket untouched).
tests/fixtures/api/ — 7 scrubbed snapshots: API key / JWT redacted, email → user@example.com, account_id → 12345, private cdn.assemblyai.com/upload/<hash> URLs redacted. gitleaks-clean.

Drive-by fixes (two pre-existing cross-platform gate failures)

Both pass on Linux CI but fail the dev gate on macOS:

share.py — mypy --warn-unreachable targets one platform at a time, so the if sys.platform == "darwin": return … made the other return provably dead (line 33 on macOS, the first return on Linux). Rewritten as a ternary expression so neither branch is a statement mypy can flag.
test_source_validation.py — a long tmp path let Rich wrap mid-word ("py test"), defeating the test's " ".join(split()) unwrap. The test now compares with all whitespace removed, matching the existing pattern in test_share.py.
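
As a rough illustration of the share.py change, the sketch below contrasts the two shapes. The function names and return values are hypothetical stand-ins, not the actual code in share.py; only the `sys.platform == "darwin"` check comes from the description above.

```python
import sys


def opener_before() -> str:
    # Shape described as failing the gate: per the PR description, mypy
    # checks one platform at a time, so with --warn-unreachable one of
    # these two returns is reported as dead on each platform.
    if sys.platform == "darwin":
        return "open"  # hypothetical value
    return "xdg-open"  # hypothetical value


def opener_after() -> str:
    # A single conditional expression: the function has one return
    # statement, so there is no separate statement left to flag.
    return "open" if sys.platform == "darwin" else "xdg-open"
```

Both functions return the same value at runtime on any platform; only the statement structure that the type checker sees differs.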

Testing

./scripts/check.sh → All checks passed (1391 tests, 100% patch coverage, mutation gate clean, build + twine OK).

🤖 Generated with Claude Code

Capture real AssemblyAI API responses once and replay CLI commands through them offline, so transcribe/transcripts/llm/account paths are exercised end-to-end (command + parsing + rendering) without touching the network.

- scripts/record_fixtures.py: manual recorder (outside the gate) that drives the real client/llm/ams functions, scrubs every credential, and writes tests/fixtures/api/*.json. Refresh with `ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py`.
- tests/replay_fixtures.py: rebuilds a real aai.Transcript (from_response) and an OpenAI ChatCompletion (model_construct, matching the SDK's lenient wire parsing of the gateway's Anthropic-flavored fields) from recorded JSON.
- tests/test_replay_e2e.py: 7 replay tests, one per command family, fully offline (pytest-socket untouched).
- tests/fixtures/api/: 7 scrubbed snapshots (key/JWT redacted, email and account_id faked, private upload URLs redacted; gitleaks-clean).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Both pass on Linux CI but fail the dev gate on macOS:

- share.py: `mypy --warn-unreachable` targets one platform at a time, so the if/return on `sys.platform == "darwin"` proved the other return dead (line 33 on macOS, the first return on Linux). Rewrite as a ternary expression so neither branch is a statement mypy can flag. Existing tests cover both branches.
- test_source_validation: a long tmp path let Rich wrap mid-word ("py test"), defeating the `" ".join(split())` unwrap. Compare with all whitespace removed, matching the pattern already used in test_share.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
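
The whitespace issue in the test can be reproduced with plain strings. This is a minimal sketch: the message text and tmp path are made up, and the hard newline stands in for a wrap that Rich might insert in the middle of a long path.

```python
def unwrap(text: str) -> str:
    # Earlier approach: collapse every run of whitespace to one space.
    return " ".join(text.split())


def squash(text: str) -> str:
    # Approach described in the fix: drop all whitespace before comparing.
    return "".join(text.split())


# Hypothetical expected message and wrapped terminal output.
expected = "cannot read /tmp/pytest-of-user/pytest-0/test_x/audio.mp3"
rendered = "cannot read /tmp/pytest-of-user/py\ntest-0/test_x/audio.mp3"

# The mid-word wrap becomes "py test-0", so the unwrapped text no longer matches.
assert unwrap(expected) not in unwrap(rendered)

# With all whitespace removed on both sides, the wrap position is irrelevant.
assert squash(expected) in squash(rendered)
```

The trade-off is that a whitespace-free comparison can no longer detect a missing space between words, which is usually acceptable when the assertion is about content rather than layout.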

aikido-pr-checks · 2026-06-10T16:22:21Z

+    """
+    secret_set = {s for s in secrets if s}
+
+    def scrub(obj: Any) -> Any:


The nested function scrub performs unbounded recursion over input structures (dict/list) without a depth limit; add an explicit max-depth parameter or convert to an iterative traversal to prevent stack overflows on deeply nested inputs.

Details

✨ AI Reasoning
The PR adds a recursive scrubber function used to traverse arbitrary JSON-like objects. It directly calls itself for dict values and list items without any explicit depth limiting or loop-based alternative. Recursive traversal of untrusted or very deep structures can cause unbounded call depth and stack overflow. The change introduced this recursion rather than modifying pre-existing recursive code.

🔧 How do I fix it?
Add depth limiting via counter parameters that are checked and enforced, or replace with iterative approaches using explicit loops or stack data structures. For graphs, combine depth limiting with visited set tracking.
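
One way to apply the depth-limit suggestion might look like the sketch below. It is not the recorder's actual code: `make_scrubber`, `scrub_str`, `MAX_DEPTH`, and the `[REDACTED]` placeholder are hypothetical names chosen for illustration, and only the `secret_set` comprehension and the nested `scrub` function mirror the diff above.

```python
from typing import Any, Callable, Iterable

REDACTED = "[REDACTED]"  # hypothetical placeholder text
MAX_DEPTH = 32           # hypothetical cap; API JSON is assumed to be shallow


def scrub_str(value: str, secrets: frozenset) -> str:
    # Replace every occurrence of each known secret inside a string.
    for secret in secrets:
        value = value.replace(secret, REDACTED)
    return value


def make_scrubber(
    secrets: Iterable[str], max_depth: int = MAX_DEPTH
) -> Callable[[Any], Any]:
    # Drop empty strings: replacing "" would insert the placeholder
    # between every character.
    secret_set = frozenset(s for s in secrets if s)

    def scrub(obj: Any, depth: int = 0) -> Any:
        # Refuse to descend past the cap instead of risking a stack overflow.
        if depth > max_depth:
            raise ValueError(f"structure nested deeper than {max_depth} levels")
        if isinstance(obj, str):
            return scrub_str(obj, secret_set)
        if isinstance(obj, dict):
            return {key: scrub(val, depth + 1) for key, val in obj.items()}
        if isinstance(obj, list):
            return [scrub(item, depth + 1) for item in obj]
        return obj  # numbers, booleans, None pass through unchanged

    return scrub
```

Raising on over-deep input treats it as malformed rather than silently leaving part of the structure unscrubbed; an iterative traversal with an explicit stack would be the alternative if deep inputs had to be supported.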


`gitleaks dir` scans the working tree regardless of .gitignore, so high-entropy values in a developer's gitignored `.claude/settings.local.json` (a personal Claude Code file that never enters the repo) fail the local gate while CI, which lacks the file, passes. This cost a real diagnose/move-aside/restore detour this session. Allowlist that one path; the regex is anchored to it, so tracked `.claude/` files (settings.json, agents/, skills/) and everything else stay scanned. Verified: full-repo scan goes 12 findings -> 0 with the file present, and a secret at any other path is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
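
To show why anchoring matters, the sketch below checks a hypothetical path pattern with Python's `re` module. It is not the gitleaks configuration from this PR: the exact pattern, and how gitleaks presents paths to it, are assumptions made for illustration.

```python
import re

# Hypothetical pattern, anchored at both ends so it matches one path only.
ALLOWLISTED = re.compile(r"^\.claude/settings\.local\.json$")


def is_allowlisted(path: str) -> bool:
    # True only when the whole repo-relative path matches the pattern.
    return ALLOWLISTED.search(path) is not None


assert is_allowlisted(".claude/settings.local.json")
assert not is_allowlisted(".claude/settings.json")               # tracked file, still scanned
assert not is_allowlisted(".claude/agents/reviewer.md")          # hypothetical tracked file
assert not is_allowlisted("src/.claude/settings.local.json")     # same name, other location
```

Without the `^` and `$` anchors, or with the dots left unescaped, the pattern would also match lookalike paths and quietly widen the allowlist.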

Future agent sessions shouldn't have to rediscover where fixtures live, how to refresh them, or why the LLM response is rebuilt with model_construct rather than model_validate. Add a "Replay fixtures" subsection covering the three moving parts (recorder / fixtures / replay helper), the refresh command, that the recorder is outside the gate, and the Transcript/ChatCompletion reconstruction gotchas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Replace the `assert sample_id is not None` type-narrowing guard with an explicit `if ... raise CLIError`: asserts are stripped under PYTHONOPTIMIZE, so the check must be a real statement.
- Bound the scrubber's recursion at a max depth (API JSON is shallow; a deeper structure is malformed/hostile input) so it can't stack-overflow. The string redaction is extracted to a module-level `_scrub_str` helper to keep the function under the project's cyclomatic-complexity cap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
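
The assert-to-raise change can be sketched as follows. The `CLIError` class here is a local stand-in for the project's own exception, and `require_sample_id` and its error message are hypothetical; only the `sample_id is not None` check comes from the commit message.

```python
from typing import Optional


class CLIError(Exception):
    """Stand-in for the project's CLIError; the real class may differ."""


def require_sample_id(sample_id: Optional[str]) -> str:
    # `assert sample_id is not None` would narrow the type as well, but
    # assert statements are removed when Python runs with -O or with
    # PYTHONOPTIMIZE set, so the guard is written as a real statement.
    if sample_id is None:
        raise CLIError("no sample id available")  # hypothetical message
    # Type checkers now treat sample_id as str rather than Optional[str].
    return sample_id
```

The explicit branch narrows the type for mypy in the same way the assert did, while still failing loudly in optimized runs.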

alexkroman-assembly and others added 2 commits June 10, 2026 09:20

aikido-pr-checks Bot reviewed Jun 10, 2026


alexkroman-assembly and others added 3 commits June 10, 2026 09:28

alexkroman merged commit b430cd9 into main Jun 10, 2026
11 checks passed

alexkroman deleted the add-replay-fixture-harness branch June 10, 2026 16:38
