Add record/replay fixture harness for end-to-end CLI tests #60

Merged
alexkroman merged 5 commits into main from add-replay-fixture-harness on Jun 10, 2026

@alexkroman

Collaborator

What

Captures real AssemblyAI API responses once and replays CLI commands through them offline, so the transcribe / transcripts / llm / account command families are exercised end-to-end (command + parsing + rendering) without touching the network.

Harness

  • scripts/record_fixtures.py — a manual recorder (deliberately outside the gate; it hits the network). Drives the same client.* / llm.* / ams.* functions the CLI uses, scrubs every credential on the way out, and writes tests/fixtures/api/*.json. Refresh with:
    ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py
    (The API key comes from the env; the AMS session JWT from the keyring of whoever ran aai login. Neither is ever written to a fixture.)
  • tests/replay_fixtures.py — rebuilds the boundary objects from recorded JSON: a real aai.Transcript via from_response, and an OpenAI ChatCompletion via model_construct (which mirrors the SDK's own lenient wire parsing — the gateway returns Anthropic-flavored fields like finish_reason: "end_turn" that strict validation rejects).
  • tests/test_replay_e2e.py — 7 replay tests, one per command family, fully offline (pytest-socket untouched).
  • tests/fixtures/api/: 7 scrubbed snapshots. API key / JWT redacted, email → user@example.com, account_id → 12345, private cdn.assemblyai.com/upload/<hash> URLs redacted. gitleaks-clean.
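
A minimal sketch of the replay idea described above, not the harness itself: a recorded JSON snapshot stands in for the network call while the command code path runs unchanged. The names `fetch_transcript`, `render_transcript`, and `load_fixture`, and the fixture fields, are hypothetical stand-ins rather than names from this PR, and the boundary is swapped by passing a function argument instead of whatever mechanism the real tests use.

```python
import json
from pathlib import Path
from typing import Callable


def fetch_transcript(transcript_id: str) -> dict:
    """Hypothetical network boundary; a real CLI would call the API here."""
    raise RuntimeError("network access is not allowed in replay tests")


def load_fixture(path: Path) -> dict:
    """Read one recorded (and already scrubbed) JSON snapshot from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def render_transcript(
    transcript_id: str,
    fetch: Callable[[str], dict] = fetch_transcript,
) -> str:
    """Hypothetical command body: fetch, parse, and render one transcript."""
    data = fetch(transcript_id)
    return f"{data['id']} [{data['status']}]: {data['text']}"
```

A replay test would then call `render_transcript("abc123", fetch=lambda _id: load_fixture(path))` and assert on the rendered string, so parsing and rendering are exercised while the default network path is never reached.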

Drive-by fixes (two pre-existing cross-platform gate failures)

Both pass on Linux CI but fail the dev gate on macOS:

  • share.py: mypy --warn-unreachable targets one platform at a time, so the if sys.platform == "darwin": return … made the other return provably dead (line 33 on macOS, the first return on Linux). Rewritten as a ternary expression so neither branch is a statement mypy can flag.
  • test_source_validation.py — a long tmp path let Rich wrap mid-word (py test), defeating the test's " ".join(split()) unwrap. Now compares with all whitespace removed, matching the existing pattern in test_share.py.
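
Both fixes can be illustrated with a short sketch. The function names, the command strings, and the sample message are assumptions for illustration; they are not the code in share.py or test_source_validation.py.

```python
import sys


def opener_command() -> str:
    """Pick a platform-specific command in a single expression.

    Hypothetical example: both outcomes live in one return statement, so
    there is no separate statement left to be reported as unreachable.
    """
    return "open" if sys.platform == "darwin" else "xdg-open"


def strip_whitespace(text: str) -> str:
    """Remove all whitespace so line wrapping cannot affect a comparison."""
    return "".join(text.split())


# A wrapped rendering that splits "pytest" across two lines.
wrapped = "file not found: /tmp/very/long/path/py\ntest/input.wav"
expected = "file not found: /tmp/very/long/path/pytest/input.wav"

# Re-joining with single spaces leaves a gap where the word was split ...
assert " ".join(wrapped.split()) != expected
# ... while dropping all whitespace makes the comparison wrap-insensitive.
assert strip_whitespace(wrapped) == strip_whitespace(expected)
```

The trade-off of the whitespace-free comparison is that it can no longer detect missing spaces between words, which is usually acceptable when the test only cares that a path or message appears.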

Testing

./scripts/check.sh → All checks passed (1391 tests, 100% patch coverage, mutation gate clean, build + twine OK).

🤖 Generated with Claude Code

alexkroman-assembly and others added 2 commits June 10, 2026 09:20
Capture real AssemblyAI API responses once and replay CLI commands through
them offline, so transcribe/transcripts/llm/account paths are exercised
end-to-end (command + parsing + rendering) without touching the network.

- scripts/record_fixtures.py: manual recorder (outside the gate) that drives
  the real client/llm/ams functions, scrubs every credential, and writes
  tests/fixtures/api/*.json. Refresh with
  `ASSEMBLYAI_API_KEY=… uv run python scripts/record_fixtures.py`.
- tests/replay_fixtures.py: rebuilds a real aai.Transcript (from_response)
  and an OpenAI ChatCompletion (model_construct, matching the SDK's lenient
  wire parsing of the gateway's Anthropic-flavored fields) from recorded JSON.
- tests/test_replay_e2e.py: 7 replay tests, one per command family, fully
  offline (pytest-socket untouched).
- tests/fixtures/api/: 7 scrubbed snapshots (key/JWT redacted, email and
  account_id faked, private upload URLs redacted; gitleaks-clean).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Both pass on Linux CI but fail the dev gate on macOS:

- share.py: `mypy --warn-unreachable` targets one platform at a time, so the
  if/return on `sys.platform == "darwin"` proved the other return dead (line 33
  on macOS, the first return on Linux). Rewrite as a ternary expression so
  neither branch is a statement mypy can flag. Existing tests cover both
  branches.
- test_source_validation: a long tmp path let Rich wrap mid-word ("py test"),
  defeating the `" ".join(split())` unwrap. Compare with all whitespace
  removed, matching the pattern already used in test_share.py.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread scripts/record_fixtures.py Outdated
"""
secret_set = {s for s in secrets if s}

def scrub(obj: Any) -> Any:

The nested function scrub performs unbounded recursion over input structures (dict/list) without a depth limit; add an explicit max-depth parameter or convert to an iterative traversal to prevent stack overflows on deeply nested inputs.

Details

✨ AI Reasoning
The PR adds a recursive scrubber function used to traverse arbitrary JSON-like objects. It directly calls itself for dict values and list items without any explicit depth limiting or loop-based alternative. Recursive traversal of untrusted or very deep structures can cause unbounded call depth and stack overflow. The change introduced this recursion rather than modifying pre-existing recursive code.

🔧 How do I fix it?
Add depth limiting via counter parameters that are checked and enforced, or replace with iterative approaches using explicit loops or stack data structures. For graphs, combine depth limiting with visited set tracking.
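
For reference, the iterative alternative mentioned above could look roughly like the sketch below. It is not the change that was made in this PR; the function name, the `PLACEHOLDER` value, and the choice to leave dict keys untouched are all assumptions.

```python
from typing import Any

PLACEHOLDER = "[REDACTED]"  # assumed marker, not taken from the PR


def _new_container(value: Any) -> Any:
    """Create an empty copy target matching the source container type."""
    return {} if isinstance(value, dict) else [None] * len(value)


def scrub_iterative(obj: Any, secrets: list[str]) -> Any:
    """Return a scrubbed copy of obj using an explicit stack, no recursion."""
    secret_set = {s for s in secrets if s}

    def scrub_str(text: str) -> str:
        for secret in secret_set:
            text = text.replace(secret, PLACEHOLDER)
        return text

    if isinstance(obj, str):
        return scrub_str(obj)
    if not isinstance(obj, (dict, list)):
        return obj

    root = _new_container(obj)
    stack = [(obj, root)]  # pairs of (source container, copy being filled)
    while stack:
        src, dst = stack.pop()
        items = src.items() if isinstance(src, dict) else enumerate(src)
        for key, value in items:
            if isinstance(value, (dict, list)):
                child = _new_container(value)
                stack.append((value, child))
                dst[key] = child
            elif isinstance(value, str):
                dst[key] = scrub_str(value)
            else:
                dst[key] = value  # numbers, booleans, None pass through
    return root
```

Because the pending work lives in a Python list rather than on the call stack, nesting depth is limited only by memory. The input is left unmodified.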

alexkroman-assembly and others added 3 commits June 10, 2026 09:28
`gitleaks dir` scans the working tree regardless of .gitignore, so high-entropy
values in a developer's gitignored `.claude/settings.local.json` (a personal
Claude Code file that never enters the repo) fail the local gate while CI — which
lacks the file — passes. Cost a real diagnose/move-aside/restore detour this
session. Allowlist that one path; the regex is anchored to it, so tracked
`.claude/` files (settings.json, agents/, skills/) and everything else stay
scanned. Verified: full-repo scan goes 12 findings -> 0 with the file present,
and a secret at any other path is still caught.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Future agent sessions shouldn't have to rediscover where fixtures live, how to
refresh them, or why the LLM response is rebuilt with model_construct rather than
model_validate. Add a "Replay fixtures" subsection covering the three moving
parts (recorder / fixtures / replay helper), the refresh command, that the
recorder is outside the gate, and the Transcript/ChatCompletion reconstruction
gotchas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Replace the `assert sample_id is not None` type-narrowing guard with an explicit
  `if ... raise CLIError` — asserts are stripped under PYTHONOPTIMIZE, so the check
  must be a real statement.
- Bound the scrubber's recursion at a max depth (API JSON is shallow; a deeper
  structure is malformed/hostile input) so it can't stack-overflow. The string
  redaction is extracted to a module-level `_scrub_str` helper to keep the function
  under the project's cyclomatic-complexity cap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
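
The two changes in this commit can be sketched as follows. This is an illustration under assumptions, not the merged code: `MAX_DEPTH`, `PLACEHOLDER`, the error messages, and the function signatures are made up here, and `CLIError` is a local stand-in for the project's own exception type.

```python
from typing import Any, Optional

MAX_DEPTH = 32  # assumed cap; the commit message does not state the value
PLACEHOLDER = "[REDACTED]"  # assumed marker


class CLIError(Exception):
    """Stand-in for the project's CLI error type."""


def require_sample_id(sample_id: Optional[str]) -> str:
    """Narrow Optional[str] with a real check that survives `python -O`."""
    if sample_id is None:
        raise CLIError("no sample transcript id available")
    return sample_id


def _scrub_str(text: str, secrets: set[str]) -> str:
    """Replace every known secret that occurs inside a single string."""
    for secret in secrets:
        text = text.replace(secret, PLACEHOLDER)
    return text


def scrub(obj: Any, secrets: set[str], depth: int = 0) -> Any:
    """Recursively scrub dicts and lists, refusing overly deep input."""
    if depth > MAX_DEPTH:
        raise ValueError(f"structure nested deeper than {MAX_DEPTH} levels")
    if isinstance(obj, str):
        return _scrub_str(obj, secrets)
    if isinstance(obj, dict):
        return {k: scrub(v, secrets, depth + 1) for k, v in obj.items()}
    if isinstance(obj, list):
        return [scrub(v, secrets, depth + 1) for v in obj]
    return obj
```

Raising on excess depth, rather than silently truncating, means a malformed response fails loudly during recording instead of producing a partially scrubbed fixture.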
@alexkroman alexkroman merged commit b430cd9 into main Jun 10, 2026
11 checks passed
@alexkroman alexkroman deleted the add-replay-fixture-harness branch June 10, 2026 16:38