Fix transcribe treating empty source as a directory scan #89

Closed
alexkroman wants to merge 2 commits into main from fix-transcribe-empty-source

@alexkroman

Collaborator

Problem

Found during exploratory error testing of the CLI. Running assembly transcribe "" (empty-string source) silently entered batch mode and recursively walked the working directory, queuing every audio file it found anywhere underneath:

$ assembly transcribe ""
 Source                        Status   ...
 .venv/lib/python3.12/site-…   queued
 tests/e2e/fixtures/fox.wav    queued
 ...   (every audio file under cwd, recursively)

Root cause

In expand_sources (aai_cli/transcribe_batch.py), an empty string fell through the source is None guard. Path("") resolves to ., so path.is_dir() was True and the source was treated as "batch this directory" → rglob("*") over the entire tree. With a valid API key this would upload arbitrary local audio (a real cost/privacy footgun).
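The pathlib behavior behind this can be reproduced in isolation. The snippet below is a minimal sketch that uses only the standard library; the variable name `source` is illustrative and nothing here is taken from the actual `expand_sources` code.

```python
from pathlib import Path

source = ""  # what the CLI receives for: assembly transcribe ""

# An identity check against None does not catch the empty string,
# while a truthiness check does.
print(source is None)   # False
print(not source)       # True

# pathlib normalizes an empty path to the current directory.
path = Path(source)
print(str(path))        # "."
print(path.is_dir())    # True, because the working directory exists
```

Because `is_dir()` is true for the empty path, any check of the form "is this source a directory?" treats `""` the same as `.` unless the empty string is rejected first.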

Fix

Treat an empty source like a missing one (not source) so it stays on the single-source path, where it already reads correctly as:

$ assembly transcribe ""
Error: Provide an audio path or URL.
Suggestion: Or pass --sample to use the hosted demo file.

A real directory argument (assembly transcribe .) still batches as before — only the empty string is redirected.
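As a rough illustration of the guard change, here is a simplified sketch. The function name `expand_sources_sketch`, its signature, its return convention (None for "not a batch"), and the `AUDIO_SUFFIXES` set are all assumptions made for this example; the real `expand_sources` in aai_cli/transcribe_batch.py may be shaped differently.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical suffix filter, for illustration only.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac"}


def expand_sources_sketch(source: Optional[str]) -> Optional[List[Path]]:
    """Return audio files to batch, or None to stay on the single-source path."""
    # `not source` covers both None and "", where `source is None` only
    # covered None and let "" fall through to the directory check below.
    if not source:
        return None

    path = Path(source)
    if not path.is_dir():
        return None  # a file path or URL is not a batch

    # A real directory argument (for example ".") is still scanned recursively.
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

With this shape, `expand_sources_sketch("")` and `expand_sources_sketch(None)` both return None, while `expand_sources_sketch(".")` still returns a list of audio files found under the working directory.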

Tests

  • Extended test_non_batch_sources_return_none to cover "".
  • Added test_empty_source_argument_does_not_batch_the_working_directory, a CLI-level regression test that monkeypatches Path.rglob to fail loudly if an empty source ever reaches directory scanning again, and asserts exit code 2 + the correct message.

Full ./scripts/check.sh gate passes (incl. 100% patch coverage + mutation gate).

🤖 Generated with Claude Code

alexkroman-assembly and others added 2 commits June 11, 2026 17:40
`assembly transcribe ""` silently batch-walked the working directory: an
empty string fell through the `source is None` guard in expand_sources, and
since Path("") resolves to ".", it was treated as "batch this directory" and
recursively rglob'd every audio file under cwd. With a valid key that would
upload arbitrary local audio.

Treat an empty source like a missing one (`not source`) so it stays on the
single-source path, where it correctly reads as "Provide an audio path or URL."

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexkroman

Collaborator Author

Superseded by #90, which independently shipped the identical fix (not source guard in expand_sources) plus equivalent regression tests, and merged first. Closing as redundant.

@alexkroman alexkroman closed this Jun 12, 2026
@alexkroman alexkroman deleted the fix-transcribe-empty-source branch June 12, 2026 01:27