Add `aai transcribe --download-sections` (yt-dlp passthrough) by alexkroman · Pull Request #64 · AssemblyAI/cli

alexkroman · 2026-06-11T16:23:42Z

Summary

Wire yt-dlp's --download-sections flag through to aai transcribe, so a YouTube/podcast URL can fetch only part of the source before transcribing:

aai transcribe "https://youtu.be/dtp6b76pMak" --download-sections "*0:00-5:00"

It's a direct passthrough of yt-dlp's own flag (same syntax, repeatable) — *0:00-5:00 grabs just the first five minutes. yt-dlp downloads only that slice instead of the whole track.
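To make the `*start-end` syntax concrete, here is a minimal sketch of turning one timestamp range into seconds. It is not the PR's `parse_download_sections()`. The helper names `to_seconds` and `parse_range` are hypothetical, and the sketch deliberately ignores comma-separated lists, negative bounds, chapter regexes, and `*from-url`.

```python
import math


def to_seconds(stamp: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (or 'inf') to seconds."""
    if stamp in ("inf", "infinite"):
        return math.inf
    seconds = 0.0
    for part in stamp.split(":"):
        # Each colon-separated field shifts the running total by a factor of 60.
        seconds = seconds * 60 + float(part)
    return seconds


def parse_range(spec: str) -> tuple[float, float]:
    """Parse a single '*start-end' spec (hypothetical, simplified grammar)."""
    if not spec.startswith("*"):
        raise ValueError(f"not a timestamp range: {spec!r}")
    start, sep, end = spec[1:].partition("-")
    if not sep:
        raise ValueError(f"missing '-' in {spec!r}")
    # Assumption: an omitted start means 0 and an omitted end means "to the end".
    return to_seconds(start.strip() or "0"), to_seconds(end.strip() or "inf")


print(parse_range("*0:00-5:00"))  # (0.0, 300.0)
```

Under these assumptions `*0:00-5:00` maps to the interval from 0 to 300 seconds, which is the "first five minutes" described above.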

Changes

youtube.py — download_audio(..., download_sections=...) sets yt-dlp's download_ranges + force_keyframes_at_cuts (exact cuts, not nearest keyframe). New parse_download_sections() mirrors yt-dlp's grammar verbatim: *start-end timestamp ranges (comma-separated, inf/open-ended/negative bounds), chapter-title regexes, and *from-url. Malformed specs raise a clean UsageError (exit 2).
commands/transcribe.py — repeatable --download-sections option (Customization panel), threaded to the run path and --show-code. Moved the three validate_* helpers into transcribe_exec.py to keep the command under the 500-line gate.
transcribe_exec.py — run_transcription(..., download_sections=...) forwards to the download.
code_gen/transcribe.py — --show-code reflects the sections in the generated yt-dlp block (download_range_func(...), force_keyframes_at_cuts, conditional import re for chapter regexes).
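As a rough illustration of the `download_ranges` + `force_keyframes_at_cuts` wiring listed above, the sketch below builds a yt-dlp style options dict from already-parsed ranges. The PR uses yt-dlp's own `download_range_func(...)`; this sketch swaps in a hand-written callback so it runs without yt-dlp installed. It assumes yt-dlp calls `download_ranges` as `(info_dict, ydl)` and expects section dicts with `start_time` and `end_time` keys. The names `make_range_callback` and `build_ydl_opts`, and the `format` value, are hypothetical.

```python
from typing import Any, Callable, Iterable

Section = dict[str, float]
RangeCallback = Callable[[dict[str, Any], Any], Iterable[Section]]


def make_range_callback(ranges: list[tuple[float, float]]) -> RangeCallback:
    """Return a callback yielding one section dict per (start, end) range."""

    def callback(info_dict: dict[str, Any], ydl: Any) -> Iterable[Section]:
        # Assumed shape of a yt-dlp section: start/end offsets in seconds.
        for start, end in ranges:
            yield {"start_time": start, "end_time": end}

    return callback


def build_ydl_opts(ranges: list[tuple[float, float]]) -> dict[str, Any]:
    """Build an options dict; only add range keys when ranges were requested."""
    opts: dict[str, Any] = {"format": "bestaudio/best"}  # illustrative default
    if ranges:
        opts["download_ranges"] = make_range_callback(ranges)
        # Ask for exact cuts rather than cuts at the nearest keyframe.
        opts["force_keyframes_at_cuts"] = True
    return opts


opts = build_ydl_opts([(0.0, 300.0)])
print(sorted(opts))
```

Leaving both keys out when no sections are given keeps the default full-download behaviour unchanged, which fits the flag being optional.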

Notes

Scoped to transcribe. aai stream shares youtube.download_audio, so the same flag there is a small follow-up if wanted.
Two # pyright: ignore comments cover yt-dlp's narrow/wrong inferred types (parse_duration -> float, ranges: tuple[int,int]) — matches the existing pattern in that file; no # type: ignore/# noqa/cast/Any added.

Testing

./scripts/check.sh → All checks passed (ruff, mypy, pyright, 100% patch coverage, mutation gate: 30 mutants, none survived; generated-code compile gate with a new --download-sections --show-code fixture; regenerated help snapshot).

🤖 Generated with Claude Code

Wire yt-dlp's `--download-sections` flag through to `aai transcribe` so a YouTube/podcast URL can fetch only part of the source (e.g. `*0:00-5:00` for the first five minutes) before transcribing.

- youtube.download_audio gains a `download_sections` arg that sets yt-dlp's `download_ranges` + `force_keyframes_at_cuts` (exact cuts). A new parse_download_sections() mirrors yt-dlp's grammar verbatim: `*start-end` timestamp ranges (comma-separated, inf/open-ended/negative bounds), chapter regexes, and `*from-url`. Malformed specs raise a clean UsageError (exit 2).
- transcribe command exposes the repeatable `--download-sections` option, threaded through run_transcription and reflected in `--show-code`.
- code_gen renders the sections into the generated yt-dlp block (download_range_func, force_keyframes_at_cuts, conditional `import re`).
- Moved the transcribe `validate_*` helpers into transcribe_exec.py to keep commands/transcribe.py under the 500-line gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

aikido-pr-checks · 2026-06-11T16:24:11Z

+
+def _section_range(text: str) -> tuple[float, float]:
+    """Parse one ``*``-stripped ``start-end`` range into (start, end) seconds."""
+    match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None


_section_range unconditionally rejects "-" via if text != "-" else None, even though omitted bounds are documented as valid. This makes that valid full-range form impossible to parse.

Suggested change

-    match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None
+    match = _SECTION_RANGE_RE.fullmatch(text)

Details

✨ AI Reasoning
The range parser is meant to support omitted start and end bounds. That means a bare separator should represent the default full range. However, the control flow explicitly bypasses regex parsing for that exact input and forces it into the error path. This creates a direct contradiction between the parser's stated behavior and what it can actually accept, causing valid input to be rejected every time.
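The effect of the guard can be reproduced in isolation. The sketch below uses a hypothetical stand-in pattern, not the PR's actual `_SECTION_RANGE_RE`, in which both bounds are optional, and compares the guarded and unguarded forms. Since the PR description says the parser mirrors yt-dlp's grammar verbatim, whether the guard is intentional would need to be checked against upstream behaviour before applying the suggestion.

```python
import re

# Hypothetical pattern: optional start, a hyphen, optional end.
RANGE_RE = re.compile(r"(?P<start>[^-]+)?\s*-\s*(?P<end>[^-]+)?")


def match_with_guard(text: str):
    """Current form: a bare '-' is forced onto the no-match path."""
    return RANGE_RE.fullmatch(text) if text != "-" else None


def match_without_guard(text: str):
    """Suggested form: the regex alone decides."""
    return RANGE_RE.fullmatch(text)


for candidate in ("0:00-5:00", "10-", "-"):
    guarded = match_with_guard(candidate) is not None
    unguarded = match_without_guard(candidate) is not None
    print(f"{candidate!r}: guarded={guarded} unguarded={unguarded}")
```

With this stand-in pattern the two forms differ only for the bare `-` input: the unguarded form matches it with both groups unset, while the guarded form returns `None` and sends it to the error path.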


aikido-pr-checks Bot reviewed Jun 11, 2026


alexkroman merged commit daa6907 into main Jun 11, 2026
11 checks passed

alexkroman deleted the transcribe-download-sections branch June 11, 2026 16:34

alexkroman merged 1 commit into main from transcribe-download-sections
