Add aai transcribe --download-sections (yt-dlp passthrough) #64
Wire yt-dlp's `--download-sections` flag through to `aai transcribe` so a YouTube/podcast URL can fetch only part of the source (e.g. `*0:00-5:00` for the first five minutes) before transcribing.

- youtube.download_audio gains a `download_sections` arg that sets yt-dlp's `download_ranges` + `force_keyframes_at_cuts` (exact cuts). A new parse_download_sections() mirrors yt-dlp's grammar verbatim: `*start-end` timestamp ranges (comma-separated, inf/open-ended/negative bounds), chapter regexes, and `*from-url`. Malformed specs raise a clean UsageError (exit 2).
- transcribe command exposes the repeatable `--download-sections` option, threaded through run_transcription and reflected in `--show-code`.
- code_gen renders the sections into the generated yt-dlp block (download_range_func, force_keyframes_at_cuts, conditional `import re`).
- Moved the transcribe `validate_*` helpers into transcribe_exec.py to keep commands/transcribe.py under the 500-line gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
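The section grammar listed in the commit message can be sketched roughly as follows. This is a simplified illustration, not the PR's `parse_download_sections()`: the names `to_seconds` and `classify_sections` are hypothetical, negative bounds are not handled, and malformed input is not validated.

```python
from __future__ import annotations

import re


def to_seconds(stamp: str) -> float:
    """Convert 'SS', 'MM:SS' or 'HH:MM:SS' (or 'inf') to seconds."""
    if stamp == "inf":
        return float("inf")
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def classify_sections(specs: list[str]) -> tuple[list[re.Pattern[str]], list[tuple[float, float]], bool]:
    """Split specs into chapter regexes, (start, end) ranges, and a from-url flag."""
    chapters: list[re.Pattern[str]] = []
    ranges: list[tuple[float, float]] = []
    from_url = False
    for spec in specs:
        if spec == "*from-url":
            from_url = True
        elif spec.startswith("*"):
            # A '*' prefix marks one or more comma-separated start-end ranges.
            for piece in spec[1:].split(","):
                start, _, end = piece.strip().partition("-")
                # An omitted bound falls back to 0 (start) or inf (end).
                ranges.append((to_seconds(start or "0"), to_seconds(end or "inf")))
        else:
            # Anything without the '*' prefix is treated as a chapter regex.
            chapters.append(re.compile(spec))
    return chapters, ranges, from_url
```

Under these assumptions, `classify_sections(["*0:00-5:00"])` yields a single range covering the first 300 seconds, and repeating the option simply extends the lists.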
def _section_range(text: str) -> tuple[float, float]:
    """Parse one ``*``-stripped ``start-end`` range into (start, end) seconds."""
    match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None
`_section_range` unconditionally rejects `"-"` via `if text != "-" else None`, even though omitted bounds are documented as valid. This makes the full-range form impossible to parse.
Suggested change:
- match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None
+ match = _SECTION_RANGE_RE.fullmatch(text)
✨ AI Reasoning
The range parser is meant to support omitted start and end bounds. That means a bare separator should represent the default full range. However, the control flow explicitly bypasses regex parsing for that exact input and forces it into the error path. This creates a direct contradiction between the parser's stated behavior and what it can actually accept, causing valid input to be rejected every time.
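To make the disagreement concrete, here is a minimal sketch of how a regex with two optional bounds behaves with and without the guard. `RANGE_RE` is a hypothetical stand-in, since the PR's actual `_SECTION_RANGE_RE` is not shown here, and `parse_range` is an illustrative helper rather than the code under review.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the PR's pattern: both bounds are optional.
RANGE_RE = re.compile(r"(?P<start>[^-]+)?\s*-\s*(?P<end>[^-]+)?")


def parse_range(text: str, *, reject_bare_dash: bool) -> tuple[str, str] | None:
    """Return (start, end) with defaults filled in, or None when rejected."""
    if reject_bare_dash and text == "-":
        return None  # the guard the review comment objects to
    match = RANGE_RE.fullmatch(text)
    if match is None:
        return None
    return (match.group("start") or "0", match.group("end") or "inf")


print(parse_range("-", reject_bare_dash=True))
print(parse_range("-", reject_bare_dash=False))
print(parse_range("5:00-", reject_bare_dash=True))
```

With the guard, a bare `-` is rejected while `5:00-` and `-5:00` still parse; without it, `-` becomes the full `0` to `inf` range. Since the description says the parser mirrors yt-dlp's grammar, which of the two behaviours is correct depends on what yt-dlp itself does with a bare `-`, which this sketch does not settle.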
Summary
Wire yt-dlp's `--download-sections` flag through to `aai transcribe`, so a YouTube/podcast URL can fetch only part of the source before transcribing.

It's a direct passthrough of yt-dlp's own flag (same syntax, repeatable): `*0:00-5:00` grabs just the first five minutes. yt-dlp downloads only that slice instead of the whole track.

Changes
- `youtube.py`: `download_audio(..., download_sections=...)` sets yt-dlp's `download_ranges` + `force_keyframes_at_cuts` (exact cuts, not nearest keyframe). New `parse_download_sections()` mirrors yt-dlp's grammar verbatim: `*start-end` timestamp ranges (comma-separated, `inf`/open-ended/negative bounds), chapter-title regexes, and `*from-url`. Malformed specs raise a clean `UsageError` (exit 2).
- `commands/transcribe.py`: repeatable `--download-sections` option (Customization panel), threaded to the run path and `--show-code`. Moved the three `validate_*` helpers into `transcribe_exec.py` to keep the command under the 500-line gate.
- `transcribe_exec.py`: `run_transcription(..., download_sections=...)` forwards to the download.
- `code_gen/transcribe.py`: `--show-code` reflects the sections in the generated yt-dlp block (`download_range_func(...)`, `force_keyframes_at_cuts`, conditional `import re` for chapter regexes).

Notes
- `transcribe`. `aai stream` shares `youtube.download_audio`, so the same flag there is a small follow-up if wanted.
- `# pyright: ignore` comments cover yt-dlp's narrow/wrong inferred types (`parse_duration -> float`, `ranges: tuple[int,int]`), matching the existing pattern in that file; no `# type: ignore`/`# noqa`/`cast`/`Any` added.

Testing
`./scripts/check.sh` → All checks passed (ruff, mypy, pyright, 100% patch coverage, mutation gate: 30 mutants none survived, generated-code compile gate with a new `--download-sections --show-code` fixture, regenerated help snapshot).

🤖 Generated with Claude Code
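The `--show-code` rendering described under Changes might look something like the sketch below. It is an assumption-laden illustration, not the PR's `code_gen/transcribe.py`: `render_ytdlp_block` and `_num` are hypothetical names, and the exact shape of the generated block is guessed from the description (a `download_range_func(...)` call, `force_keyframes_at_cuts`, and `import re` only when chapter regexes are present).

```python
from __future__ import annotations

import math


def _num(value: float) -> str:
    """Render a bound as source text; repr(inf) is not a valid Python literal."""
    return 'float("inf")' if value == math.inf else repr(value)


def render_ytdlp_block(chapters: list[str], ranges: list[tuple[float, float]]) -> str:
    """Return Python source for a yt-dlp options dict limited to the given sections."""
    chapter_src = ", ".join(f"re.compile({pattern!r})" for pattern in chapters)
    range_src = ", ".join(f"({_num(start)}, {_num(end)})" for start, end in ranges)
    lines: list[str] = []
    if chapters:
        # Only chapter regexes need the re module, so the import is conditional.
        lines.append("import re")
    lines += [
        "from yt_dlp.utils import download_range_func",
        "",
        "ydl_opts = {",
        f'    "download_ranges": download_range_func([{chapter_src}], [{range_src}]),',
        '    "force_keyframes_at_cuts": True,',
        "}",
    ]
    return "\n".join(lines) + "\n"


print(render_ytdlp_block(["intro"], [(0.0, 300.0)]))
```

Passing the rendered string to the built-in `compile()` checks that it is syntactically valid without importing yt-dlp, which is the kind of check a generated-code compile gate would perform.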