
Add aai transcribe --download-sections (yt-dlp passthrough) #64

Merged: alexkroman merged 1 commit into main from transcribe-download-sections on Jun 11, 2026


@alexkroman (Collaborator)

Summary

Wire yt-dlp's --download-sections flag through to aai transcribe, so a YouTube/podcast URL can fetch only part of the source before transcribing:

aai transcribe "https://youtu.be/dtp6b76pMak" --download-sections "*0:00-5:00"

It's a direct passthrough of yt-dlp's own flag (same syntax, repeatable): `*0:00-5:00` grabs just the first five minutes, and yt-dlp downloads only that slice instead of the whole track.
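For reference, `*0:00-5:00` describes a range from 0 to 300 seconds. The sketch below uses a hypothetical `hms_to_seconds` helper that handles only the plain `[[H:]M:]S` form. yt-dlp's real `parse_duration` accepts many more formats, and the naive `split("-")` here does not handle negative or open-ended bounds.

```python
def hms_to_seconds(stamp: str) -> float:
    """Convert "[[H:]M:]S" (e.g. "5:00" or "1:02:03.5") into seconds."""
    seconds = 0.0
    for part in stamp.strip().split(":"):
        # Each colon shifts the running total up one base-60 place.
        seconds = seconds * 60 + float(part)
    return seconds


# "*0:00-5:00": drop the leading "*" and split on the single separating "-".
start, end = "*0:00-5:00".lstrip("*").split("-")
print(hms_to_seconds(start), hms_to_seconds(end))  # 0.0 300.0
```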

Changes

  • youtube.py: `download_audio(..., download_sections=...)` sets yt-dlp's `download_ranges` + `force_keyframes_at_cuts` (exact cuts, not the nearest keyframe). A new `parse_download_sections()` mirrors yt-dlp's grammar verbatim: `*start-end` timestamp ranges (comma-separated, with inf/open-ended/negative bounds), chapter-title regexes, and `*from-url`. Malformed specs raise a clean `UsageError` (exit 2).
  • commands/transcribe.py: repeatable `--download-sections` option (Customization panel), threaded through to the run path and to `--show-code`. Moved the three `validate_*` helpers into transcribe_exec.py to keep the command under the 500-line gate.
  • transcribe_exec.py: `run_transcription(..., download_sections=...)` forwards the sections to the download.
  • code_gen/transcribe.py: `--show-code` reflects the sections in the generated yt-dlp block (`download_range_func(...)`, `force_keyframes_at_cuts`, and a conditional `import re` for chapter regexes).
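The three spec kinds that `parse_download_sections()` accepts can be told apart by their first character. Below is a minimal sketch, assuming yt-dlp's dispatch order: `*from-url` is checked first (it also starts with `*`), then `*`-prefixed ranges, and everything else is treated as a chapter-title regex. `classify_sections`, `Sections`, and this `UsageError` class are hypothetical names, not the PR's code. Range strings are left unparsed here.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


class UsageError(Exception):
    """Hypothetical stand-in for the CLI's usage error (exit code 2)."""


@dataclass
class Sections:
    chapters: list[re.Pattern[str]] = field(default_factory=list)
    ranges: list[str] = field(default_factory=list)  # raw "start-end" strings
    from_url: bool = False


def classify_sections(specs: list[str]) -> Sections:
    """Sort repeated --download-sections values into the three spec kinds."""
    out = Sections()
    for spec in specs:
        if spec == "*from-url":
            out.from_url = True
        elif spec.startswith("*"):
            # One flag value may carry several comma-separated ranges.
            out.ranges.extend(part.strip() for part in spec[1:].split(","))
        else:
            # Anything without a leading "*" is a chapter-title regex.
            try:
                out.chapters.append(re.compile(spec))
            except re.error as err:
                raise UsageError(f"invalid chapter regex {spec!r}: {err}") from err
    return out
```

For example, `classify_sections(["*0:00-5:00, 10:00-12:00", "Intro", "*from-url"])` yields two range strings, one compiled `Intro` pattern, and `from_url=True`. An unbalanced regex such as `"("` surfaces as a `UsageError` instead of a raw `re.error` traceback.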

Notes

  • Scoped to `aai transcribe`. `aai stream` shares `youtube.download_audio`, so exposing the same flag there is a small follow-up if wanted.
  • Two `# pyright: ignore` comments cover yt-dlp's overly narrow or incorrect inferred types (`parse_duration -> float`, `ranges: tuple[int, int]`), matching the existing pattern in that file; no `# type: ignore`, `# noqa`, `cast`, or `Any` was added.

Testing

  • `./scripts/check.sh`: all checks passed (ruff, mypy, pyright, 100% patch coverage, mutation gate with 30 mutants and none surviving, generated-code compile gate with a new `--download-sections --show-code` fixture, regenerated help snapshot).

🤖 Generated with Claude Code

Wire yt-dlp's `--download-sections` flag through to `aai transcribe` so a
YouTube/podcast URL can fetch only part of the source (e.g. `*0:00-5:00` for
the first five minutes) before transcribing.

- youtube.download_audio gains a `download_sections` arg that sets yt-dlp's
  `download_ranges` + `force_keyframes_at_cuts` (exact cuts). A new
  parse_download_sections() mirrors yt-dlp's grammar verbatim: `*start-end`
  timestamp ranges (comma-separated, inf/open-ended/negative bounds), chapter
  regexes, and `*from-url`. Malformed specs raise a clean UsageError (exit 2).
- transcribe command exposes the repeatable `--download-sections` option,
  threaded through run_transcription and reflected in `--show-code`.
- code_gen renders the sections into the generated yt-dlp block
  (download_range_func, force_keyframes_at_cuts, conditional `import re`).
- Moved the transcribe `validate_*` helpers into transcribe_exec.py to keep
  commands/transcribe.py under the 500-line gate.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
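Here is a rough sketch of how a `--show-code` renderer could emit the yt-dlp block with a conditional `import re`. It assumes yt-dlp's `yt_dlp.utils.download_range_func(chapters, ranges)` helper and the `download_ranges` / `force_keyframes_at_cuts` options named above. The function name `render_ytdlp_block` and the exact layout are hypothetical, and `*from-url` is left out. Infinite bounds need special handling because `repr(float("inf"))` is `inf`, which is not valid Python source.

```python
import math


def _py_number(value: float) -> str:
    """Render a float as valid Python source (repr(inf) is just 'inf')."""
    if math.isinf(value):
        return 'float("inf")' if value > 0 else 'float("-inf")'
    return repr(value)


def render_ytdlp_block(
    url: str,
    ranges: list[tuple[float, float]],
    chapter_regexes: list[str],
) -> str:
    """Return hypothetical --show-code source for a sectioned yt-dlp download."""
    lines: list[str] = []
    if chapter_regexes:
        # Only chapter-title sections need `re` in the generated snippet.
        lines.append("import re")
    lines += ["import yt_dlp", "from yt_dlp.utils import download_range_func", ""]
    chapters = ", ".join(f"re.compile({rx!r})" for rx in chapter_regexes)
    spans = ", ".join(f"({_py_number(a)}, {_py_number(b)})" for a, b in ranges)
    lines += [
        "ydl_opts = {",
        f'    "download_ranges": download_range_func([{chapters}], [{spans}]),',
        # Cut exactly at the requested timestamps rather than at nearby keyframes.
        '    "force_keyframes_at_cuts": True,',
        "}",
        "with yt_dlp.YoutubeDL(ydl_opts) as ydl:",
        f"    ydl.download([{url!r}])",
    ]
    return "\n".join(lines)


print(render_ytdlp_block("https://youtu.be/dtp6b76pMak", [(0.0, 300.0)], []))
```

Passing the output through `compile(src, "<generated>", "exec")` is a cheap way to confirm it is syntactically valid Python, in the spirit of the generated-code compile gate mentioned under Testing.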
Comment thread aai_cli/youtube.py

```python
def _section_range(text: str) -> tuple[float, float]:
    """Parse one ``*``-stripped ``start-end`` range into (start, end) seconds."""
    match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None
```

@aikido-pr-checks (bot), Jun 11, 2026


`_section_range` unconditionally rejects `"-"` via `if text != "-" else None`, even though omitted bounds are documented as valid. This makes that valid full-range form impossible to parse.

Suggested change:
```diff
- match = _SECTION_RANGE_RE.fullmatch(text) if text != "-" else None
+ match = _SECTION_RANGE_RE.fullmatch(text)
```

✨ AI Reasoning

The range parser is meant to support omitted start and end bounds, which means a bare separator should represent the default full range. However, the control flow explicitly bypasses regex parsing for that exact input and forces it into the error path. This creates a direct contradiction between the parser's stated behavior and what it can actually accept, causing valid input to be rejected every time.
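The following hedged sketch makes the disagreement concrete. The real `_SECTION_RANGE_RE` is not shown in the thread, so the pattern below is an assumption modeled on yt-dlp's `*start-end` syntax (optional signed start, `-`, optional signed end). `section_range`, its `allow_bare_dash` switch, and `_seconds` (a simplified stand-in for yt-dlp's `parse_duration`) are hypothetical. With the guard, which is the default here as in the PR, a bare `-` is rejected. With the suggested change, it parses as the full range `(0, inf)`. yt-dlp's own range parser appears to contain the same `!= '-'` guard, which would support the PR's "mirrors yt-dlp's grammar verbatim" claim. Confirm this against the yt-dlp source before acting on the suggestion.

```python
import re

# Assumed grammar: optional signed start, "-", optional signed end.
_RANGE_RE = re.compile(
    r"(?:(?P<start_sign>-?)(?P<start>[^-]+))?\s*-\s*(?:(?P<end_sign>-?)(?P<end>[^-]+))?"
)


def _seconds(stamp: str) -> float:
    """Simplified stand-in for yt-dlp's parse_duration ("inf" or [[H:]M:]S)."""
    stamp = stamp.strip()
    if stamp in ("inf", "infinite"):
        return float("inf")
    total = 0.0
    for part in stamp.split(":"):
        total = total * 60 + float(part)
    return total


def section_range(text: str, allow_bare_dash: bool = False) -> tuple[float, float]:
    """Parse one "*"-stripped "start-end" range into (start, end) seconds."""
    text = text.strip()
    match = _RANGE_RE.fullmatch(text) if allow_bare_dash or text != "-" else None
    if match is None:
        raise ValueError(f'invalid time range "{text}"; expected "start-end"')
    # Omitted bounds default to the start and the end of the media.
    start = _seconds(match["start"] or "0")
    end = _seconds(match["end"] or "inf")
    # A leading "-" on a bound counts back from the end of the media.
    if match["start_sign"]:
        start = -start
    if match["end_sign"]:
        end = -end
    return start, end
```

Under these assumptions, `section_range("10:00-")` gives `(600.0, inf)` and `section_range("-5:00-inf")` gives `(-300.0, inf)`. The only behavioral difference the review comment concerns is the bare `-`.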


@alexkroman merged commit daa6907 into main on Jun 11, 2026
11 checks passed
@alexkroman deleted the transcribe-download-sections branch on June 11, 2026 at 16:34