Fix agent-cascade/speak streaming TTS and thread STT/LLM/TTS options by alexkroman · Pull Request #176 · AssemblyAI/cli

alexkroman · 2026-06-16T13:44:06Z

Summary

assembly agent-framework (and assembly speak) could not produce audio. This fixes three independent streaming-TTS client bugs — each was masking the next — and then threads full STT/LLM/TTS customization into agent-framework.

Bug fixes (`aai_cli/tts/session.py`, `aai_cli/streaming/diagnostics.py`)

| Bug | Symptom | Fix |
| --- | --- | --- |
| Wrong auth scheme (`Bearer <key>` vs raw key) | "did not start the session (got 'Error')" | `open_authorized_ws` gained `bearer=` (default True for Voice Agent); TTS passes `bearer=False` |
| Wrong flush tag (`ForceFlushTextBuffer`) | "TTS error (3006): … 'Flush'" | send `{"type": "Flush"}` |
| Missing end-of-stream marker (waited for `Audio.is_final`, server sends `FlushDone`) | audio arrived but loop hung 60s then "stopped responding" | break the collect loop on `FlushDone` |
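The flush and end-of-stream fixes can be illustrated with a minimal sketch. This is not the code in `aai_cli/tts/session.py`: `collect_audio` and `FLUSH_MESSAGE` are hypothetical names, and the frames are modeled as an iterable of JSON strings instead of a live websocket. Only the frame types (`Flush`, `Audio`, `FlushDone`, `Error`) and the `error_code`/`error` fields come from this PR; the Audio payload layout is not specified here, so whole frames are kept.

```python
import json
from typing import Iterable

# The tag the server accepts, per this PR (previously "ForceFlushTextBuffer").
FLUSH_MESSAGE = json.dumps({"type": "Flush"})


def collect_audio(frames: Iterable[str]) -> list[dict]:
    """Gather Audio frames until the server ends the synthesis with FlushDone.

    Hypothetical helper: `frames` stands in for messages received from the
    TTS socket after FLUSH_MESSAGE has been sent.
    """
    chunks: list[dict] = []
    for raw in frames:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "Audio":
            # Keep the whole frame; the payload field names are not assumed.
            chunks.append(msg)
        elif kind == "FlushDone":
            # End-of-stream marker: stop instead of waiting for a recv timeout.
            break
        elif kind == "Error":
            raise RuntimeError(
                f"TTS error ({msg.get('error_code')}): {msg.get('error')}"
            )
    return chunks
```

Breaking on `FlushDone` rather than on a flag inside an Audio frame is what keeps the loop from blocking until the receive timeout.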

Supporting: the session-start error path now surfaces the server's error_code/error instead of a generic "got 'Error'" message (this is how the flush validation error became visible).
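As a rough illustration of that diagnostics change, the sketch below formats a session-start failure from the first frame received. The function name `describe_start_failure` and the exact message wording are assumptions, not the contents of `aai_cli/streaming/diagnostics.py`; only the `error_code` and `error` field names are taken from this PR.

```python
def describe_start_failure(frame: dict) -> str:
    """Build a user-facing message for a session that failed to start.

    Hypothetical helper: surfaces the server's error_code/error when the
    first frame is an Error, and falls back to the generic text otherwise.
    """
    kind = frame.get("type")
    if kind != "Error":
        return f"did not start the session (got {kind!r})"
    code = frame.get("error_code")
    detail = frame.get("error") or "no detail provided"
    prefix = f"TTS error ({code})" if code is not None else "TTS error"
    return f"{prefix}: {detail}"
```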

The working agent_framework init template was the reference for all three — it authenticates with the raw key and sends Flush.
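The auth difference could look roughly like the following. This is a sketch under assumptions: `auth_headers` is a hypothetical helper (the PR's actual entry point is `open_authorized_ws` with its `bearer=` flag, whose full signature is not shown here), and it assumes the raw key is sent in the same `Authorization` header.

```python
def auth_headers(api_key: str, *, bearer: bool = True) -> dict[str, str]:
    """Return the Authorization header for a websocket handshake.

    Hypothetical helper mirroring the bearer= flag described in this PR:
    bearer=True  -> "Bearer <key>" (Voice Agent)
    bearer=False -> raw key (streaming endpoints such as TTS)
    """
    value = f"Bearer {api_key}" if bearer else api_key
    return {"Authorization": value}


# The TTS client would opt out of the Bearer prefix:
tts_headers = auth_headers("example-key", bearer=False)
```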

Feature: per-leg customization on `agent-framework`

Hybrid surface (common named flags + per-leg KEY=VALUE escape hatches), grouped into --help panels:

- Speech-to-text: --speech-model, --format-turns/--no-format-turns, --turn-detection, --stt-config, --stt-config-file
- Language model: --max-tokens, --llm-config
- Text-to-speech: --language, --tts-config
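One way a KEY=VALUE escape hatch such as --stt-config, --llm-config or --tts-config might be parsed is sketched below. `parse_overrides` is a hypothetical name, and decoding values as JSON with a plain-string fallback is an assumption; the PR does not say how the CLI coerces values.

```python
import json
from typing import Any


def parse_overrides(pairs: list[str]) -> dict[str, Any]:
    """Turn repeated KEY=VALUE arguments into a dict of overrides.

    Hypothetical helper. Values are decoded as JSON when possible
    (numbers, booleans, null), otherwise kept as plain strings.
    """
    overrides: dict[str, Any] = {}
    for item in pairs:
        key, sep, raw = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        try:
            overrides[key] = json.loads(raw)
        except json.JSONDecodeError:
            overrides[key] = raw
    return overrides
```

Splitting on the first `=` only lets a value itself contain `=` characters.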

Details: the reply trigger is now format-aware so --no-format-turns still replies; the TTS sample rate stays locked to the live speaker; --tts-config rejects reserved keys (voice/language/sample_rate). Precedence matches stream (named flag/preset wins a head-to-head with --stt-config).
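The three rules in this paragraph can be sketched as small functions. All names here are hypothetical and the logic is an illustration of the described behavior, not the merged implementation. The reserved key set comes from this PR; the `end_of_turn` and `turn_is_formatted` fields in `should_reply` are assumed to be the turn-message fields the STT leg exposes.

```python
from typing import Any

# Keys that --tts-config must not override, per this PR.
RESERVED_TTS_KEYS = frozenset({"voice", "language", "sample_rate"})


def check_tts_extra(extra: dict[str, Any]) -> dict[str, Any]:
    """Reject reserved keys in the --tts-config escape hatch."""
    clashes = sorted(RESERVED_TTS_KEYS & extra.keys())
    if clashes:
        raise ValueError(f"--tts-config cannot set reserved keys: {', '.join(clashes)}")
    return extra


def merge_stt(named: dict[str, Any], extra: dict[str, Any]) -> dict[str, Any]:
    """Merge STT settings so an explicitly set named flag beats --stt-config."""
    explicit = {k: v for k, v in named.items() if v is not None}
    return {**extra, **explicit}


def should_reply(turn: dict[str, Any], *, format_turns: bool) -> bool:
    """Decide whether a turn message should trigger the LLM reply.

    Assumption: with formatting on, wait for the formatted end-of-turn;
    with formatting off, the plain end-of-turn is the only one that arrives.
    """
    if not turn.get("end_of_turn"):
        return False
    return bool(turn.get("turn_is_formatted")) if format_turns else True
```

Treating `None` as "flag not given" in `merge_stt` is a design assumption; it lets an override from --stt-config apply only when the named flag was left unset.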

Test plan

Full gate green (./scripts/check.sh): 2738 tests, 100% patch coverage, diff-scoped mutation gate, build/twine.
Verified live against the sandbox: assembly --sandbox speak "hello there" writes a valid 0.96s / 24kHz WAV; a fully-customized file-driven agent-framework run drives STT→LLM→TTS with no error.

🤖 Generated with Claude Code

The terminal cascade (assembly agent-cascade) and assembly speak could not produce audio. Three independent bugs in the streaming-TTS client, each masking the next, plus a diagnostics gap:

- Auth scheme: the TTS socket was opened with 'Authorization: Bearer <key>', but AssemblyAI streaming endpoints authenticate with the raw key (only Voice Agent uses Bearer). open_authorized_ws gained a bearer= flag (default True); TTS now passes bearer=False, matching the working agent-cascade init template.
- Flush tag: sent 'ForceFlushTextBuffer'; the server's tag is 'Flush'.
- End-of-stream: the loop waited for an Audio frame with is_final, which the live server never sets; it ends a synthesis with a 'FlushDone' frame. Without handling it the loop blocked until the 60s recv timeout and the audio was lost.
- The session-start error path discarded the server's Error-frame contents (generic "got 'Error'"); it now surfaces error_code/error, which is how the flush validation error became visible.

Also thread per-leg customization into agent-cascade (hybrid: common named flags + per-leg KEY=VALUE escape hatches), grouped into --help panels:

- STT: --speech-model, --format-turns/--no-format-turns, --turn-detection, --stt-config, --stt-config-file
- LLM: --max-tokens, --llm-config
- TTS: --language, --tts-config (new SpeakConfig.extra)

The reply trigger is now format-aware so --no-format-turns still replies; TTS sample rate stays locked to the live player; --tts-config rejects reserved keys.

Verified against the live sandbox: assembly speak produces a valid 24kHz WAV and a fully-customized file-driven cascade runs STT->LLM->TTS without error.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexkroman · 2026-06-16T14:17:18Z

Rebased onto main after #175 renamed agent-framework → agent-cascade. Ported all changes onto the new module/command/symbol names (agent_cascade, AgentCascadeOptions, run_agent_cascade, command agent-cascade); the TTS bug-fix files (tts/session.py, streaming/diagnostics.py) were unaffected by the rename. Full gate re-run green against the new base.

alexkroman force-pushed the agent-framework-tts-and-options branch from f743ff2 to 00128ac June 16, 2026 14:00

alexkroman changed the title ~~Fix agent-framework/speak streaming TTS and thread STT/LLM/TTS options~~ Fix agent-cascade/speak streaming TTS and thread STT/LLM/TTS options Jun 16, 2026

alexkroman added this pull request to the merge queue Jun 16, 2026

Merged via the queue into main with commit 53b3141 Jun 16, 2026
19 checks passed

alexkroman deleted the agent-framework-tts-and-options branch June 16, 2026 14:23

alexkroman merged 1 commit into main from agent-framework-tts-and-options

2 participants
