Skip to content

Robustness, environment-aware code gen, and config caching#54

Merged
alexkroman merged 2 commits into
mainfrom
claude/sweet-ptolemy-uab3ha
Jun 10, 2026
Merged

Robustness, environment-aware code gen, and config caching#54
alexkroman merged 2 commits into
mainfrom
claude/sweet-ptolemy-uab3ha

Conversation

@alexkroman

Copy link
Copy Markdown
Collaborator

Summary

This PR improves robustness of audio capture and LLM streaming, makes generated code environment-aware, adds config file caching, and centralizes repeated Typer option definitions.

Key Changes

Audio & Streaming Resilience

  • DuplexAudio.capture_frames() now polls with a timeout instead of blocking forever, detecting when a device dies mid-session without calling close() and surfacing a clean error instead of hanging the capture thread
  • StreamSession latches LLM chain errors raised on the SDK reader thread (where raising would dump a thread traceback), warns once on stderr, and re-raises from the final main-thread flush for clean command failure
  • Added poll_timeout parameter to DuplexAudio (defaults to 1.0s, injectable for fast tests)

Environment-Aware Code Generation

  • code_gen/stream.py, code_gen/agent.py, and code_gen/transcribe.py now embed the active environment's hosts (streaming, agents, LLM gateway) in generated scripts, so sandbox runs produce scripts targeting the sandbox, not production
  • code_gen/transcribe.py only emits base_url setting when targeting a non-default environment (production SDK default is already correct)
  • agent/session.py exports ws_url() publicly for code_gen to use

Config File Caching

  • config._load() now caches parsed configs by mtime/size, avoiding redundant TOML parsing across multiple reads in a single CLI invocation
  • Cache is invalidated on write (_dump()) and returns deep copies so caller mutations don't poison the cache
  • Handles missing config files gracefully

Timestamp Formatting

  • _fmt_ms() now rolls over to H:MM:SS format for timestamps ≥1 hour, preventing ambiguous 60:00 displays

Sentiment Percentages

  • _percentages() implements largest-remainder rounding so integer percentages always sum to exactly 100 (e.g., three equal counts → 34/33/33, not 33/33/33=99)

Code Organization

  • New aai_cli/options.py centralizes repeated Typer option factories (json_option()) to keep flag definitions uniform across ~26 command signatures
  • Removed redundant _objects() helper in favor of direct jsonshape.object_list() calls
  • Updated all command imports to use the new options module

Testing

  • Added comprehensive tests for audio device failure detection, LLM error latching, config caching, timestamp formatting, sentiment rounding, and environment-aware code generation
  • Tests verify that streams without an active attribute are tolerated (file-driven streams)

Notable Implementation Details

  • Audio timeout detection checks getattr(self._stream, "active", True) to gracefully handle streams that don't expose the active flag
  • LLM errors are stored in _llm_error and only re-raised on the final flush (main thread), preventing thread tracebacks while still failing the command cleanly
  • Config cache key is (mtime_ns, size) to handle coarse-mtime filesystems where same-size rewrites might not update mtime
  • Largest-remainder rounding distributes leftover percentage points to labels with the largest fractional parts, ensuring deterministic and fair distribution

https://claude.ai/code/session_015RcqK7HpNbSMQgHUEouKaz

claude added 2 commits June 10, 2026 03:18
…footguns

- stream --llm: a gateway failure (e.g. 401, network) used to raise on the SDK
  reader thread, dumping a thread traceback while the stream kept going. The
  session now latches the CLIError, warns once on stderr, stops re-running the
  failing chain, and re-raises from the final main-thread flush so the command
  exits with the usual clean error.
- code_gen: generated --show-code scripts hardcoded production hosts (streaming,
  agents WS URL, LLM gateway). They now embed the active environment's hosts, so
  sandbox users get scripts their key actually authenticates to; the transcribe
  script also pins aai.settings.base_url when the env differs from the SDK
  default. agent/session.ws_url() is now the single place the agents socket URL
  is built. llm.GATEWAY_BASE_URL (production-only) is gone.
- aai share: the cloudflared tunnel subprocess no longer inherits
  ASSEMBLYAI_API_KEY — it only proxies a port; the dev server keeps the key.
- agent audio: capture_frames() polls with a timeout and raises a clean
  audio_input_error when the PortAudio stream dies without close() (device
  unplugged) instead of blocking the capture thread forever.
- transcribe rendering: chapter timestamps roll over to H:MM:SS past an hour
  (no more "60:00"); sentiment percentages use largest-remainder rounding so
  they sum to exactly 100.
- config: _load() caches the parsed TOML keyed by path+mtime+size (a typical
  invocation parsed the same file 3-4 times), returning deep copies so callers'
  mutations and persist_login's rollback snapshot can't alias the cache; _dump()
  invalidates explicitly.
- DRY: shared options.json_option() replaces 26 copy-pasted --json definitions
  (the help text had already drifted); the duplicate _objects() shims in
  client.py/transcribe_render.py are gone in favor of jsonshape.object_list.

https://claude.ai/code/session_015RcqK7HpNbSMQgHUEouKaz
… with check.sh

- The PostToolUse hook ran `ruff check --fix`, whose F401 autofix deletes an
  import that is momentarily unused — i.e. whenever an agent adds the import in
  one edit and its first usage in the next, the import silently vanishes in
  between. Scope it out with `--unfixable F401` (still reported, never
  auto-removed) and document the behavior.
- AGENTS.md's check.sh stage list had drifted: it was missing the max-file-length,
  actionlint/zizmor, gitleaks, and (most importantly) the diff-scoped mutation
  gate, which requires changed lines to have assertions that fail when the line
  breaks — not just coverage. List them, with the `# pragma: no mutate` escape.

https://claude.ai/code/session_015RcqK7HpNbSMQgHUEouKaz
Comment thread aai_cli/config.py
return config_dir() / "config.toml"


# Parsed-config cache: path -> (mtime_ns, size, parsed). The several _load()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_load_cache is a module-level mutable dict storing parsed configs; this global cache persists for the process lifetime and may unintentionally leak or reuse request-specific config state. Consider scoping the cache, documenting its shared nature, or limiting its lifetime.

Details

✨ AI Reasoning
​A module-level mutable dictionary was introduced to hold parsed config state keyed by filesystem attributes. Module globals persist for the lifetime of the process and can unintentionally hold request- or user-specific data across unrelated operations, causing leakage or stale data exposure. The added cache is mutable and shared across all callers in the process, so it fits the global-caching risk pattern.

🔧 How do I fix it?
Avoid storing request-specific data in module-level variables. Use request-scoped variables or explicitly mark shared caches as intentional.

Reply @AikidoSec feedback: [FEEDBACK] to get better review comments in the future.
Reply @AikidoSec ignore: [REASON] to ignore this issue.
More info

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AikidoSec ignore: intentional process-scoped cache in a short-lived single-user CLI (one command invocation, no cross-request state). The shared nature is documented in the comment block above the dict; the cache is keyed by path+mtime+size, explicitly invalidated on every _dump() write, and _load() only hands out deep copies so caller mutations can't poison it (pinned by test_load_returns_independent_copies / test_write_invalidates_load_cache).


Generated by Claude Code

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Based on your feedback, we ignored this issue because of the following reason:

intentional process-scoped cache in a short-lived single-user CLI (one command invocation, no cross-request state). The shared nature is documented in the comment block above the dict; the cache is keyed by path+mtime+size, explicitly invalidated on every _dump() write, and _load() only hands out deep copies so caller mutations can't poison it (pinned by test_load_returns_independent_copies / test_write_invalidates_load_cache).


Generated by Claude Code

@alexkroman alexkroman merged commit 8d6ce6e into main Jun 10, 2026
11 checks passed
@alexkroman alexkroman deleted the claude/sweet-ptolemy-uab3ha branch June 10, 2026 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants