Delegate context-window management to deepagents SummarizationMiddleware#264
Merged
Conversation
…zation The live cascade brain is a deepagents graph, and create_deep_agent already wires its own SummarizationMiddleware into the stack (summarize old turns, offload the evicted history to a file) — real context-window management. The engine's client-side sliding window (text.trim_history + config.max_history) was redundant in front of it, so this removes it: the engine now feeds the full untrimmed running history each turn and lets the graph compact it. max_history stays on CascadeConfig but only drives the hand-rolled --show-code / `assembly init` cascade, which talks to the gateway directly and has no middleware. text.trim_history is gone; split_sentences' inline boundary predicate is extracted into _ends_sentence (mirroring _is_boundary) to keep the module's average complexity at rank A after the helper's removal. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01RgG91Q7U3j2pbJvyfTJa3X
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Removes client-side conversation history trimming from the cascade engine and delegates context-window management to the deepagents brain's built-in
SummarizationMiddleware. The engine now keeps the full running conversation history and lets the graph handle compaction (summarizing old turns, offloading evicted history to a file).Key Changes
trim_history()function fromaai_cli/agent_cascade/text.py— this utility is no longer needed since the brain handles windowing server-sideaai_cli/agent_cascade/engine.py:trim_history()call inon_turn()after appending user messagestrim_history()call in_record_spoken()after appending assistant messagestext.py: Clarified that conversation-history trimming moved to the brain'sSummarizationMiddlewareengine.py: Added docstring to_record_spoken()explaining the engine now keeps full historybrain.py: Added note that context-window management is delegated to deepagentsconfig.py: Clarified thatDEFAULT_MAX_HISTORYonly applies to standalone (--show-code/inittemplate) paths, not the live braintrim_history()unit tests fromtests/test_agent_cascade_text.pytest_generate_reply_trims_history_window()→test_generate_reply_keeps_full_untrimmed_history()to verify the engine no longer trimstest_on_turn_trims_history_window()→test_on_turn_keeps_full_untrimmed_history()to verify full history is retainedsplit_sentences()to use a new_ends_sentence()helper function for clarity (distinguishes end-of-text boundaries from mid-stream boundaries used in streaming)Implementation Details
The
_ends_sentence()helper clarifies the distinction between sentence boundaries in complete text (where end-of-text is a real boundary) versus partial streamed chunks (where it might be mid-token). This improves code readability without changing behavior.The
max_historyconfig parameter remains for backward compatibility with standalone code paths (templates,--show-codegenerator) but is now inert in the live brain, which relies on deepagents'SummarizationMiddlewarefor context management.https://claude.ai/code/session_01RgG91Q7U3j2pbJvyfTJa3X