Skip to content

perf: skip upstream blocks with cached outputs when running a single …#285

Open
jamesbhobbs wants to merge 3 commits intomainfrom
perf/skip-cached-upstream-blocks
Open

perf: skip upstream blocks with cached outputs when running a single …#285
jamesbhobbs wants to merge 3 commits intomainfrom
perf/skip-cached-upstream-blocks

Conversation

@jamesbhobbs
Copy link
Contributor

@jamesbhobbs jamesbhobbs commented Feb 12, 2026

…block

When running with --block, upstream dependencies with valid cached outputs (matching contentHash in the latest snapshot) are now skipped instead of being re-executed every time. Adds --force-upstream-blocks flag to bypass the cache and force re-execution of all upstream blocks.

Performance improvement of #272

Summary by CodeRabbit

Release Notes

  • New Features
    • Introduced --force-upstream-blocks CLI option for the run command, enabling forced re-execution of all upstream blocks and bypassing the output cache
    • Upstream blocks now intelligently skip re-execution when cached outputs match the current content, unless the force flag is used

…block

When running with --block, upstream dependencies with valid cached outputs
(matching contentHash in the latest snapshot) are now skipped instead of
being re-executed every time. Adds --force-upstream-blocks flag to bypass
the cache and force re-execution of all upstream blocks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 12, 2026

📝 Walkthrough

Walkthrough

This PR introduces upstream dependency caching functionality to the CLI. A new --force-upstream-blocks flag allows users to bypass the output cache and re-execute all upstream blocks. The implementation adds snapshot loading and cache filtering logic to the run command, enabling selective execution of upstream blocks based on content hash and cached output presence. Tests cover scenarios including cache hits, cache misses, cache invalidation, and error handling.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant CLI
    participant RunCmd as Run Command
    participant SnapshotLoader as Snapshot Loader
    participant FilterLogic as Cache Filter
    participant Executor as Block Executor

    User->>CLI: run --force-upstream-blocks
    CLI->>RunCmd: execute with forceUpstreamBlocks=true
    
    alt force-upstream-blocks set
        RunCmd->>Executor: execute all upstream + target blocks
    else skip cache filtering
        RunCmd->>SnapshotLoader: loadLatestSnapshot(sourcePath, projectId)
        SnapshotLoader-->>RunCmd: snapshot
        RunCmd->>FilterLogic: filterCachedUpstreamBlocks(upstreamIds, snapshot)
        
        alt cached blocks found
            FilterLogic-->>RunCmd: filtered upstream IDs (or undefined)
            RunCmd->>Executor: execute filtered upstream + target
        else no cached blocks or snapshot missing
            FilterLogic-->>RunCmd: undefined or all upstream IDs
            RunCmd->>Executor: execute all upstream + target
        end
    end
Loading
🚥 Pre-merge checks | ✅ 2 | ❌ 2
❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 37.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Updates Docs ⚠️ Warning The pull request implements the --force-upstream-blocks CLI option but omits it from the documentation in packages/cli/README.md. Add --force-upstream-blocks to the run command options table in packages/cli/README.md with description and default value false.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'perf: skip upstream blocks with cached outputs when running a single …' directly and clearly summarizes the main change: caching optimization for upstream blocks during single-block execution.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov
Copy link

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 95.00000% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.40%. Comparing base (2059f7c) to head (9f75994).

Files with missing lines Patch % Lines
packages/cli/src/commands/run.ts 95.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #285      +/-   ##
==========================================
+ Coverage   82.34%   82.40%   +0.06%     
==========================================
  Files         110      110              
  Lines        6785     6822      +37     
  Branches     1876     1884       +8     
==========================================
+ Hits         5587     5622      +35     
- Misses       1198     1200       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@packages/cli/src/commands/run.ts`:
- Around line 428-437: The code unsafely asserts runtime presence of outputs
when building snapshotBlocks; replace the ad-hoc cast with a proper type guard
function (e.g., isExecutableBlock) and use it when inspecting block.outputs so
the check is type-safe; update the loop that builds snapshotBlocks (referencing
snapshot.project.notebooks, snapshotBlocks, block, and execBlock) to call the
guard and then set hasOutputs based on block.outputs only when the guard returns
true.

Comment on lines +428 to +437
const snapshotBlocks = new Map<string, { contentHash?: string; hasOutputs: boolean }>()
for (const notebook of snapshot.project.notebooks) {
for (const block of notebook.blocks) {
const execBlock = block as typeof block & { outputs?: unknown[] }
snapshotBlocks.set(block.id, {
contentHash: block.contentHash,
hasOutputs: (execBlock.outputs?.length ?? 0) > 0,
})
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Type assertion for outputs access.

The cast at line 431 assumes runtime blocks may have outputs. This is reasonable given the polymorphic block structure, but consider extracting a proper type guard if this pattern recurs.

🤖 Prompt for AI Agents
In `@packages/cli/src/commands/run.ts` around lines 428 - 437, The code unsafely
asserts runtime presence of outputs when building snapshotBlocks; replace the
ad-hoc cast with a proper type guard function (e.g., isExecutableBlock) and use
it when inspecting block.outputs so the check is type-safe; update the loop that
builds snapshotBlocks (referencing snapshot.project.notebooks, snapshotBlocks,
block, and execBlock) to call the guard and then set hasOutputs based on
block.outputs only when the guard returns true.

@jamesbhobbs jamesbhobbs marked this pull request as ready for review February 18, 2026 12:13
@jamesbhobbs jamesbhobbs requested a review from a team as a code owner February 18, 2026 12:13
Copy link
Contributor

@dinohamzic dinohamzic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed via DM, this approach breaks running blocks with upstream dependencies as we still don't support stateful runs.

The cached upstream blocks are being skipped entirely, they don't execute at all. But those blocks define Python variables (like import numpy as np) that downstream blocks depend on. The snapshot only stores cell outputs (printed text, images), not the kernel state (variables in memory). So when we skip an upstream block, its variables never exist in the kernel, and the downstream block fails with NameError.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments