perf: skip upstream blocks with cached outputs when running a single …#285
perf: skip upstream blocks with cached outputs when running a single …#285jamesbhobbs wants to merge 3 commits intomainfrom
Conversation
…block When running with --block, upstream dependencies with valid cached outputs (matching contentHash in the latest snapshot) are now skipped instead of being re-executed every time. Adds --force-upstream-blocks flag to bypass the cache and force re-execution of all upstream blocks. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR introduces upstream dependency caching functionality to the CLI. A new Sequence Diagram(s)sequenceDiagram
actor User
participant CLI
participant RunCmd as Run Command
participant SnapshotLoader as Snapshot Loader
participant FilterLogic as Cache Filter
participant Executor as Block Executor
User->>CLI: run --force-upstream-blocks
CLI->>RunCmd: execute with forceUpstreamBlocks=true
alt force-upstream-blocks set
RunCmd->>Executor: execute all upstream + target blocks
else skip cache filtering
RunCmd->>SnapshotLoader: loadLatestSnapshot(sourcePath, projectId)
SnapshotLoader-->>RunCmd: snapshot
RunCmd->>FilterLogic: filterCachedUpstreamBlocks(upstreamIds, snapshot)
alt cached blocks found
FilterLogic-->>RunCmd: filtered upstream IDs (or undefined)
RunCmd->>Executor: execute filtered upstream + target
else no cached blocks or snapshot missing
FilterLogic-->>RunCmd: undefined or all upstream IDs
RunCmd->>Executor: execute all upstream + target
end
end
🚥 Pre-merge checks | ✅ 2 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #285 +/- ##
==========================================
+ Coverage 82.34% 82.40% +0.06%
==========================================
Files 110 110
Lines 6785 6822 +37
Branches 1876 1884 +8
==========================================
+ Hits 5587 5622 +35
- Misses 1198 1200 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@packages/cli/src/commands/run.ts`:
- Around line 428-437: The code unsafely asserts runtime presence of outputs
when building snapshotBlocks; replace the ad-hoc cast with a proper type guard
function (e.g., isExecutableBlock) and use it when inspecting block.outputs so
the check is type-safe; update the loop that builds snapshotBlocks (referencing
snapshot.project.notebooks, snapshotBlocks, block, and execBlock) to call the
guard and then set hasOutputs based on block.outputs only when the guard returns
true.
| const snapshotBlocks = new Map<string, { contentHash?: string; hasOutputs: boolean }>() | ||
| for (const notebook of snapshot.project.notebooks) { | ||
| for (const block of notebook.blocks) { | ||
| const execBlock = block as typeof block & { outputs?: unknown[] } | ||
| snapshotBlocks.set(block.id, { | ||
| contentHash: block.contentHash, | ||
| hasOutputs: (execBlock.outputs?.length ?? 0) > 0, | ||
| }) | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Type assertion for outputs access.
The cast at line 431 assumes runtime blocks may have outputs. This is reasonable given the polymorphic block structure, but consider extracting a proper type guard if this pattern recurs.
🤖 Prompt for AI Agents
In `@packages/cli/src/commands/run.ts` around lines 428 - 437, The code unsafely
asserts runtime presence of outputs when building snapshotBlocks; replace the
ad-hoc cast with a proper type guard function (e.g., isExecutableBlock) and use
it when inspecting block.outputs so the check is type-safe; update the loop that
builds snapshotBlocks (referencing snapshot.project.notebooks, snapshotBlocks,
block, and execBlock) to call the guard and then set hasOutputs based on
block.outputs only when the guard returns true.
dinohamzic
left a comment
There was a problem hiding this comment.
As discussed via DM, this approach breaks running blocks with upstream dependencies as we still don't support stateful runs.
The cached upstream blocks are being skipped entirely, they don't execute at all. But those blocks define Python variables (like import numpy as np) that downstream blocks depend on. The snapshot only stores cell outputs (printed text, images), not the kernel state (variables in memory). So when we skip an upstream block, its variables never exist in the kernel, and the downstream block fails with NameError.
…block
When running with --block, upstream dependencies with valid cached outputs (matching contentHash in the latest snapshot) are now skipped instead of being re-executed every time. Adds --force-upstream-blocks flag to bypass the cache and force re-execution of all upstream blocks.
Performance improvement of #272
Summary by CodeRabbit
Release Notes
--force-upstream-blocksCLI option for the run command, enabling forced re-execution of all upstream blocks and bypassing the output cache