
[dbsp] Decouple foreground and background workers.#5840

Merged
ryzhyk merged 2 commits into main from async_merger
Mar 17, 2026

Conversation

@ryzhyk
Contributor

@ryzhyk ryzhyk commented Mar 17, 2026

To date, we assigned a single background merger thread to each
foreground worker. This may not be sufficient for some workloads,
or for some phases of a workload, leading to merge backpressure and
slow key lookups due to a large number of unmerged batches.

This commit decouples foreground and background worker counts,
making it possible to spread merging across any number of background
worker threads (we set the bg/fg worker ratio to 2 by default).

Design:

  • We associate a separate tokio runtime dedicated to merging with
    a DBSP runtime.
  • For every spine created by a foreground thread, we start a
    separate tokio task per spine level. The task runs in an infinite
    loop, making progress on merges at its level. As before, we use
    fuel to control the amount of CPU time spent at each level.
    The task yields the CPU after using up its fuel (at this point tokio
    should put it at the end of the scheduling queue and schedule other
    tasks). The task blocks when there is no outstanding merge at
    its level.
  • As a side effect of the new design, we can no longer run useful
    circuits without a DBSP runtime. All tests and tutorials that used
    RootCircuit::build have been upgraded to use
    Runtime::init_circuit(1,...).
  • For backward compatibility, by default we create a tokio runtime with
    the number of threads equal to the number of worker threads.
    We will increase the ratio to two after initial user testing.
    The number of merger threads is configurable via a new dev tweak.
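The fuel-based loop described above can be sketched as a minimal synchronous model (illustrative only: `Merge`, `run_level`, and `FUEL_PER_QUANTUM` are made-up names, not the DBSP API; in the real implementation the loop is a tokio task and yielding is `tokio::task::yield_now().await`):

```rust
/// Fuel budget a level may spend before yielding (illustrative value).
const FUEL_PER_QUANTUM: isize = 4;

/// A pending merge at one spine level, with `remaining` units of work.
struct Merge {
    remaining: isize,
}

impl Merge {
    /// Advance the merge by at most `fuel` units; return the unspent fuel.
    fn work(&mut self, fuel: isize) -> isize {
        let spent = fuel.min(self.remaining);
        self.remaining -= spent;
        fuel - spent
    }

    fn done(&self) -> bool {
        self.remaining == 0
    }
}

/// Drive one merge to completion, "yielding" whenever a quantum's fuel
/// runs out before the merge is done. Returns the number of yields; in
/// the tokio version each yield sends the task to the back of the
/// scheduler's queue so tasks for other levels can run.
fn run_level(merge: &mut Merge) -> usize {
    let mut yields = 0;
    while !merge.done() {
        let leftover = merge.work(FUEL_PER_QUANTUM);
        if leftover == 0 && !merge.done() {
            yields += 1; // tokio::task::yield_now().await in the real code
        }
    }
    yields
}

fn main() {
    // A 10-unit merge with 4 fuel per quantum yields twice before finishing.
    assert_eq!(run_level(&mut Merge { remaining: 10 }), 2);
}
```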

Describe Manual Test Plan

Observed improved performance on ingest-heavy customer workloads.

Checklist

  • Unit tests added/updated

Added a test for background merge panic detection. Other existing tests should cover the new functionality.

  • Integration tests added/updated
  • Documentation updated
  • Changelog updated

Breaking Changes?

Mark if you think the answer is yes for any of these components:

Describe Incompatible Changes

Make the benchmark more realistic by running it with the storage backend
enabled.

Signed-off-by: Leonid Ryzhyk <ryzhyk@gmail.com>
@ryzhyk ryzhyk requested review from blp and gz March 17, 2026 00:42
@ryzhyk ryzhyk added DBSP core Related to the core DBSP library performance labels Mar 17, 2026
@mihaibudiu
Contributor

What happens to profile information for background threads? How is that associated with workers?

@ryzhyk
Contributor Author

ryzhyk commented Mar 17, 2026

What happens to profile information for background threads? How is that associated with workers?

We don't have profiles for background threads. Merger-related metrics show up as part of Z-1 and Accumulator operators (which contain spines inside). These metrics don't change in any way with this PR.

@mihaibudiu
Contributor

How about the cache measurements? These are assigned to foreground and background.

@mihaibudiu
Contributor

Does this fix #2117?

@ryzhyk
Contributor Author

ryzhyk commented Mar 17, 2026

How about the cache measurements? These are assigned to foreground and background.

That still holds as well

@ryzhyk
Contributor Author

ryzhyk commented Mar 17, 2026

Does this fix #2117?

I think it does, although it doesn't yet eliminate all the special cases in the code.

Contributor

@mihaibudiu mihaibudiu left a comment

I am approving, but this is not a definitive review.

thread_local! {
// Reference to the `Runtime` that manages this worker thread or `None`
// if the current thread is not running in a multithreaded runtime.
/// Reference to the `Runtime` that manages this worker thread or `None`
Contributor

is this still possible?

Contributor Author

Yes; for example, there are connector threads that run outside a runtime.

Contributor

@gz gz left a comment

I expected it to be more complicated; I guess the fact that it isn't is a positive note for our code.


/// The number of merger threads.
///
/// The default is twice the number of worker threads.
Contributor

Changing the default has the potential to break things. We should ultimately increase it, but maybe we start with the same default and roll it out gradually?

Contributor Author

I agree it's a breaking change. What would a gradual rollout look like though?

Contributor

Increase it for a few users where we know it will matter; afterwards, change the default once it's deemed fine?

Maybe too cautious, but if it saves us some hours of trouble, it may be worth it.

Contributor Author

ok, we can do that.

set_current_thread_type(ThreadType::Background);
RUNTIME.with(|rt| *rt.borrow_mut() = Some(runtime_clone.clone()));
})
.thread_stack_size(6 * 1024 * 1024)
Contributor

Is this a constant we use somewhere else? Should it be a `const ...`?

Contributor Author

I copied this from our other tokio runtimes. I'm not sure how we chose this size; I think there were some cases where the default was too small.
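One way to address the `const` suggestion (hypothetical constant name, not in the current code) would be to hoist the magic number into a shared constant that every runtime builder references:

```rust
// Hypothetical shared constant for the tokio worker-thread stack size;
// the code currently inlines `6 * 1024 * 1024` at each runtime builder.
const TOKIO_THREAD_STACK_SIZE: usize = 6 * 1024 * 1024; // 6 MiB

fn main() {
    // Would be used as: builder.thread_stack_size(TOKIO_THREAD_STACK_SIZE)
    assert_eq!(TOKIO_THREAD_STACK_SIZE, 6_291_456);
}
```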

if let Some(rt) = Runtime::runtime() {
match current_thread_type() {
None => {
let buffer_cache = NO_RUNTIME_CACHE.clone();
Contributor

@gz gz Mar 17, 2026

This thing is a bit of a ticking time bomb; I did the same for slabs. We should ideally just get rid of this no-runtime code. Let's file an issue.

Contributor

We already have one: #2117.

Contributor Author

With this PR, we no longer have threads that use spines outside the runtime. But this cache is also used by output connector threads, which run inside the runtime, but are neither fg nor bg threads, to buffer updates. It's not clear how to pick cache size or the number of caches for this purpose.

Contributor

@gz gz Mar 17, 2026

How are the output connector threads using the buffer cache :O? Can we add a comment about it somewhere next to NO_RUNTIME_CACHE?

Contributor Author

@ryzhyk ryzhyk Mar 17, 2026

how are the output connector threads using the buffer cache :O?

Some of them maintain output buffers, which are implemented as spines.

can we add a comment about it somewhere next to NO_RUNTIME_CACHE

ok!

@ryzhyk ryzhyk force-pushed the async_merger branch 3 times, most recently from d580f68 to 4f46b3c on March 17, 2026 07:12
Member

@blp blp left a comment

I thought switching to Tokio for this would be awful but it's not too bad.

Using Tokio will make it easier if we decide we want to do async I/O later.

Comment on lines +48 to +52
StorageOptions {
min_storage_bytes: None,
min_step_storage_bytes: None,
..StorageOptions::default()
},
Member

I think these are all the defaults, so I'd just write StorageOptions::default().

Contributor Author

min_storage_bytes: None

I don't think this one is the default.

Member

Really? The default for StorageOptions comes from #[derive(Default)] and we also use #[serde(default)] on the struct, so regardless of where the StorageOptions is getting defaulted, any Option field is going to be None.

(But it's not important.)
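The `#[derive(Default)]` behavior under discussion can be demonstrated with a stand-in struct (illustrative; not the real `StorageOptions`): every `Option` field of a derived default is `None`, so spelling the fields out is redundant.

```rust
// Stand-in struct showing that derive(Default) sets Option fields to None.
#[derive(Default, Debug, PartialEq)]
struct Opts {
    min_storage_bytes: Option<u64>,
    min_step_storage_bytes: Option<u64>,
}

fn main() {
    let explicit = Opts {
        min_storage_bytes: None,
        min_step_storage_bytes: None,
    };
    // Writing out the Nones adds nothing over the derived default.
    assert_eq!(explicit, Opts::default());
}
```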

Comment on lines +301 to +303
// Map CPU IDs to core IDs for foreground and background workers.
//
// Returns a pair of vectors of core IDs, one for foreground workers and one for background workers.
Member

Suggested change
// Map CPU IDs to core IDs for foreground and background workers.
//
// Returns a pair of vectors of core IDs, one for foreground workers and one for background workers.
/// Map CPU IDs to core IDs for foreground and background workers.
///
/// Returns a pair of vectors of core IDs, one for foreground workers and one for background workers.

@ryzhyk ryzhyk added this pull request to the merge queue Mar 17, 2026
Merged via the queue into main with commit e0a5624 Mar 17, 2026
30 of 36 checks passed
@ryzhyk ryzhyk deleted the async_merger branch March 17, 2026 23:32