
A more scalable buffer cache#5788

Merged
gz merged 1 commit into main from sieve on Mar 16, 2026

Conversation

@gz
Contributor

@gz gz commented Mar 10, 2026

storage: add s3-fifo buffer cache, more config options

The buffer cache used to be simple. A single mutex protected it.
That design worked because only one thread accessed the cache.

We now want multiple threads to run merges in parallel.
That requires a cache that many threads can access without collapsing under contention.

This change introduces a new multi-threaded buffer cache. It also adds a new,
supposedly better eviction policy: the S3-FIFO algorithm.
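As a rough illustration of how S3-FIFO differs from LRU, here is a hypothetical, single-threaded sketch (not this PR's implementation, which builds on quick_cache): new keys enter a small probationary FIFO; keys re-referenced there are promoted to a main FIFO; keys evicted from the small FIFO unseen are remembered in a ghost FIFO, so a quick return skips probation.

```rust
use std::collections::{HashMap, VecDeque};

/// Simplified S3-FIFO sketch: a small FIFO for new keys, a main FIFO for
/// keys re-referenced while in small, and a ghost FIFO that remembers
/// recently evicted keys so one-hit-wonders don't pollute main.
struct S3Fifo<V> {
    small: VecDeque<u64>,
    main: VecDeque<u64>,
    ghost: VecDeque<u64>,
    entries: HashMap<u64, (V, u8)>, // value + access counter (capped at 3)
    small_cap: usize,
    main_cap: usize,
}

impl<V> S3Fifo<V> {
    fn new(capacity: usize) -> Self {
        let small_cap = (capacity / 10).max(1); // ~10% goes to the small queue
        Self {
            small: VecDeque::new(),
            main: VecDeque::new(),
            ghost: VecDeque::new(),
            entries: HashMap::new(),
            small_cap,
            main_cap: capacity - small_cap,
        }
    }

    fn get(&mut self, key: u64) -> Option<&V> {
        let e = self.entries.get_mut(&key)?;
        e.1 = (e.1 + 1).min(3); // bump frequency on hit
        Some(&e.0)
    }

    fn insert(&mut self, key: u64, value: V) {
        if self.entries.contains_key(&key) {
            return;
        }
        self.entries.insert(key, (value, 0));
        if self.ghost.contains(&key) {
            // Seen recently: skip probation, go straight to main.
            self.ghost.retain(|k| *k != key);
            self.main.push_back(key);
            self.evict_main();
        } else {
            self.small.push_back(key);
            self.evict_small();
        }
    }

    fn evict_small(&mut self) {
        while self.small.len() > self.small_cap {
            let key = self.small.pop_front().unwrap();
            if self.entries[&key].1 > 0 {
                // Re-referenced while in small: promote to main.
                self.main.push_back(key);
                self.evict_main();
            } else {
                // One-hit wonder: drop the value, remember the key in ghost.
                self.entries.remove(&key);
                self.ghost.push_back(key);
                if self.ghost.len() > self.main_cap {
                    self.ghost.pop_front();
                }
            }
        }
    }

    fn evict_main(&mut self) {
        while self.main.len() > self.main_cap {
            let key = self.main.pop_front().unwrap();
            let freq = self.entries[&key].1;
            if freq > 0 {
                // Second chance: decrement and reinsert at the tail.
                self.entries.get_mut(&key).unwrap().1 = freq - 1;
                self.main.push_back(key);
            } else {
                self.entries.remove(&key);
            }
        }
    }
}
```

The ghost queue is the key difference from plain FIFO or LRU: scan-like traffic flows through the small queue without displacing the main queue's working set.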

For compatibility, this change also adds configuration flags to revert the behavior:

```
"dev_tweaks": {
    "buffer_cache_allocation_strategy": "per_thread" | "global" | "shared_per_worker_pair",
    "buffer_cache_strategy": "s3_fifo" | "lru"
},

// new defaults: s3_fifo AND shared_per_worker_pair
// previously: lru AND per_thread
```
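For example, restoring the previous behavior should just be a matter of setting both tweaks back to their old defaults:

```
"dev_tweaks": {
    "buffer_cache_allocation_strategy": "per_thread",
    "buffer_cache_strategy": "lru"
}
```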

Describe Manual Test Plan

Ran a few pipelines, wrote lots of tests and benchmark programs.

Checklist

  • Unit tests added/updated

Breaking Changes?

Potential for performance regressions, since this changes a critical piece of our infrastructure. Benchmarks are promising, though.
We can revert to the old cache with a dev-tweak.

@gz gz requested a review from blp March 10, 2026 06:06

@mythical-fred mythical-fred left a comment


Two issues to resolve before merge.

@gz gz force-pushed the sieve branch 3 times, most recently from fac284f to 9137a0f Compare March 10, 2026 22:47
@gz

This comment was marked as outdated.

@gz gz marked this pull request as draft March 11, 2026 04:12
@gz

This comment was marked as outdated.

@lalithsuresh

This comment was marked as outdated.

@gz

This comment was marked as outdated.

@blp
Member

blp commented Mar 11, 2026

@gz If you can put more of the PR description into the commit messages, then that would be great (the PR description is very good). I understand that the graphs, etc. wouldn't be able to go in there.

@gz

This comment was marked as outdated.

@blp

This comment was marked as outdated.

@gz gz force-pushed the sieve branch 8 times, most recently from bdc4f4d to ffdefb1 Compare March 12, 2026 17:21
@gz gz marked this pull request as ready for review March 12, 2026 17:25
@gz gz requested a review from mythical-fred March 12, 2026 18:11
@gz
Contributor Author

gz commented Mar 12, 2026

New benchmarks for s3-fifo eviction; still promising:

Single-threaded:

[graphs: single_threaded_speedup, single_threaded_hit_rates]

Multi-threaded:

[graphs: multi_threaded_speedup, multi_threaded_scaling, multi_threaded_hit_rates]


@mythical-fred mythical-fred left a comment


LGTM. `publish = true` is in. I was wrong about the memory default — the old code was also 256 MiB per thread instance, so the total is unchanged. Don't forget to pre-create the crate on crates.io with Trusted Publishing set up (both the repository and workflow entries) before the release runs.

@lalithsuresh
Contributor

@gz these numbers look weaker than sieve right? What changed?


@mythical-fred mythical-fred left a comment


Three minor nits below — none are blockers. Approving.

@gz gz force-pushed the sieve branch 2 times, most recently from cb6acef to 96d3439 Compare March 12, 2026 20:38

@mythical-fred mythical-fred left a comment


Two blockers, see inline.

@gz gz force-pushed the sieve branch 2 times, most recently from 4e7bcc3 to d6a59bb Compare March 12, 2026 23:09
@gz
Contributor Author

gz commented Mar 12, 2026

@gz these numbers look weaker than sieve right? What changed?

The absolute hit rate is lower because I used a smaller cache size to regenerate the graph.
I earlier regenerated the graphs with sieve included, and s3-fifo matched sieve's performance for a zipf distribution.

For scalability, I think (not confirmed) s3-fifo is weaker because the quick_cache underneath uses a regular RwLock rather than a sharded RwLock. I opened an issue about it: arthurprs/quick-cache#108. I can probably fix it myself, but let's see what the maintainer says. This is the same problem I initially had with the sieve cache prototype.
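The sharding workaround mentioned above can be sketched like this (a hypothetical illustration of the technique, not quick_cache's internals): hash each key to one of N independently locked shards, so writers on different shards don't serialize behind a single RwLock.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::sync::RwLock;

/// Hypothetical sharded map: one RwLock per shard instead of one global lock.
struct ShardedCache<K, V> {
    shards: Vec<RwLock<HashMap<K, V>>>,
}

impl<K: Hash + Eq, V: Clone> ShardedCache<K, V> {
    fn new(num_shards: usize) -> Self {
        let shards = (0..num_shards).map(|_| RwLock::new(HashMap::new())).collect();
        Self { shards }
    }

    // Pick a shard from the key's hash; different keys mostly land on
    // different locks, so concurrent writers rarely contend.
    fn shard(&self, key: &K) -> &RwLock<HashMap<K, V>> {
        let mut h = DefaultHasher::new();
        key.hash(&mut h);
        &self.shards[(h.finish() as usize) % self.shards.len()]
    }

    fn insert(&self, key: K, value: V) {
        self.shard(&key).write().unwrap().insert(key, value);
    }

    fn get(&self, key: &K) -> Option<V> {
        self.shard(key).read().unwrap().get(key).cloned()
    }
}
```

With a single RwLock, every write takes the lock exclusively and stalls all readers; sharding bounds that contention to 1/N of the keyspace.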

@gz
Contributor Author

gz commented Mar 12, 2026

$some-workload 16GiB CACHE

LRU:
    "buffer_cache_strategy": "lru"
foreground_cache_hit_rate_percent	78.6%	79.3%	79.3%	78.5%	79.2%	79.5%	79.4%	79.4%	78.5%	79.5%
background_cache_hit_rate_percent	71.1%	66.5%	63.8%	77.0%	68.2%	69.5%	67.7%	65.0%	63.8%	77.0%

s3_fifo:
    "buffer_cache_strategy": "s3_fifo"
    "buffer_cache_allocation_strategy": "per_thread"
foreground_cache_hit_rate_percent	78.7%	78.4%	78.4%	78.7%	77.7%	78.0%	78.2%	78.3%	77.7%	78.7%
background_cache_hit_rate_percent	36.7%	36.3%	36.1%	36.9%	36.9%	35.9%	36.1%	36.9%	35.9%	36.9%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "global"
foreground_cache_hit_rate_percent	94.9%	95.4%	94.8%	95.0%	94.7%	95.4%	95.0%	94.9%	94.7%	95.4%
background_cache_hit_rate_percent	76.1%	77.0%	72.6%	75.7%	75.1%	75.2%	73.8%	77.8%	72.6%	77.8%

    "buffer_cache_strategy": "s3_fifo"
    "buffer_cache_allocation_strategy": "shared_per_worker_pair"
foreground_cache_hit_rate_percent	95.6%	96.0%	96.5%	94.6%	96.3%	97.0%	97.4%	96.6%	94.6%	97.4%
background_cache_hit_rate_percent	77.8%	79.8%	79.4%	77.0%	80.2%	79.6%	78.5%	78.4%	77.0%	80.2%

    "buffer_cache_strategy": "s3_fifo"
    "buffer_cache_allocation_strategy": "shared_per_worker_pair"
    "merger": "push_merger"
foreground_cache_hit_rate_percent	97.4%	97.9%	97.8%	98.5%	97.9%	97.4%	97.7%	97.5%	97.4%	98.5%
background_cache_hit_rate_percent	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%

$some-workload cache_mib: null, 8 workers = 8*256*2 (so set to 4096 MiB)

    "buffer_cache_strategy": "lru"
foreground_cache_hit_rate_percent	80.0%	78.8%	79.1%	79.2%	79.6%	78.9%	78.8%	79.0%	78.8%	80.0%
background_cache_hit_rate_percent	21.2%	21.6%	23.6%	21.9%	24.8%	22.2%	22.2%	22.4%	21.2%	24.8%

    "buffer_cache_strategy": "s3_fifo"
    "buffer_cache_allocation_strategy": "per_thread"
foreground_cache_hit_rate_percent	78.9%	79.3%	78.9%	79.0%	78.9%	79.4%	79.5%	78.8%	78.8%	79.5%
background_cache_hit_rate_percent	8.4%	8.6%	8.3%	8.2%	8.6%	9.2%	8.5%	8.6%	8.2%	9.2%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "global"
foreground_cache_hit_rate_percent	80.3%	80.4%	80.3%	80.2%	79.8%	80.3%	81.1%	79.2%	79.2%	81.1%
background_cache_hit_rate_percent	21.0%	20.6%	21.5%	19.7%	20.9%	23.6%	23.9%	21.2%	19.7%	23.9%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
foreground_cache_hit_rate_percent	80.0%	80.5%	80.4%	80.1%	80.4%	80.0%	79.7%	79.9%	79.7%	80.5%
background_cache_hit_rate_percent	20.3%	22.7%	21.4%	21.2%	21.1%	22.2%	22.6%	23.0%	20.3%	23.0%

    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
    "buffer_cache_strategy": "s3_fifo",
    "merger": "push_merger"
foreground_cache_hit_rate_percent	81.8%	82.0%	82.1%	81.7%	81.3%	81.6%	81.6%	81.7%	81.3%	82.1%
background_cache_hit_rate_percent	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%	100.0%


$some-workload cache_mib: null, 1 worker = 256*2 (so set to 512 MiB)

    "buffer_cache_strategy": "lru"
foreground_cache_hit_rate_percent	77.4%	77.4%	77.4%
background_cache_hit_rate_percent	0.8%	0.8%	0.8%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "per_thread"
foreground_cache_hit_rate_percent	72.6%	72.6%	72.6%
background_cache_hit_rate_percent	5.8%	5.8%	5.8%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "global"
foreground_cache_hit_rate_percent	70.0%	70.0%	70.0%
background_cache_hit_rate_percent	14.5%	14.5%	14.5%

    "buffer_cache_strategy": "s3_fifo",
    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
foreground_cache_hit_rate_percent	72.6%	72.6%	72.6%
background_cache_hit_rate_percent	21.4%	21.4%	21.4%

    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
    "buffer_cache_strategy": "s3_fifo",
    "merger": "push_merger"
foreground_cache_hit_rate_percent	73.5%	73.5%	73.5%
background_cache_hit_rate_percent	100.0%	100.0%	100.0%


u64Njoin-no-match

    "buffer_cache_strategy": "lru"
 foreground_cache_hit_rate_percent	99.2%	98.9%	99.1%	99.0%	99.3%	99.2%	99.0%	99.2%	98.9%	99.3%
background_cache_hit_rate_percent	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%	0.0%
 
    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
    "buffer_cache_strategy": "sieve"
foreground_cache_hit_rate_percent	45.4%	39.3%	48.3%	50.2%	42.5%	47.2%	44.3%	46.5%	39.3%	50.2%
background_cache_hit_rate_percent	10.7%	11.0%	11.3%	12.7%	13.4%	12.2%	11.6%	11.7%	10.7%	13.4%

    "buffer_cache_allocation_strategy": "shared_per_worker_pair",
    "buffer_cache_strategy": "s3_fifo"
foreground_cache_hit_rate_percent	96.0%	95.4%	94.9%	95.7%	94.9%	94.9%	95.1%	95.9%	94.9%	96.0%
background_cache_hit_rate_percent	21.7%	20.7%	20.3%	21.2%	19.4%	12.2%	14.1%	13.3%	12.2%	21.7%

    "buffer_cache_allocation_strategy": "global",
    "buffer_cache_strategy": "s3_fifo"
foreground_cache_hit_rate_percent	67.0%	59.0%	50.3%	72.0%	53.8%	59.2%	62.4%	60.6%	50.3%	72.0%
background_cache_hit_rate_percent	22.1%	32.2%	31.7%	22.3%	21.9%	29.3%	27.5%	20.8%	20.8%	32.2%

    "buffer_cache_allocation_strategy": "global",
    "buffer_cache_strategy": "sieve"
foreground_cache_hit_rate_percent	28.4%	28.5%	20.6%	24.2%	26.0%	21.1%	20.4%	23.8%	20.4%	28.5%
background_cache_hit_rate_percent	11.9%	12.3%	11.0%	13.2%	12.9%	11.8%	9.9%	11.3%	9.9%	13.2%

Here are some metrics from pipelines.

The one interesting takeaway: for the ingest-heavy workload u64Njoin-no-match, a global cache is worse (with s3_fifo) than shared_per_worker_pair.
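For reference, the default sizing used in the runs above (`cache_mib: null`) appears to work out as workers × 2 threads × 256 MiB. A small helper capturing that arithmetic (the constants are inferred from the notes above, not taken from the code):

```rust
// Assumed defaults, inferred from the benchmark notes:
// 256 MiB per thread instance, two threads (foreground + background)
// per worker when the cache size is left unset (cache_mib: null).
const MIB_PER_THREAD: u64 = 256;
const THREADS_PER_WORKER: u64 = 2;

fn default_cache_mib(workers: u64) -> u64 {
    workers * THREADS_PER_WORKER * MIB_PER_THREAD
}
```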

@gz gz requested a review from blp March 12, 2026 23:32

@mythical-fred mythical-fred left a comment


Retracting the prior REQUEST_CHANGES — publish = true is already in the Cargo.toml (I was wrong). One non-blocking nit below. LGTM.

Member

@blp blp left a comment


I read most of this in detail (not all of the tests, and not some of the code that just moved) and it's good work. I especially appreciate how many tests it adds, and the benchmark.

None of my suggestions are important.

@gz gz force-pushed the sieve branch 2 times, most recently from 654d787 to 23e4c36 Compare March 16, 2026 19:26
@gz gz enabled auto-merge March 16, 2026 19:26
The buffer cache used to be simple. A single mutex protected it.
That design worked because only one thread accessed the cache.

We now want multiple threads to run merges in parallel.
That requires a cache that many threads can access without collapsing under contention.

This change introduces a new multi-threaded buffer cache. It also adds a new,
supposedly better eviction policy: the S3-FIFO algorithm.

For compatibility, this change also adds configuration flags to revert the behavior:

```
"dev_tweaks": {
    "buffer_cache_allocation_strategy": "per_thread" | "global" | "shared_per_worker_pair",
    "buffer_cache_strategy": "s3_fifo" | "lru"
  },

// new defaults: s3_fifo AND shared_per_worker_pair
// previously: lru AND per_thread
```

Signed-off-by: Gerd Zellweger <mail@gerdzellweger.com>
@gz gz added this pull request to the merge queue Mar 16, 2026
Merged via the queue into main with commit 1f4159b Mar 16, 2026
1 check passed
@gz gz deleted the sieve branch March 16, 2026 23:08