Redfs ubuntu hwe 6.17.0 16.16 24.04.1 #101
hbirth wants to merge 46 commits into DDNStorage:redfs-ubuntu-hwe-6.17.0-16.16-24.04.1 from
Conversation
The clone from the ubuntu repository is also in here as a separate branch, in case we have to check back some time.
d1ad060 to 863185e
@hbirth I think the last patch needs to be integrated into the other patches causing it, and I'm confused about things like
I thought you had a script to check if all patches are there, so I even kept the sequence the same.
There are too many exceptions already - we need to create a whitelist per branch.
And for things that are merged upstream and have the cherry-pick ID in our 6.8 branch - the script already handles it. The order does not matter, it just needs to be there.
OK, so I shouldn't have applied those and fixed them later?
Yeah, I think the branch should only get patches that are absolutely needed. I had made an exception for the el9.6 branch and applied an empty patch to make the script happy. But in the end, all the other branches still show differences - we need to introduce a whitelist.
OK, if I do that, the patches will still not apply cleanly, so I have to fix them ... sometimes it's almost a rewrite, since the context changed. That's why I fixed all that later. So now I should create separate patches for that, or stop for every patch and compile and test? ... I will not know, after I have applied a patch - even if it applies cleanly - whether the result is the correct functionality (e.g. for iomap and DLM_LOCK the logic changed in the write iterator). My problem is that I will need fixes. Where do they go? Since you don't want a separate patch only for the fixes. I thought it would be easier if we have the series of patches and then one where we fix the kernel; then we know if a line was from a wrong fix, or we missed it and it was from the original patch that did not actually apply to this version. BTW, if I had to do a 6.19, I would omit the last patch but keep all the rest, and check afterwards how much of the fix applies.
863185e to 90a7d83
filtered the patches that were already in Ubuntu-hwe-6.17 |
90a7d83 to 8feba64
if we have to do this multiple times, we should think about squashing OUR changes to a couple of topics: page pinning, io-uring fixes, DLM_LOCK, etc. |
There is a good reason why upstream-first is preferred ;) It keeps things on the management agenda and avoids such changes. I think for upstream changes it makes sense to revert these internal patches and then to apply what upstream eventually gets.
	args->out_args[0].size = count;
}

static void fuse_release_user_pages(struct fuse_args_pages *ap, ssize_t nres,
2nd patch is not right, I had only done that in the 6.8 branch because it had introduced bad conflicts. Should not be needed for 6.17
8feba64 to c683e87
This is especially needed for better ftrace analysis, for example to build histograms. So far the request unique was missing, because it was added after the first trace message. IDs/req-unique now might not come up perfectly sequentially anymore, but especially with a cloned device or io-uring this did not work perfectly anyway.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
fuse_uring_send_next_to_ring() can just call into fuse_uring_send() and avoid code duplication.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Rename trace_fuse_request_send to trace_fuse_request_enqueue
Add trace_fuse_request_send
Add trace_fuse_request_bg_enqueue
Add trace_fuse_request_enqueue
This helps to track entire request time and time in different queues.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
c683e87 to ee96a90
This is preparation for follow-up commits that allow running with a reduced number of queues.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Add per-CPU and per-NUMA node bitmasks to track which io-uring queues are registered. Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Queue selection (fuse_uring_get_queue) can now handle a reduced number of queues - using io-uring is possible even with a single queue and entry. The FUSE_URING_REDUCED_Q flag is introduced to tell the fuse server that reduced queues are possible, i.e. if the flag is set, the fuse server is free to reduce the number of queues.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
Running background IO on a different core makes quite a difference.

fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread \
    --bs=4k --size=1G --numjobs=1 --iodepth=4 --time_based \
    --runtime=30s --group_reporting --ioengine=io_uring \
    --direct=1

unpatched READ: bw=272MiB/s (285MB/s)
patched   READ: bw=650MiB/s (682MB/s)

Reason is easily visible: the fio process is migrating between CPUs when requests are submitted on the queue for the same core.

With --iodepth=8
unpatched READ: bw=466MiB/s (489MB/s)
patched   READ: bw=641MiB/s (672MB/s)

Without io-uring (--iodepth=8)
READ: bw=729MiB/s (764MB/s)

Without fuse (--iodepth=8)
READ: bw=2199MiB/s (2306MB/s)

(Tests were done with
<libfuse>/example/passthrough_hp -o allow_other --nopassthrough \
    [-o io_uring] /tmp/source /tmp/dest)

Additional notes:
With FURING_NEXT_QUEUE_RETRIES=0 (--iodepth=8)
READ: bw=903MiB/s (946MB/s)
With just a random qid (--iodepth=8)
READ: bw=429MiB/s (450MB/s)

With --iodepth=1
unpatched READ: bw=195MiB/s (204MB/s)
patched   READ: bw=232MiB/s (243MB/s)

With --iodepth=1 --numjobs=2
unpatched READ: bw=366MiB/s (384MB/s)
patched   READ: bw=472MiB/s (495MB/s)

With --iodepth=1 --numjobs=8
unpatched READ: bw=1437MiB/s (1507MB/s)
patched   READ: bw=1529MiB/s (1603MB/s)
fuse without io-uring READ: bw=1314MiB/s (1378MB/s), 1314MiB/s-1314MiB/s ...
no-fuse               READ: bw=2566MiB/s (2690MB/s), 2566MiB/s-2566MiB/s ...

In summary, for async requests the core doing application IO is busy sending requests, and processing IOs should be done on a different core. Spreading the load on random cores is also not desirable, as the core might be frequency scaled down and/or in C1 sleep states. Not shown here, but differences are much smaller when the system uses the performance governor instead of schedutil (ubuntu default) - obviously at the cost of higher system power consumption for the performance governor, which is not desirable either.
Results without io-uring (which uses fixed libfuse threads per queue) heavily depend on the current number of active threads. Libfuse uses a default of max 10 threads, but the actual max number of threads is a parameter. Also, the non-io-uring results heavily depend on whether another workload was already running before, as libfuse starts these threads dynamically - i.e. the more threads are active, the worse the performance.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This is to further improve performance.

fio --directory=/tmp/dest --name=iops.\$jobnum --rw=randread \
    --bs=4k --size=1G --numjobs=1 --iodepth=4 --time_based \
    --runtime=30s --group_reporting --ioengine=io_uring \
    --direct=1

unpatched READ: bw=650MiB/s (682MB/s)
patched   READ: bw=995MiB/s (1043MB/s)

with --iodepth=8
unpatched READ: bw=641MiB/s (672MB/s)
patched   READ: bw=966MiB/s (1012MB/s)

Reason is that with --iodepth=x (x > 1) fio submits multiple async requests and a single queue might become CPU limited, i.e. spreading the load helps.
generic_file_direct_write() also does this and has a large comment about it. The reproducer here is xfstest's generic/209, which exists exactly to have competing DIO writes and cached IO reads.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This was done as a condition on direct_io_allow_mmap, but I believe this is not right, as a file might be open two times - once with write-back enabled and another time with FOPEN_DIRECT_IO.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
With the reduced queue feature io-uring is marked as ready after
receiving the 1st ring entry. At this time other queues just
might be in the process of registration and then a race happens
fuse_uring_queue_fuse_req -> no queue entry registered yet
list_add_tail -> fuse request gets queued
So far fetching requests from the list only happened from
FUSE_IO_URING_CMD_COMMIT_AND_FETCH, but without new requests
on the same queue, it would actually never send requests
from that queue - the request was stuck.
fuse.h: add new opcode FUSE_COMPOUND
fuse_compound.c: add new functionality to pack multiple fuse operations into one compound command
file.c: add an implementation of open+getattr
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
There was a race between fuse_uring_cancel() and
fuse_uring_register()/fuse_uring_next_fuse_req(),
which comes from the queue reduction feature.
Race was
core-A core-B
fuse_uring_register
spin_lock(&queue->lock);
fuse_uring_ent_avail()
spin_unlock(&queue->lock);
fuse_uring_cancel()
spin_lock(&queue->lock);
ent->state = FRRS_USERSPACE;
list_move()
fuse_uring_next_fuse_req()
spin_lock(&queue->lock);
fuse_uring_ent_avail(ent, queue);
fuse_uring_send_next_to_ring()
spin_unlock(&queue->lock);
fuse_uring_send_next_to_ring
I.e. fuse_uring_ent_avail() was called two times and the 2nd time
when the entry was actually already handled by fuse_uring_cancel().
Solution is to not call fuse_uring_ent_avail() from
fuse_uring_register. With that the entry is not in state
FRRS_AVAILABLE and fuse_uring_cancel() will not touch it.
fuse_uring_send_next_to_ring() will mark it as FRRS_AVAILABLE,
and then either assign a request to it and change state again
or will not touch it at all anymore - race fixed.
This will be folded into the upstream queue reduction patches
and therefore has the RED-34640 commit message.
Also entirely removed is fuse_uring_do_register() as remaining
work can be done by the caller.
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This is just to avoid code dup with an upcoming commit. Signed-off-by: Bernd Schubert <bschubert@ddn.com>
This issue could be observed sometimes during libfuse xfstests; dmesg prints something like "kernel: WARNING: CPU: 4 PID: 0 at fs/fuse/dev_uring.c:204 fuse_uring_destruct+0x1f5/0x200 [fuse]". The cause: if the fuse daemon has just submitted FUSE_IO_URING_CMD_REGISTER SQEs and then umount happens or the daemon quits at this very early stage, one or more unprocessed FUSE_IO_URING_CMD_REGISTER SQEs might still get processed after all uring queues have stopped. New ring entities are then created and added to ent_avail_queue, and fuse_uring_cancel immediately moves them to ent_in_userspace after the SQEs get canceled. These ring entities were never moved to ent_released and stayed in ent_in_userspace when fuse_uring_destruct was called. One way to solve this would be to also free 'ent_in_userspace' in fuse_uring_destruct(), but from a code point of view it is hard to see why that would be needed. As suggested by Joanne, another solution is to avoid moving entries in fuse_uring_cancel() to the 'ent_in_userspace' list and to release them directly.
Fixes: b6236c8 ("fuse: {io-uring} Prevent mount point hang on fuse-server termination")
Cc: Joanne Koong <joannelkoong@gmail.com>
Cc: <stable@vger.kernel.org> # v6.14
Signed-off-by: Jian Huang Li <ali@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
When a request is terminated before it has been committed, the request is not removed from the queue's list. This leaves a dangling list entry that leads to list corruption and use-after-free issues. Remove the request from the queue's list for terminated non-committed requests.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Fixes: c090c8a ("fuse: Add io-uring sqe commit and fetch support")
Cc: stable@vger.kernel.org
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This fixes a memory leak.
FUSE_INIT has always been asynchronous with mount. That means the server processed this request after the mount syscall returned, so FUSE_INIT can't supply the root inode's ID, which hence currently has a hardcoded value. There are other limitations such as not being able to perform getxattr during mount, which is needed by selinux. To remove these limitations, allow the server to process FUSE_INIT while initializing the in-core super block for the fuse filesystem. This can only be done if the server is prepared to handle this, so add the FUSE_DEV_IOC_SYNC_INIT ioctl, which
a) lets the server know whether this feature is supported, returning ENOTTY otherwise, and
b) lets the kernel know to perform a synchronous initialization.
The implementation is slightly tricky, since fuse_dev/fuse_conn are set up only during super block creation. This is solved by setting the private data of the fuse device file to a special value ((struct fuse_dev *) 1) and waiting for this to be turned into a proper fuse_dev before commencing with operations on the device file.
Several merge conflicts during the backport - the ioctl handler was easier to rewrite from scratch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
(cherry picked from commit dfb84c3)
no functional changes Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Take actions on the PR merged event of this repo. Run copy-from-linux-branch.sh and create a PR for redfs. (cherry picked from commit f54872e)
Switch to pull_request_target instead of pull_request as the github security requirement. Also limits the scope to protected PR. (cherry picked from commit b9980ad)
Remove the pull_request_target as it doesn't work. (cherry picked from commit 5328f66)
no functional change Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
…) returns some error other than ENOSYS. The returned error got overwritten.
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
Simplify fuse_compound_req to hold only the pointers to the added fuse args and the request housekeeping. Simplify the open+getattr call by using helper functions to fill out the fuse request parameters.
Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com>
For now compounds are a module option and disabled by default Signed-off-by: Bernd Schubert <bschubert@ddn.com>
The use of bitmap_weight() didn't give the actual index, but always returned the current cpu, which resulted in a totally wrong mapping. The code now just increases a counter for every mapping, ignores cores not in the given (numa) map, and finds the index that way. Also added is a pr_debug(), which can be activated for example with
echo "module redfs +p" >/proc/dynamic_debug/control
(Pity that upstream is not open for such debug messages.)
ee96a90 to ca3d1bf
	continue;

/* memory barrier to ensure we see the latest list state */
smp_rmb();
This patch can be equally removed - I was hoping it would fix a race, but it actually didn't.
hlist_for_each_entry(dentry, &inode->i_dentry, d_u.d_alias) {
	spin_lock(&dentry->d_lock);
	if (!d_unhashed(dentry))
		__d_drop(dentry);
I think there is a similar patch from Miklos, and Miklos had said this __d_drop() is not right. I need to check again. Keep it for now as it is, but we need to have a look at some point.
fs/fuse/file.c (Outdated)
cleanup:
	unlock_page(page);
	put_page(page);
error:
This function doesn't look right. The patch in 6.8 was not touching fuse_vma_close at all - why does it here?
*ppos = pos;

if (res > 0 && write && fopen_direct_io) {
	/*
Here we could take the upstream patch, with the right cherry-pick ID
This is a checkout of ubuntu-hwe-6.17.0-16.16 from the ubuntu repository and all our patches applied.
After resolving all the conflicts I had to fix some code that was semantically wrong, since the kernel has moved a lot since the patches were created.
This code has survived xfstests with /dev/fuse and io-uring so far.