gh-129205: Experiment BytesIO._readfrom() #130098
Draft PR / Experiment
Rather than directly moving loops, I have been experimenting with a `BytesIO._readfrom(file, /, *, estimate=None, limit=None)` that encapsulates both the buffer resizing and the read loop efficiently, adding common code for the "estimated size" and "limit size" features. It can be used to implement `FileIO.readall` with minimal perf change (included in this PR). In general I think "if there is a read loop, it should be faster and simpler".
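For concreteness, here is a minimal sketch of how an fd-based `readall` could delegate to such a helper. `_readfrom` is the internal method this PR proposes, but the fstat-based estimate and the exact call shape below are my illustration, not necessarily the final code:

```python
import io
import os


def fd_readall(fd: int) -> bytes:
    """Read everything remaining on *fd*, pre-sizing from fstat() when possible."""
    buf = io.BytesIO()
    try:
        # Estimate the remaining size so the buffer can be sized up front.
        pos = os.lseek(fd, 0, os.SEEK_CUR)
        estimate = max(os.fstat(fd).st_size - pos, 0)
    except OSError:
        estimate = None
    # Hypothetical internal API proposed by this PR: pull all remaining
    # bytes from the fd into the BytesIO's buffer, resizing as it goes.
    buf._readfrom(fd, estimate=estimate)
    return buf.getvalue()
```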
The `_pyio` implementation supports three kinds of IO objects: direct FD ints, those with a `.readinto` member, and those with a `.read` member. If that looks like a reasonable approach, I'd likely introduce it as an internal method `BytesIO._readfrom()` and move cases over (with perf tests to make sure things don't regress).
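As a rough plain-Python illustration of the shape of that loop (not the PR's actual `_pyio` code; the chunk size, growth policy, and non-blocking handling are placeholder choices):

```python
import os


def readfrom(file, *, estimate=None, limit=None) -> bytearray:
    """Sketch of a BytesIO._readfrom-style read loop.

    *file* may be a raw fd (int), an object with a ``readinto`` method, or
    an object with only ``read``.
    """
    CHUNK = 64 * 1024
    target = estimate if estimate else CHUNK
    buf = bytearray()
    filled = 0
    while True:
        if limit is not None and filled >= limit:
            break
        if filled == len(buf):
            # Grow geometrically (~50%) so a large read does O(log n)
            # resizes instead of one per chunk.
            grow = max(target - len(buf), len(buf) // 2, CHUNK)
            buf.extend(bytes(grow))
        want = len(buf) - filled
        if limit is not None:
            want = min(want, limit - filled)
        if isinstance(file, int):              # raw file descriptor
            data = os.read(file, want)
            n = len(data)
            buf[filled:filled + n] = data
        elif hasattr(file, "readinto"):        # read straight into our buffer
            with memoryview(buf) as view:
                n = file.readinto(view[filled:filled + want]) or 0
        else:                                  # objects with only .read
            data = file.read(want) or b""      # None (non-blocking) treated as EOF here
            n = len(data)
            buf[filled:filled + n] = data
        if n == 0:                             # EOF
            break
        filled += n
    del buf[filled:]                           # trim the unused tail
    return buf
```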
In the C implementation I included an optimization that avoids a heap allocation in the `estimate=0` case by using 1 KB of stack space instead. I'm not sure that's worth the complexity and cost (if the stack buffer gets used, it needs an extra copy compared to just using a bytes object, and in a warmed-up interpreter a 1 KB `PyBytes` feels likely to be quickly allocated anyway).

The CPython codebase has a common pattern of building a list of I/O chunks and then "join"ing them together at the end of the loop. I think `_readfrom` makes a tradeoff there: as long as resizing only infrequently copies (which I expect when not many other memory buffers are being allocated), it should be faster than that single extra-large join and copy at the end. I haven't run full performance numbers though. In my mental model, using non-linear buffer resizing for a large readall, which reduces the number of allocs and deallocs, is likely a much bigger performance gain than the cost of potential `realloc` copies; it definitely uses less memory overall.
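For contrast, the chunks-plus-join pattern being traded against looks roughly like this (the resizable-buffer alternative is the loop sketched above):

```python
import os

CHUNK = 64 * 1024


def read_fd_chunks_join(fd: int) -> bytes:
    """Common CPython pattern: accumulate chunks, then one large join at the end.

    Every chunk is a separate heap allocation, and the final ``join`` copies
    all of the data once more; a readfrom-style resizable buffer replaces
    that with occasional (possibly in-place) reallocs during the loop.
    """
    chunks = []
    while True:
        data = os.read(fd, CHUNK)
        if not data:                 # EOF
            break
        chunks.append(data)
    return b"".join(chunks)
```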