-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve buffer management throughout the load/fetch and parse lifecycle #2186
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Removed the use of a backing BufferedInputReader, which was redundant and creating large char array buffers. Setting up so that we can recycle the charBuf.
A SoftPool is a ThreadLocal SofReference Stack of <T>, with borrow and release methods. Allows simple recycling of objects between uses, and for those to be reaped when inactive.
Replaces uses of BufferedInputStreams Advantages: - can recycle the byte[] buffer; so significant reduction in GC load - if consumer is reading into an array and there is no mark, no need to allocate a buffer - doesn't aim to support multi-threaded reads, so no syncs or locking Also, reduced the DefaultBufferSize to 8K from 32K
Also, eliminates a redundant buffer by not creating a ByteArrayOutputStream. Because the ByteBuffer returned may have an array with capacity beyond its limit, uses of that bytebuffer now use the .limit() size, vs assuming it was the exact size.
While Buffer and ByteBuffer were available since Android 1, incl .flip(), scents was complaining about ControllableInputStream's use of .flip. Best I can see is that because .flip in previous JDKs returned Buffer, and now ByteBuffer, this caused Android Scents to misread the API and flag. We don't use the return of flip so it's not relevant either way.
When reading ahead to detect the charset, instead of buffering into a new ByteBuffer, reuse the ControllableInputStream.
src/main/java/org/jsoup/internal/ControllableInputStream.java
Dismissed
Show dismissed
Hide dismissed
Also close underlying reader if passthrough read is done.
(Failing in CI on Windows JDK 21 only)
The profiling logs are here: https://gist.github.com/jhy/f6e9c8ca8fd506c6c36f7d1a3154631c |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The goal of BUFFMAN is to reduce memory consumption and reduce GC pressure within jsoup, through improved buffer management.
It picks up the thread from #1800 (by @chibenwa), which I got stalled on when reimplementing buffering in CharacterReader.
Specifically, the implementation will re-use
char[]
andbyte[]
arrays whenever possible, vs creating new ones on each read, which is the Java default pattern. And those buffers are able to be reaped by GC as theSoftPool
implementation holds them in SoftReferences. That improves on the current implementation of this pattern inStringUtil.borrowBuilder()
, which retains those for the lifecycle of the thread, which may be longer than the parser is required. The buffer sizes will also be lowered from the current defaults of 32K.This initial draft includes recycling in
CharacterReader
and forStringBuilders
. I intend to recycle the byte[] buffers used inDataUtil
next (which is used when parsing from a connection or file). I was planning on just extending BufferedInputStream to replace the new byte[] with a recycled, but then remembered #2054 which effectively means we can't extend BIS after Java 21. So I think I'll just make a small non-synchronized impl of a BIS, similar to my impl in CharacterReader.