-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Threadsafety issue in org.jsoup.nodes.Entities
#2042
Comments
Note: there could be other similar |
However, given that we observe what looks like a deadlock, the problem might not be the There was a related bug in the class involving the initializers, #1910. Our exception looks similar to this bug in a different library, given I think this suggests that b6f652c might have just converted the |
Actually, upon inspection while our dependency has updated to 1.6.1, we haven't updated to the version of the dependency that has 1.6.1. |
For some context - the static hashmap As far as I can see I don't believe this is related to #1910 as that was a straight single-thread path to a null pointer. And I can't see a deadlock path in those stack traces (or in the code). Do you have more thread dumps that shows two threads are waiting on each other? My expectation for this is that if you have many threads running, and then they all suddenly want to load Entities for the first time, there will be a race to lock the static initializer, and one thread will win. It will quickly (like, a few millis) parse the entities data and write to the hashmap, then unlock and the other threads will resume. The times you have in that trace look awful! Is something else causing the Entities to be evicted from permgen? Does this lock happen at startup or after running for some time? How many concurrent threads are there? |
That they are waiting on different initializers is interesting and I wonder if there's a path there?
And particularly that there is any wait for Entities. That doesn't do anything - the only static init is to create the empty hashmap. The EscapeMode init (which does not happen in Entities but in other users of the enum) is what could take any time. Does feel like a lock somehow. Perhaps on something else? |
@owengray-google do you have any further info for this? Without a way to repro I haven't been able to make any progress here. |
Androidx is open-source, but this is a low-probability flake, less than 1%. |
Alright, attached is a large threadump. It actually does seem like The clearly-jsoup-related segments:
|
Thanks for the update. It's interesting. I have been able to craft a kind-of repro by adding a Thread.sleep(2000) as a static initializer in each of Entities and OutputSettings. And then a runner which spawns a few threads which run a Entities.escape and a Document.parseBodyFragment. It will threadlock in a similar manner to those threads above. However, I find that only happens when I rollback the changes for #1910. If I leave them in (the lazy init), even with adding extra sleeps about the place, I can't get it to threadlock like that. I can take a pass on refactoring to further decouple OutputSettings and Entities during initialization. Would you be able to evaluate a snapshot / branch version of jsoup with a tentative fix? |
Can you confirm what version you are running that produced those dumps?
That line must be from a version older than 1.16.2, because I added some lines for namespaces at the top of Parser which pushes it to line 227. Is it possible that you are running a version earlier than 1.16.1? Prior to the OutputSettings lazy load being enabled? |
Ping @owengray-google, are you able to check that? |
Sorry about that. It turns out that our upstream dependency is jarjaring the old version of jsoup, so we're trying to upgrade to a newer version of it, but the version with a newer jsoup also includes an upgrade to the Kotlin K2 compiler that has a lot of breaking changes, so it's taking a bit. |
Hi @owengray-google - just checking in. Is this still occurring with current versions of jsoup for you? |
@owengray-google could you explain the status please. I don't want to drop it if there's still an underlying issue. Great if the updated version means that it no longer manifests. |
Yes, it does seem that 1.16.1 fixed this. Thanks for catching that, and sorry for the trouble. |
That's great, thanks for confirming. And no trouble. |
With b731fd7 I have further decoupled OutputSettings and Entities. |
We have seen some hung processes when using our (Google's) dackka, built on Jetbrains' dokka, using jsoup.
I believe the underlying issue we're running into is in
org.jsoup.nodes.Entities
, where:private static final HashMap<String, String> multipoints = new HashMap<>(); // name -> multiple character references
is
static
but is not threadsafe, and that this issue would be fixed by usingConcurrentHashMap
there.For reference, the relevant thread dump excerpts:
4556 org.jetbrains.dokka.MainKt /buildbot/dist_dirs/aosp-androidx-main-linux-androidx/11066813/dackkaArgs-docs-public.json -loggingLevel WARN -Dfile.encoding=UTF-8 -Duser.country=US -Duser.language=en -Duser.variant
The text was updated successfully, but these errors were encountered: