Fix segfault caused by unsafe garbage collection optimization #4256

SeanTAllen · 2022-11-23T15:46:19Z

Prior to this commit, there was an unsafe optimization in the Pony runtime. The
optimization is detailed in the ORCA paper on the garbage collection protocol
and is usually safe, but sadly not always.

The optimization cuts down on the amount of tracing that is done when an object
is sent from one actor to another. It is based on the observation that for the
sake of reference counting, we don't need to count every object in a graph that
is sent from actor A to actor B so long as the root of the graph being sent is
immutable. This optimization provides a large performance boost over tracing
all objects sent from one actor to another. It also, from time to time,
introduces a segfault that takes down the runtime.

Issue #1118 is the most obvious instance of the bug caused by the optimization.
The core of the problem is that when an actor's reference count hits 0, the
runtime should be able to reap it.
However, if a reference to an actor is sent to another actor inside an
immutable object, the actor will not be traced on send and might get reaped
while references to it exist. Once that happens, a segfault is guaranteed.

We have fixed the safety problem by tracing every object sent between actors.
In informal, not very rigorous testing using a modified version of
message-ubench, we saw a 1/3 drop in performance compared to running with the
unsafe optimization enabled. That 1/3 drop is probably the high end of the
performance hit; many Pony programs will see little to no performance change.

Additionally, overall memory usage for a given program increased when this
change was applied. It doesn't look like a memory leak; rather, some emergent
behavior from the change appears to result in more memory being allocated
overall. I don't believe it is from a bug but would not swear that is the case.

Our plan moving forward is to start adding smarts to the compiler in an
incremental fashion so that we can turn the "problematic optimization" back on
when it isn't problematic. We will never recover "all lost performance" because
there are times when the optimization is unsafe and no amount of compiler
smarts will allow us to turn the optimization on in those situations. We can,
however, turn the optimization back on for more and more types over time.

In this commit, we lay the early groundwork for that change: we add a new
field to all Pony type descriptors that holds a boolean for whether a given
type might contain a reference to an actor. If it might, we have to trace. If
the compiler can prove that it doesn't, then sending an immutable instance of
the class between actors won't require tracing.

SeanTAllen · 2022-11-23T20:13:58Z

src/libponyrt/gc/gc.c

@@ -10,6 +10,11 @@

 DEFINE_STACK(ponyint_gcstack, gcstack_t, void);

+static bool might_reference_actor(pony_type_t* t)
+{
+  return t != NULL && t->might_reference_actor;


Sometimes type is NULL because we are looking at an opaque object (for example, pony_trace sends NULL for type).

Routing every check through this helper, rather than playing whack-a-mole with every place where type might be NULL, seems like a much better idea.

SeanTAllen · 2022-11-29T19:49:37Z

We discussed this in great detail during our sync call. We decided to merge this without any optimizations in place yet.

SeanTAllen · 2022-11-29T19:49:50Z

I will work on getting this mergeable by next week.


When I implemented turning off the unsafe "don't trace immutable objects" optimization in PR #4256, I incorrectly caused opaque objects to be traced in a couple of scenarios. Tracing opaque objects will result in a segfault like the one discovered by Gordon. Fixes #4284

SeanTAllen added 4 commits November 22, 2022 09:38

Add additional field to pony_type_t

b2ff4f3

Appears to be working

e212b36

Fix "heap used" push for immutable

74c891d

Add regression test

3ee2302

SeanTAllen requested a review from jemc November 23, 2022 15:46

ponylang-main added the "discuss during sync" label Nov 23, 2022

SeanTAllen changed the title ~~Add additional field to pony_type_t~~ 1118 fix Nov 23, 2022


SeanTAllen removed the "discuss during sync" label Nov 29, 2022

SeanTAllen mentioned this pull request Nov 29, 2022

Turn off "dont trace immutable" optimization #4247

Closed

SeanTAllen changed the title ~~1118 fix~~ Fix segfault caused by unsafe garbage collection optimization Dec 7, 2022

SeanTAllen requested a review from a team December 7, 2022 21:47

ponylang-main added the "discuss during sync" label Dec 7, 2022

Add release notes

0b180b1

SeanTAllen force-pushed the issue-1118-2 branch from 5895042 to 0b180b1 Compare December 7, 2022 22:11

SeanTAllen marked this pull request as ready for review December 7, 2022 22:15

SeanTAllen added the "do not merge" and "changelog - fixed" labels Dec 7, 2022

SeanTAllen added 2 commits December 7, 2022 22:20

Run regression test with the cycle detector

9ea856c

Lower number of iterations for regression test

693e6c2

So that we are less likely to time out on lower powered systems.

jemc approved these changes Dec 13, 2022


SeanTAllen removed the "do not merge" and "discuss during sync" labels Dec 13, 2022

SeanTAllen merged commit 2ed3169 into main Dec 13, 2022

SeanTAllen deleted the issue-1118-2 branch December 13, 2022 21:51

github-actions bot pushed a commit that referenced this pull request Dec 13, 2022

Updates release notes for PR #4256

f746015

github-actions bot pushed a commit that referenced this pull request Dec 13, 2022

Update CHANGELOG for PR #4256

5f4b755

SeanTAllen mentioned this pull request Dec 13, 2022

Segmentation fault when actor receives a reference to itself via a class created in a different actor. #1118

Closed

SeanTAllen mentioned this pull request Jan 4, 2023

Fix runtime segfault #4294

Merged
