Fix segfault caused by unsafe garbage collection optimization #4256
@@ -10,6 +10,11 @@

DEFINE_STACK(ponyint_gcstack, gcstack_t, void);

static bool might_reference_actor(pony_type_t* t)
{
  return t != NULL && t->might_reference_actor;
Sometimes type is NULL because we are looking at an opaque object (for example, pony_trace
sends NULL for type).
Sending everything through this rather than playing whack-a-mole with finding anything where type might be NULL seems like a much better idea.
We discussed this in great detail during our sync call. We decided to merge this without any optimizations in place yet.
I will work on getting this mergeable by next week.
5895042 to 0b180b1
So that we are less likely to time out on lower powered systems.
Prior to this commit, there was an unsafe optimization in the Pony runtime. The
optimization is detailed in the ORCA paper on the garbage collection protocol
and is usually safe, but sadly not always.
The optimization cuts down on the amount of tracing that is done when an object
is sent from one actor to another. It is based on the observation that for the
sake of reference counting, we don't need to count every object in a graph that
is sent from actor A to actor B so long as the root of the graph being sent is
immutable. This optimization provides a large performance boost over tracing
all objects sent from one actor to another. It will also, from time to time,
introduce a segfault that takes down the runtime.
Issue #1118 is the most
obvious instance of the bug caused by the optimization. The core of the problem
is that when an actor's reference count hits 0, it should be able to be reaped.
However, if a reference to an actor is sent to another actor inside an
immutable object, the actor will not be traced on send and might get reaped
while references to it exist. Once that happens, a segfault is guaranteed.
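The failure mode described above can be sketched with a deliberately simplified toy model. This is not the ORCA protocol or the Pony runtime's actual bookkeeping: `toy_actor_t`, `toy_object_t`, `toy_send`, and the single integer count are all hypothetical names and simplifications introduced only to show how a skipped trace can leave a reference uncounted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy actor: one plain reference count and a "reaped" flag. */
typedef struct toy_actor_t
{
  int rc;
  bool reaped;
} toy_actor_t;

/* Hypothetical immutable object that holds a reference to an actor. */
typedef struct toy_object_t
{
  toy_actor_t* actor_field;
} toy_object_t;

static void toy_acquire(toy_actor_t* a)
{
  a->rc++;
}

static void toy_release(toy_actor_t* a)
{
  assert(a->rc > 0);
  a->rc--;

  /* A count of 0 means the actor is eligible to be reaped. */
  if(a->rc == 0)
    a->reaped = true;
}

/* "Send" obj to another actor. With trace_fields == false we mimic the
 * optimization: the actor reachable through the object is never counted. */
static void toy_send(toy_object_t* obj, bool trace_fields)
{
  if(trace_fields && (obj->actor_field != NULL))
    toy_acquire(obj->actor_field);
}

/* Returns true if the receiver ends up holding a reference to an actor
 * that has already been reaped. */
static bool toy_dangling_after_send(bool trace_fields)
{
  toy_actor_t target = { 1, false }; /* the sender holds the only reference */
  toy_object_t msg = { &target };

  toy_send(&msg, trace_fields);
  toy_release(&target);              /* the sender drops its reference */

  /* The receiver still reaches the actor through msg.actor_field. */
  return target.reaped;
}
```

In this model, `toy_dangling_after_send(false)` reports a dangling reference, while `toy_dangling_after_send(true)` does not, because the send itself accounts for the actor.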
We have fixed the safety problem by tracing every object sent between actors.
In informal testing using a modified version of
message-ubench,
we saw a 1/3 drop in performance compared to running with the
unsafe optimization enabled. That 1/3 drop is probably the high end of the
performance hit; many Pony programs will see little to no performance change.
Additionally, we observed that overall memory usage for a given program
increased when this change was applied. It doesn't look like a memory leak;
rather, some emergent behavior from the change seems to result in more memory
being allocated overall. I don't believe it is from a bug but
would not swear that is the case.
Our plan moving forward is to start adding smarts to the compiler in an
incremental fashion so that we can turn the "problematic optimization" back on
when it isn't problematic. We will never recover "all lost performance" because
there are times when the optimization is unsafe and no amount of compiler
smarts will allow us to turn the optimization on in those situations. We can,
however, turn the optimization back on for more and more types over time.
In this commit, we lay the early groundwork for that change: we add a new
field to all Pony type descriptors that holds a boolean for whether a given
type might contain a reference to an actor. If it might, we have to trace. If
the compiler can prove that it doesn't, then sending an immutable version of
the class inter-actor won't require tracing.
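As a rough illustration of how such a flag could eventually gate tracing, here is a minimal sketch. It assumes a stripped-down descriptor (`sketch_type_t`) and a hypothetical decision helper (`sketch_must_trace_fields`); neither is the runtime's actual code, and per the discussion above this commit still traces everything. Only the NULL-safe check mirrors the `might_reference_actor` helper shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a Pony type descriptor.
 * The real pony_type_t carries many more fields. */
typedef struct sketch_type_t
{
  const char* name;
  bool might_reference_actor; /* set by the compiler */
} sketch_type_t;

/* Mirrors the helper in the diff: a NULL descriptor (opaque object)
 * is treated as not referencing an actor. */
static bool sketch_might_reference_actor(const sketch_type_t* t)
{
  return (t != NULL) && t->might_reference_actor;
}

/* Assumed decision rule for a future optimization, not current behavior:
 * mutable objects are always traced; immutable objects are traced only
 * if their type might contain a reference to an actor. */
static bool sketch_must_trace_fields(const sketch_type_t* t, bool immutable)
{
  if(!immutable)
    return true;

  return sketch_might_reference_actor(t);
}

/* Small usage check covering the cases described in the text. */
void sketch_self_check(void)
{
  sketch_type_t plain = { "PlainData", false };
  sketch_type_t holder = { "HoldsActor", true };

  assert(sketch_must_trace_fields(&holder, true));  /* might reference an actor */
  assert(!sketch_must_trace_fields(&plain, true));  /* proven actor-free */
  assert(sketch_must_trace_fields(&plain, false));  /* mutable: always trace */
  assert(!sketch_might_reference_actor(NULL));      /* opaque object */
}
```

Keeping the NULL check inside one helper follows the review comment's reasoning: every caller goes through the same function instead of each call site guarding against a NULL type on its own.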