Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segfault caused by unsafe garbage collection optimization #4256

Merged
merged 7 commits into from
Dec 13, 2022

Conversation

SeanTAllen
Copy link
Member

@SeanTAllen SeanTAllen commented Nov 23, 2022

Prior to this commit, there was an unsafe optimization in the Pony runtime. The
optimization is detailed in the ORCA paper on the garbage collection protocol
and is usually safe, but sadly not always.

The optimization cuts down on the amount of tracing that is done when an object
is sent from one actor to another. It is based on the observation that for the
sake of reference counting, we don't need to count every object in a graph that
is sent from actor A to actor B so long as the root of the graph being sent is
immutable. This optimization provides a large performance boost over tracing
all objects sent from one actor to another. It also will from time to time,
introduce a segfault that takes down the runtime.

Issue #1118 is the most
obvious instance of the bug caused by the optimization. The core of the problem
is that when an actor's reference count hits 0, it should be able to be reaped.
However, if a reference to an actor is sent to another actor inside an
immutable object, the actor will not be traced on send and might get reaped
while references to it exist. Once that happens, a segfault is guaranteed.

We have fixed the safety problem by tracing every object sent between actors.
In not very rigorous testing using a modified version of
message-ubench,
we saw a 1/3 drop in performance compared to running with the
safety problem/optimization enabled. It should be noted that the 1/3 drop in
performance is probably the high-end in terms of performance hit and many Pony
programs will see little to no performance change.

Additionally, it was observed that overall memory usage for a given program increased when this change was applied. It doesn't look like there's a memory leak, rather, that there is some emergent behavior from the change that results
in more memory being allocated overall. I don't believe it is from a bug but
would not swear that is the case.

Our plan moving forward is to start adding smarts to the compiler in an
incremental fashion so that we can turn the "problematic optimization" back on
when it isn't problematic. We will never recover "all lost performance" because
there are times when the optimization is unsafe and no amount of compiler
smarts will allow us to turn the optimization on in those situations. We can
however, over time, turn the optimization back on for more and more types.

In this commit, we have the early groundwork for change where we add a new
field to all pony type descriptors that holds a boolean for whether a given
type might contain a reference to an actor. If it might, we have to trace. If
the compiler can prove that it doesn't, then sending an immutable version of
the class inter-actor won't require tracing.

@SeanTAllen SeanTAllen requested a review from jemc November 23, 2022 15:46
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Nov 23, 2022
@SeanTAllen SeanTAllen changed the title Add additional field to pony_type_t 1118 fix Nov 23, 2022
@@ -10,6 +10,11 @@

DEFINE_STACK(ponyint_gcstack, gcstack_t, void);

static bool might_reference_actor(pony_type_t* t)
{
return t != NULL && t->might_reference_actor;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sometimes type is NULL because we are looking at an opaque object (for example pony_trace sends NULL for type).

Sending everything through this rather than playing whack-a-mole with finding anything where type might be NULL seems like a much better idea.

@SeanTAllen
Copy link
Member Author

We discussed this in great detail during our sync call. We decided to merge this without any optimizations in place yet.

@SeanTAllen
Copy link
Member Author

I will work on getting this mergeable by next week.

@SeanTAllen SeanTAllen removed the discuss during sync Should be discussed during an upcoming sync label Nov 29, 2022
@SeanTAllen SeanTAllen changed the title 1118 fix Fix segfault caused by unsafe garbage collection optimization Dec 7, 2022
@SeanTAllen SeanTAllen requested a review from a team December 7, 2022 21:47
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Dec 7, 2022
@SeanTAllen SeanTAllen marked this pull request as ready for review December 7, 2022 22:15
@SeanTAllen SeanTAllen added do not merge This PR should not be merged at this time changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge labels Dec 7, 2022
@SeanTAllen SeanTAllen removed do not merge This PR should not be merged at this time discuss during sync Should be discussed during an upcoming sync labels Dec 13, 2022
@SeanTAllen SeanTAllen merged commit 2ed3169 into main Dec 13, 2022
@SeanTAllen SeanTAllen deleted the issue-1118-2 branch December 13, 2022 21:51
github-actions bot pushed a commit that referenced this pull request Dec 13, 2022
github-actions bot pushed a commit that referenced this pull request Dec 13, 2022
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
@SeanTAllen SeanTAllen mentioned this pull request Jan 4, 2023
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
SeanTAllen added a commit that referenced this pull request Jan 4, 2023
When I implemented turning off the unsafe "don't trace immutable objects"
optimization in PR #4256, I incorrectly caused opaque objects to be
traced in a couple of scenarios.

Tracing opaque objects will result in a segfault like the one discovered
by Gordon.

Fixes #4284
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants