Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use-after-free triggered by fast actor reaping when the cycle detector is active #4616

Merged
merged 2 commits into from
Feb 18, 2025

Conversation

dipinhora
Copy link
Contributor

The logic in the actor run and the cycle detector work together to enable fast reaping of actors with rc == 0. This relies on atomics and protecting the relevant areas of logic with critical sections. This logic unfortunately suffered from a use-after-free bug due to a race between the cycle detector receiving the block message and destroying the actor and the actor cycle detector critical flag being release as identified in #4614 which could sometimes lead to memory corruption.

This commit changes things to remove the need to protect the logic with critical sections. It achieves this by ensuring that an actor with rc == 0 that the cycle detector knows about will never be rescheduled again even if the cycle detector happens to send it a message and the cycle detector is free to reap the actor when it receives the block message. The cycle detector ensures that the actor's message queue is empty or that the only messages pending are the expected ones from the cycle detector so it can safely destroy the actor.

resolves #4614

The logic in the actor run and the cycle detector work together to
enable fast reaping of actors with rc == 0. This relies on atomics
and protecting the relevant areas of logic with critical sections.
This logic unfortunately suffered from a use-after-free bug due to
a race between the cycle detector receiving the block message and
destroying the actor and the actor cycle detector critical flag
being release as identified in ponylang#4614 which could sometimes lead
to memory corruption.

This commit changes things to remove the need to protect the
logic with critical sections. It achieves this by ensuring that
an actor with rc == 0 that the cycle detector knows about will
never be rescheduled again even if the cycle detector happens to
send it a message and the cycle detector is free to reap the actor
when it receives the block message. The cycle detector ensures that
the actor's message queue is empty or that the only messages
pending are the expected ones from the cycle detector so it can
safely destroy the actor.

resolves ponylang#4614
@ponylang-main ponylang-main added the discuss during sync Should be discussed during an upcoming sync label Feb 17, 2025
@dipinhora
Copy link
Contributor Author

@redvers it would be ideal if you're able to confirm that this PR resolves your segfault from #4614..

@SeanTAllen it would be ideal if you're able to review the logic changes to confirm i didn't screw anything up as i believe you're the last person to touch this code and likely know the nuances/edge cases that need to be handled best..

@SeanTAllen SeanTAllen added the changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge label Feb 17, 2025
@ponylang-main
Copy link
Contributor

Hi @dipinhora,

The changelog - fixed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do.

Release notes are added by creating a uniquely named file in the .release-notes directory. We suggest you call the file 4616.md to match the number of this pull request.

The basic format of the release notes (using markdown) should be:

## Title

End user description of changes, why it's important,
problems it solves etc.

If a breaking change, make sure to include 1 or more
examples what code would look like prior to this change
and how to update it to work after this change.

Thanks.

@SeanTAllen
Copy link
Member

@redvers it would be ideal if you're able to confirm that this PR resolves your segfault from #4614..

@SeanTAllen it would be ideal if you're able to review the logic changes to confirm i didn't screw anything up as i believe you're the last person to touch this code and likely know the nuances/edge cases that need to be handled best..

@dipinhora I will review and I appreciate your believing that I am still familiar with this code that tortured me a few years ago. I especially appreciate at it, after I believe you found a bug in code I wrote (at least I remember it as "I wrote").

@SeanTAllen
Copy link
Member

This hurts my brain. I think it is good. But I'd love to have @redvers throw lots of requests at it.

Writing the code to be removed also hurt my brain. I remember Dipin and I spending A TON OF TIME on it. A metric shit ton of time.

@SeanTAllen SeanTAllen changed the title Make the cycle detector uncritical Fix use after free triggered by fast actor reaping when the cycle detector is active Feb 17, 2025
@SeanTAllen SeanTAllen changed the title Fix use after free triggered by fast actor reaping when the cycle detector is active Fix use-after-free triggered by fast actor reaping when the cycle detector is active Feb 17, 2025
@dipinhora
Copy link
Contributor Author

This hurts my brain. I think it is good. But I'd love to have @redvers throw lots of requests at it.

Writing the code to be removed also hurt my brain. I remember Dipin and I spending A TON OF TIME on it. A metric shit ton of time.

i guess it all rests on @redvers's shoulders.. 8*P

@redvers
Copy link
Contributor

redvers commented Feb 17, 2025

I definitely have the easiest job here. ;)

@redvers
Copy link
Contributor

redvers commented Feb 17, 2025

Yup, 15.5 Millions requests later, no SEGV.

@SeanTAllen SeanTAllen merged commit 85087bb into ponylang:main Feb 18, 2025
22 checks passed
@ponylang-main ponylang-main removed the discuss during sync Should be discussed during an upcoming sync label Feb 18, 2025
github-actions bot pushed a commit that referenced this pull request Feb 18, 2025
github-actions bot pushed a commit that referenced this pull request Feb 18, 2025
@SeanTAllen SeanTAllen mentioned this pull request Feb 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SEGV in cycle-generator under high load
4 participants