Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix use-after-free triggered by fast actor reaping when the cycle detector is active #4616

Merged
merged 2 commits into from
Feb 18, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions .release-notes/4616.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
## Fix use-after-free triggered by fast actor reaping when the cycle detector is active

The logic in the actor run and the cycle detector work together to enable fast reaping of actors with rc == 0. This relies on atomics and protecting the relevant areas of logic with critical sections. This logic unfortunately suffered from a use-after-free bug due to a race between the cycle detector receiving the block message and destroying the actor and the actor cycle detector critical flag being release as identified in #4614 which could sometimes lead to memory corruption.

This commit changes things to remove the need to protect the logic with critical sections. It achieves this by ensuring that an actor with rc == 0 that the cycle detector knows about will never be rescheduled again even if the cycle detector happens to send it a message and the cycle detector is free to reap the actor when it receives the block message. The cycle detector ensures that the actor's message queue is empty or that the only messages pending are the expected ones from the cycle detector so it can safely destroy the actor.
93 changes: 24 additions & 69 deletions src/libponyrt/actor/actor.c
Original file line number Diff line number Diff line change
Expand Up @@ -707,71 +707,37 @@ bool ponyint_actor_run(pony_ctx_t* ctx, pony_actor_t* actor, bool polling)
return !empty;
}
} else {
// The cycle detector is running so we have to ensure that it doesn't
// send a message to the actor while we're in the process of telling
// it that it is safe to destroy the actor.
// Tell cycle detector that this actor is a zombie and will not get
// any more messages/work and can be reaped.
// Mark the actor as FLAG_BLOCKED_SENT and send a BLOCKED message
// to speed up reaping otherwise waiting for the cycle detector
// to get around to asking if we're blocked could result in
// unnecessary memory growth.
//
// Before we check if our queue is empty, we need to obtain the
// "critical delete" atomic for this actor. The cycle detector will
// bail from sending any messages if it can't obtain the atomic.
// Similarly, if the actor can't obtain the atomic here, then we do not
// attempt any "I can be destroyed" operations as the cycle detector is
// in the process of sending us a message.
if (ponyint_acquire_cycle_detector_critical(actor))
{
if(ponyint_messageq_isempty(&actor->q))
{
// At this point the actors queue is empty and the cycle detector
// will not send it any more messages because we "own" the barrier
// for sending cycle detector messages to this actor.
ponyint_actor_setpendingdestroy(actor);

// Tell cycle detector that this actor is a zombie and will not get
// any more messages/work and can be reaped.
// Mark the actor as FLAG_BLOCKED_SENT and send a BLOCKED message
// to speed up reaping otherwise waiting for the cycle detector
// to get around to asking if we're blocked could result in
// unnecessary memory growth.
//
// We're blocked, send block message telling the cycle detector
// to reap this actor (because its `rc == 0`).
// This is concurrency safe because, only the cycle detector might
// have a reference to this actor (rc is 0) so another actor can not
// send it an application message that results this actor becoming
// unblocked (which would create a race condition) and we've also
// ensured that the cycle detector will not send this actor any more
// messages (which would also create a race condition).
send_block(ctx, actor);

// mark the queue as empty or else destroy will hang
bool empty = ponyint_messageq_markempty(&actor->q);

// make sure the queue is actually empty as expected
pony_assert(empty);

// "give up" critical section ownership
ponyint_release_cycle_detector_critical(actor);

// make sure the scheduler will not reschedule this actor
return !empty;
} else {
// "give up" critical section ownership
ponyint_release_cycle_detector_critical(actor);
}
}
// We're blocked, send block message telling the cycle detector
// to reap this actor (because its `rc == 0`).
// This is concurrency safe because, only the cycle detector might
// have a reference to this actor (rc is 0) so another actor can not
// send it an application message that results this actor becoming
// unblocked (which would create a race condition).
send_block(ctx, actor);

// We have to ensure that this actor will not be rescheduled if the
// cycle detector happens to send it a message.
//
// intentionally don't mark the queue as empty because we don't want
// this actor to be rescheduled if the cycle detector sends it a message
//
// make sure the scheduler will not reschedule this actor
return false;
}
} else {
// gc is greater than 0
if (!actor_noblock && !has_internal_flag(actor, FLAG_CD_CONTACTED))
{
// The cycle detector is running and we've never contacted it ourselves,
// so let's it know we exist in case it is unaware.
if (ponyint_acquire_cycle_detector_critical(actor))
{
send_block(ctx, actor);
// "give up" critical section ownership
ponyint_release_cycle_detector_critical(actor);
}
// so let it know we exist in case it is unaware.
send_block(ctx, actor);
}
}
}
Expand Down Expand Up @@ -1323,14 +1289,3 @@ size_t ponyint_actor_total_alloc_size(pony_actor_t* actor)
+ POOL_ALLOC_SIZE(pony_msg_t);
}
#endif

bool ponyint_acquire_cycle_detector_critical(pony_actor_t* actor)
{
uint8_t expected = 0;
return atomic_compare_exchange_strong_explicit(&actor->cycle_detector_critical, &expected, 1, memory_order_acq_rel, memory_order_acquire);
}

void ponyint_release_cycle_detector_critical(pony_actor_t* actor)
{
atomic_store_explicit(&actor->cycle_detector_critical, 0, memory_order_release);
}
2 changes: 0 additions & 2 deletions src/libponyrt/actor/actor.h
Original file line number Diff line number Diff line change
Expand Up @@ -56,8 +56,6 @@ typedef struct pony_actor_t
messageq_t q;
// sync flags are access from multiple scheduler threads concurrently
PONY_ATOMIC(uint8_t) sync_flags;
// accessed from the cycle detector and the actor
PONY_ATOMIC(uint8_t) cycle_detector_critical;

// keep things accessed by other actors on a separate cache line
alignas(64) heap_t heap; // 52/104 bytes
Expand Down
59 changes: 23 additions & 36 deletions src/libponyrt/gc/cycle.c
Original file line number Diff line number Diff line change
Expand Up @@ -640,23 +640,7 @@ static void send_conf(pony_ctx_t* ctx, perceived_t* per)

while((view = ponyint_viewmap_next(&per->map, &i)) != NULL)
{
// The actor itself is also allowed to initiate a deletion of itself if
// it is blocked and has a reference count of 0. Because deletion is
// not an atomic operation, and the message queue remaining empty is an
// invariant that can't be violated if the cycle detector is sending
// a message to an actor, that actor isn't eligible to initiate getting
// reaped in a short-circuit fashion. Only the actor or the cycle
// detector can be in the "cycle detector critical" section for the actor.
if (ponyint_acquire_cycle_detector_critical(view->actor))
{
// To avoid a race condition, we have to make sure that the actor isn't
// pending destroy before sending a message. Failure to do so could
// eventually result in a double free.
if(!ponyint_actor_pendingdestroy(view->actor))
pony_sendi(ctx, view->actor, ACTORMSG_CONF, per->token);

ponyint_release_cycle_detector_critical(view->actor);
}
pony_sendi(ctx, view->actor, ACTORMSG_CONF, per->token);
}
}

Expand Down Expand Up @@ -808,23 +792,7 @@ static void check_blocked(pony_ctx_t* ctx, detector_t* d)
// if it is not already blocked
if(!view->blocked)
{
// The actor itself is also allowed to initiate a deletion of itself if
// it is blocked and has a reference count of 0. Because deletion is
// not an atomic operation, and the message queue remaining empty is an
// invariant that can't be violated if the cycle detector is sending
// a message to an actor, that actor isn't eligible to initiate getting
// reaped in a short-circuit fashion. Only the actor or the cycle
// detector can be in the "cycle detector critical" section for the actor.
if (ponyint_acquire_cycle_detector_critical(view->actor))
{
// To avoid a race condition, we have to make sure that the actor isn't
// pending destroy before sending a message. Failure to do so could
// eventually result in a double free.
if(!ponyint_actor_pendingdestroy(view->actor))
pony_send(ctx, view->actor, ACTORMSG_ISBLOCKED);

ponyint_release_cycle_detector_critical(view->actor);
}
pony_send(ctx, view->actor, ACTORMSG_ISBLOCKED);
}

// Stop if we've hit the max limit for # of actors to check
Expand Down Expand Up @@ -902,10 +870,29 @@ static void block(detector_t* d, pony_ctx_t* ctx, pony_actor_t* actor,
ponyint_deltamap_free(map);
}

// the actor should already be marked as pending destroy
pony_assert(ponyint_actor_pendingdestroy(actor));
// the actor should not already be marked as pending destroy
pony_assert(!ponyint_actor_pendingdestroy(actor));

pony_msg_t* msg;

// the actor's queue wasn't marked empty because we didn't want the actor
// to be rescheduled if the cycle detector sent it a message
// make sure the actor's queue is empty and the only pending messages are
// the expected ones from the cycle detector
while(!ponyint_messageq_markempty(&actor->q))
{
while((msg = ponyint_actor_messageq_pop(&actor->q
#ifdef USE_DYNAMIC_TRACE
, ctx->scheduler, actor
#endif
)) != NULL)
{
pony_assert((msg->id == ACTORMSG_CONF) || (msg->id == ACTORMSG_ISBLOCKED));
}
}

// invoke the actor's finalizer and destroy it
ponyint_actor_setpendingdestroy(actor);
ponyint_actor_final(ctx, actor);
ponyint_actor_sendrelease(ctx, actor);
ponyint_actor_destroy(actor);
Expand Down