Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Linux epoll event resubscribe performance and race condition. #1564

Merged
merged 1 commit into from
Feb 8, 2017

Conversation

dipinhora
Copy link
Contributor

Prior to this change, if ONE_SHOT is enabled, linux epoll based
asio wouldn't be subscribed for certain events after another
event was fired on the same FD.

The specific scenario was the following:

  • Register event E for READ/WRITE with ONE_SHOT.
  • E fires for READ and a message is placed on the message queue
    for the TCPConnection actor.
  • TCPConnection actor eventually gets to processing this message
    (after processing any other messages ahread of this one on the
    queue) and then goes through the work of _event_notify. At
    the end of _event_notify, _resubscribe_event is called and
    epoll is told to resubscribe for READ and/or WRITE events as
    required.

Normally, this wouldn't be an issue but the behavior we're seen
is that with ONE_SHOT, if a READ (or WRITE) event is fired, all
events for the FD are disabled. This means that in the above
scenario, when the READ event is fired, the WRITE is also
disabled along with the READ because of ONE_SHOT.

We could (and did) try adding a call to _resubscribe_event in
_event_notify before any processing is done to resubscribe
earlier but there were two problems with this. First, it still
meant that the other event was unsubscribed for the duration of
the time the asio event notification message was sitting on the
message queue of the TCPConnection actor. Second, it resulted in
a race condition and incorrect use of epoll where the TCPConnection
actor would ask epoll to resubscribe to either READ or WRITE when
the event had already fired and was waiting on the message queue.
For the READ event case, this is incorrect use of epoll because
the manual for epoll strongly indicates that when using ONE_SHOT
and Edge Triggering all data should be read from the socket before
it is resubcribed again.

The final design we settled upon is to move the readable/writeable
flags into the Event structure, split the resubscribe for READ
and WRITE events, and have epoll automagically resubscribe to the
untriggered event if ONE_SHOT is enabled. This ensures there are
no race conditions because the decision to resubscribe is based on
the the state of the readable/writeable flags on the event which is
always up to date based on the latest events that fired. It also
ensures that we resubscribe to the untriggered event as soon as
possible because the epoll asio thread takes care of it.

  • The event now holds the readable/writeable flags.
  • This is only safe because the asio thread will only
    write to those flags if there's a subscription for an event.
  • In order to avoid two threads writing/reading to the new flags
    at the same time, the resubscribe for reads and writes has been
    split.
  • Additionally, the epoll code will now automagically resubscribe
    to the un-triggered event in order to minimize the delay for when
    an event is triggered for ONE_SHOT code so it behaves the same
    as for kqueue ONE_SHOT where if an FD is subscribed with ONE_SHOT
    for both read and write events and a read event fires and
    is disabled because of ONE_SHOT, the write event is still left
    enabled. This also greatly simplifies where/when resubscribes
    need to occur in the TCPConnection actor.
  • Update kqueue code to also split readable/writeable resubscribe.

@SeanTAllen
Copy link
Member

Side note. We've been running with this for 4-6 weeks now a Sendence (I think its been that long), without issue.

@pony_asio_event_get_writeable(ev)

fun set_writeable(ev: AsioEventID, writeable': Bool) =>
@pony_asio_event_set_writeable(ev, writeable')
Copy link
Member

@jemc jemc Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have these wrapper functions - the net package actors should be calling the FFI functions directly.

My objection is that these functions provide ways to read/write to a AsioEventID tag, which does not respect capabilities and memory safety. That is, just because you've used them safely in this PR (and helpfully commenting above each usage how/why it is safe 🏅), doesn't mean that users will be as careful. And they shouldn't have to be careful - part of the point of Pony is being able to write code that respects the compiler rules and assume that it is concurrency safe without thinking about the issue any deeper than that.

If we make the unsafe operations be FFI-only (by removing these wrappers), than a programmer can only do an unsafe operation by doing an FFI call, which is exactly what we want. The fact that it's an FFI call should set off alarm bells for the programmer, and make them think about carefully proving to themselves the safety of what they're doing, just as you've done here (since they now know the compiler won't be proving the safety for them).

Removing the wrapper functions and simply calling the FFI functions directly from TCPConnection et al is a simple way to avoid this concern while not fundamentally changing the nature of your solution. These actors are already using FFI extensively, so it shouldn't be a burden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'll update and force push to make the changes you've suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes made and pushed.

#endif

bool readable; /* is fd readable? */
bool writeable; /* is fd writeable? */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just asking for prudence's sake - does adding these struct fields put us into a higher size_t category that will make allocating events significantly more expensive? I only ask because we recently went through something similar with adding annotations to ast_t (see ongoing discussion in #1531).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe so. The size of asio_event_t is already over 32 bytes (but well below 64 bytes) before the new additions (assuming I did the math correctly).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking!

@jemc
Copy link
Member

jemc commented Feb 7, 2017

Also, just wanted to say thank you for the very thorough and detailed ticket description / commit message! ❤️

Prior to this change, if ONE_SHOT is enabled, linux epoll based
asio wouldn't be subscribed for certain events after another
event was fired on the same FD.

The specific scenario was the following:

* Register event E for READ/WRITE with ONE_SHOT.
* E fires for READ and a message is placed on the message queue
  for the TCPConnection actor.
* TCPConnection actor eventually gets to processing this message
  (after processing any other messages ahread of this one on the
  queue) and then goes through the work of `_event_notify`. At
  the end of `_event_notify`, `_resubscribe_event` is called and
  epoll is told to resubscribe for READ and/or WRITE events as
  required.

Normally, this wouldn't be an issue but the behavior we're seen
is that with ONE_SHOT, if a READ (or WRITE) event is fired, all
events for the FD are disabled. This means that in the above
scenario, when the READ event is fired, the WRITE is also
disabled along with the READ because of ONE_SHOT.

We could (and did) try adding a call to `_resubscribe_event` in
`_event_notify` before any processing is done to resubscribe
earlier but there were two problems with this. First, it still
meant that the other event was unsubscribed for the duration of
the time the asio event notification message was sitting on the
message queue of the TCPConnection actor. Second, it resulted in
a race condition and incorrect use of epoll where the TCPConnection
actor would ask epoll to resubscribe to either READ or WRITE when
the event had already fired and was waiting on the message queue.
For the READ event case, this is incorrect use of epoll because
the manual for epoll strongly indicates that when using ONE_SHOT
and Edge Triggering all data should be read from the socket before
it is resubcribed again.

The final design we settled upon is to move the readable/writeable
flags into the Event structure, split the resubscribe for READ
and WRITE events, and have epoll automagically resubscribe to the
untriggered event if ONE_SHOT is enabled. This ensures there are
no race conditions because the decision to resubscribe is based on
the the state of the readable/writeable flags on the event which is
always up to date based on the latest events that fired. It also
ensures that we resubscribe to the untriggered event as soon as
possible because the epoll asio thread takes care of it.

* The event now holds the readable/writeable flags.
* This is only safe because the asio thread will only
  write to those flags if there's a subscription for an event.
* In order to avoid two threads writing/reading to the new flags
  at the same time, the resubscribe for reads and writes has been
  split.
* Additionally, the epoll code will now automagically resubscribe
  to the un-triggered event in order to minimize the delay for when
  an event is triggered for ONE_SHOT code so it behaves the same
  as for kqueue ONE_SHOT where if an FD is subscribed with ONE_SHOT
  for both `read` and `write` events and a `read` event fires and
  is disabled because of ONE_SHOT, the `write` event is still left
  enabled. This also greatly simplifies where/when resubscribes
  need to occur in the TCPConnection actor.
* Update kqueue code to also split readable/writeable resubscribe.
Copy link
Member

@jemc jemc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, but I won't merge just yet since the PR hasn't been open very long, and someone else might want to take a look before merging?

@dipinhora
Copy link
Contributor Author

Makes sense. Thanks for your feedback on things and the suggested improvement.

@sylvanc
Copy link
Contributor

sylvanc commented Feb 8, 2017

That description was so good, by the time I read the code, I knew exactly what it did and what I was looking for. Looks good to me, thanks @dipinhora !

@sylvanc sylvanc self-requested a review February 8, 2017 10:51
@jemc jemc changed the title Linux epoll event resubscribe optimization Fix Linux epoll event resubscribe performance and race condition. Feb 8, 2017
@jemc jemc added the changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge label Feb 8, 2017
@jemc jemc merged commit e6a69e7 into ponylang:master Feb 8, 2017
ponylang-main added a commit that referenced this pull request Feb 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog - fixed Automatically add "Fixed" CHANGELOG entry on merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants