Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using VecDeque for pending events in various protocols causes task to not wake when a new event is added #5147

Closed
ShahakShama opened this issue Feb 5, 2024 · 11 comments

Comments

@ShahakShama
Copy link

Summary

In various protocols (for example, identify and request-response), we store the pending events that the behaviour should return in poll inside a VecDeque. When the queue is empty, we return Poll::Pending without doing anything with the given context in order to ensure the task is awaken when someone pushes an event to the queue.

Expected behavior

When we call a behaviour's poll, and there are no more pending events, the calling task should be awaken when someone adds an event to the queue (example for adding an event to the queue from identify)

Actual behavior

The calling task isn't awaken in such case, and this can potentially lead to a deadlock if no one else wakes the task (I don't know the inner implementation of Swarm to know if maybe something else always keeps the task awake, but anyway this should be fixed in case no one in the swarm awakes the task)

Relevant log output

No response

Possible Solution

Instead of using VecDeque, use a queue that has an async blocking pop function that awaits until there is an item in the queue, and the pop function will register the current task to be awaken when a new item is added. For example, deadqueue

Version

0.53.2

Would you like to work on fixing this bug ?

Yes

@drHuangMHT
Copy link
Contributor

Didn't see any problem with current implementation. My custom behaviour works just fine this way. If you call async function you still need to return Poll::Pending if the future can't be resolved soon. Any blocking code inside poll() will block the entire behaviour, preventing other behaviours from making progress.

@ShahakShama
Copy link
Author

The problem is that when you return Poll::Pending without doing anything with the context variable you are given, the current thread will go to sleep and will not awake unless a different future awakes it. You should keep the waker of the context somewhere and awake it when a new event is added to the queue

@drHuangMHT
Copy link
Contributor

the current thread will go to sleep and will not awake unless a different future awakes it.

That makes sense. There could be many futures to be awaited so the problem doesn't surface.

@ShahakShama
Copy link
Author

But still, this issue should be fixed in case it will surface in the future, right?

@thomaseizinger
Copy link
Contributor

libp2p::Swarm guarantees that the behaviour is polled again after any changes to it via the internal APIs (i.e. all functions of the NetworkBehaviour trait). This has been done to avoid excessive use of Wakers.

You only need to implement a waking mechanism if you modify the state of a behaviour via additional APIs.

@thomaseizinger
Copy link
Contributor

So technically, send_request would need a waker but it hasn't been a problem in practice as there is usually so much going on in a Swarm that this will be picked up very timely.

Open to PRs that add a waker though if it fixes a bug for you.

@dariusc93
Copy link
Member

If youre polling the behaviours directly and there isnt anything in there trigger it to be polled again by the executor, I can see this happening, however I dont believe that is really the case here since i dont believe one should be polling a behaviour outside of Swarm (eg polling it directly) as Swarm does alot internally that would cause it to poll the behaviours (and connection handlers)

@dariusc93
Copy link
Member

dariusc93 commented Feb 12, 2024

So technically, send_request would need a waker but it hasn't been a problem in practice as there is usually so much going on in a Swarm that this will be picked up very timely.

Open to PRs that add a waker though if it fixes a bug for you.

The same could probably be done for push (if it is a problem)

@ShahakShama
Copy link
Author

I see, thanks
I mainly opened the issue to know what to do in a custom behaviour I'm implementing that has the same issue, so I don't have a specific bug in request-response or identify that I want to open a PR (I didn't encounter any issue, I just saw it and thought to myself that this is a potential bug)
Should I close this issue?

@dariusc93
Copy link
Member

I see, thanks I mainly opened the issue to know what to do in a custom behaviour I'm implementing that has the same issue, so I don't have a specific bug in request-response or identify that I want to open a PR (I didn't encounter any issue, I just saw it and thought to myself that this is a potential bug) Should I close this issue?

Could you provide any code of your behaviour? We are more than welcome to explain from there what the issue would be or if there is a bug in libp2p that we could address it :). However, if you are calling a function to your behaviour that stores events meant for Swarm and there isnt any other activity going on in Swarm, then you might need to add a waker.

@ShahakShama
Copy link
Author

There's no need. I just wanted to know whether I should add a waker or not, and based on what you wrote I think I'll add a waker just to be on the safe side

Regarding whether to add a waker to request-response and identify, I leave that up to you

Thanks for everything 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants