webrtc Flag not being transferred #5877
Comments
Thanks @DougAnderson444 for debugging this! |
This is a subtle bug, and I only really caught it when I wanted to use the WebRTC channel right away but noticed a 10-second delay before I could start. It becomes really obvious when we set the …

The issue

This sequence is the culprit:

webrtc-utils noise inbound:
webrtc-utils noise outbound:
Each
...at exactly the same time. See poll_close: rust-libp2p/misc/webrtc-utils/src/stream.rs, lines 222 to 250 in 9e0e8be
No wonder Flag::FIN never arrives at the stream: it's closed (or being closed)! This puts each WebRTC Noise handshake data channel in an undefined state, and the stream remains open until it times out 10 seconds later, at which point a regular data channel is opened to replace it and normal operations resume.

Background

Recall that there are 2 types of data channels:
We should keep the close process on the regular channels, but we can skip it on the handshake channel once the handshake has succeeded. Trying to send data on a channel where both ends are closing at the same time seems to break WebRTC. I see 2 options:

Option A

In order to send Flag::FIN on regular channels but not on handshake channels, we would have to develop a complex way of determining whether the stream is a WebRTC Noise data channel (there is no API for it in

Option B (Recommended)

We simply
// misc/webrtc-utils/src/noise.rs
// async fn outbound<T>()
// ..
// channel.close().await?; <= remove close
drop(channel); // <== skip straight to drop

If we skip the outbound

Recommendation

I think we should implement Option B and just drop the outbound noise channel instead of calling

Happy to hear your thoughts |
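To make the difference between the two paths concrete, here is a toy Rust model. It is not the rust-libp2p API: Frame and ToyChannel are hypothetical, and writes only reach a shared "wire" when flushed. A graceful close flushes and then emits FIN, while a plain drop emits nothing further and silently discards anything still buffered. That last point seems worth checking before adopting Option B: the final handshake message should already be flushed at the moment the channel is dropped.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical wire frame: payload bytes or a FIN marker (toy model only).
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    Data(Vec<u8>),
    Fin,
}

/// Hypothetical stand-in for a data channel. Writes are buffered locally and
/// only reach the shared `wire` when flushed.
pub struct ToyChannel {
    buffered: Vec<u8>,
    wire: Rc<RefCell<Vec<Frame>>>,
}

impl ToyChannel {
    pub fn new(wire: Rc<RefCell<Vec<Frame>>>) -> Self {
        ToyChannel { buffered: Vec::new(), wire }
    }

    pub fn write(&mut self, bytes: &[u8]) {
        self.buffered.extend_from_slice(bytes);
    }

    /// Push any buffered bytes onto the wire as a single Data frame.
    pub fn flush(&mut self) {
        if !self.buffered.is_empty() {
            let data = std::mem::take(&mut self.buffered);
            self.wire.borrow_mut().push(Frame::Data(data));
        }
    }

    /// Graceful close (the current behaviour): flush, then announce FIN.
    pub fn close(mut self) {
        self.flush();
        self.wire.borrow_mut().push(Frame::Fin);
    }
}
// Option B corresponds to `drop(channel)`: no FIN is sent, and anything
// still sitting in `buffered` is discarded along with the channel.
```

The shared Rc<RefCell<…>> wire exists only so a caller can observe what each path actually puts on the wire.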
I am not completely well versed in the WebRTC implementation in libp2p (and only know a bit about the specs), but based on the information provided, it sounds like Option B might be the better choice. Option A would likely be complex and could end up introducing something that isn't in the spec (unless this is something we should also address in the spec itself?). I'm just curious about any possible implications of dropping the channel without closing it. |
As far as I can tell, the only impact of not closing the channel is not sending … But since the remote is closing anyway once the handshake is done, this seems redundant. Plus the exchanged … When the outbound handshake channel is dropped directly, the next channel is opened and the stream goes on to exchange bytes. |
Couple of questions to make sure I understand the issue correctly. I am not super familiar with the WebRTC implementation.
Where does this 10s timeout come from? Looking through the code, I can only find a 10s timeout for creating the
What do you mean by that? Where do they try to send data on the channel (the one used for the Noise handshake) after it is closed?
I think it generally ensures that any previously written bytes are flushed, which is otherwise not guaranteed, no? |
Sure, welcome to the party! 🎉
I believe this is the
It's happening here:
Both sides of the noise handshake call
rust-libp2p/misc/webrtc-utils/src/stream.rs Lines 228 to 231 in 9e0e8be
I suspect what is happening is that
Yes, I think we understand the same thing, but I suspect it's these last bytes, sent and flushed, that actually prevent the other side's underlying data channel from closing. The data channel has these bytes in it, but the stream is closed... so it's neither closed nor usable. This stays stuck until the … I'm definitely open to other theories, but this is what appears to be happening. 🤷 It may be a … At the end of the day, the handshake is already over by the time the Noise handshake channel is being closed. I realize the "proper" thing to do would be to close the channel, but I'm not sure the design intended for 2 channels to close themselves and the opposite side at the same time. Maybe we need to finish reading all channel bytes before closing, too? |
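The theory above (both sides close at once, each stops reading, so the remote's last bytes and FIN sit unread in the data channel) can be sketched as a toy simulation. Endpoint, Frame and simultaneous_close are hypothetical names, and this is not how webrtc-utils is structured; it only illustrates why draining the channel until the remote FIN is seen, as suggested at the end of the comment, would let both sides observe the close.

```rust
use std::collections::VecDeque;

/// Hypothetical wire frame for this toy model (not a rust-libp2p type).
#[derive(Debug, Clone, PartialEq)]
pub enum Frame {
    Data(Vec<u8>),
    Fin,
}

/// One side of the handshake data channel.
#[derive(Debug, Default)]
pub struct Endpoint {
    /// Frames the remote has delivered that the local stream has not consumed.
    pub inbox: VecDeque<Frame>,
    /// Whether the local stream is still being polled for reads.
    pub reading: bool,
    /// Whether the local stream has observed the remote FIN.
    pub saw_remote_fin: bool,
}

impl Endpoint {
    fn new() -> Self {
        Endpoint { reading: true, ..Default::default() }
    }

    /// Consume delivered frames while the stream is still being read;
    /// stop reading once the remote FIN has been seen.
    fn poll_read(&mut self) {
        while self.reading {
            match self.inbox.pop_front() {
                Some(Frame::Fin) => {
                    self.saw_remote_fin = true;
                    self.reading = false;
                }
                Some(Frame::Data(_)) => {}
                None => break,
            }
        }
    }
}

/// Both sides close at the same moment: each has already delivered its last
/// handshake bytes plus FIN to the other. With `drain_first == false`, each
/// side stops reading as soon as it closes (the suspected behaviour); with
/// `drain_first == true`, each keeps reading until it sees the remote FIN.
pub fn simultaneous_close(drain_first: bool) -> (Endpoint, Endpoint) {
    let mut a = Endpoint::new();
    let mut b = Endpoint::new();
    for side in [&mut a, &mut b] {
        side.inbox.extend([Frame::Data(b"last handshake bytes".to_vec()), Frame::Fin]);
        if !drain_first {
            side.reading = false;
        }
        side.poll_read();
    }
    (a, b)
}
```

In the first case both inboxes still hold the unread bytes and FIN, which matches the "neither closed nor usable" state described above.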
Summary
When a webrtc stream is closed, a Flag::FIN is sent, but the remote never gets the flag. The same goes for Flag::RESET: it's never received by the remote peer. In fact, I don't see any Flags being received. This results in a webrtc-utils stream being closed on one end but left to time out on the other end. Specifically, this is a problem right after the handshake, where the Noise data channel is left to time out after 10 seconds; only then is a new data channel established so that streams can flow.
The impact is that WebRTC can only be used after this 10-second timeout, once a new data channel has been opened to replace the closed one.
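On the receiving side, the symptom looks like a wait that only ends on a timeout. The sketch below is a synchronous, std-only analogy (the real code is async); Flag, CloseOutcome and await_remote_close are hypothetical names. If the remote's flag is delivered, the stream can be torn down at once; if it never arrives, the only way out is the timeout, which corresponds to the 10-second wait described above.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical subset of the flags carried on a stream (not the rust-libp2p enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Flag {
    Fin,
    Reset,
}

#[derive(Debug, PartialEq)]
pub enum CloseOutcome {
    /// The remote's flag arrived; the stream can be torn down immediately.
    Remote(Flag),
    /// No flag arrived in time; the stream lingers until this point.
    TimedOut,
    /// The sending side went away without delivering a flag.
    Disconnected,
}

/// Wait for the remote's closing flag, giving up after `timeout`.
pub fn await_remote_close(flags: &Receiver<Flag>, timeout: Duration) -> CloseOutcome {
    match flags.recv_timeout(timeout) {
        Ok(flag) => CloseOutcome::Remote(flag),
        Err(RecvTimeoutError::Timeout) => CloseOutcome::TimedOut,
        Err(RecvTimeoutError::Disconnected) => CloseOutcome::Disconnected,
    }
}
```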
Expected behavior
It's expected that when the code at rust-libp2p/misc/webrtc-utils/src/stream.rs, lines 228 to 231 in 9e0e8be, is called, the remote receives the Flag.
Actual behavior
No inbound Flag (FIN, RESET) appears to be received by the remote. In fact, this code never seems to be called: rust-libp2p/misc/webrtc-utils/src/stream/state.rs, lines 59 to 81 in 9e0e8be
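For reference, here is roughly what handling an inbound flag should achieve, written as a hypothetical, simplified state machine that loosely follows the flag semantics in the libp2p WebRTC spec as we understand them (FIN: the remote finished writing; RESET: the remote aborted its write side; STOP_SENDING: the remote stopped reading). It is not the code in state.rs. After a local close the stream only reaches Closed once the remote FIN is actually processed, which is exactly the step that never seems to happen here.

```rust
/// Hypothetical subset of stream flags (names follow the spec's FIN,
/// STOP_SENDING and RESET; this is not the rust-libp2p enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Flag {
    Fin,
    StopSending,
    Reset,
}

/// Simplified per-stream state: which halves are still open.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Open,
    ReadClosed,
    WriteClosed,
    Closed,
}

impl State {
    /// Local side finished writing (what sending FIN implies).
    pub fn on_local_close(self) -> State {
        match self {
            State::Open => State::WriteClosed,
            State::ReadClosed => State::Closed,
            other => other,
        }
    }

    /// Apply a flag received from the remote.
    pub fn on_inbound_flag(self, flag: Flag) -> State {
        match (self, flag) {
            // Remote finished (FIN) or aborted (RESET) its write side:
            // nothing more will arrive, so our read half is done.
            (State::Open, Flag::Fin | Flag::Reset) => State::ReadClosed,
            (State::WriteClosed, Flag::Fin | Flag::Reset) => State::Closed,
            // Remote stopped reading: our write half is done.
            (State::Open, Flag::StopSending) => State::WriteClosed,
            (State::ReadClosed, Flag::StopSending) => State::Closed,
            // Every other combination leaves the state unchanged in this sketch.
            (state, _) => state,
        }
    }
}
```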
Relevant log output
Possible Solution
I can't tell why the FIN and RESET flags are not being received. The code looks like it should work, yet these Flags never arrive.
Version
0.54.1
Would you like to work on fixing this bug?
I'll try