
fix: Use 1MB socket RX buffers #2470

Merged (8 commits) Apr 2, 2025
Conversation

larseggert (Collaborator)

We need to do the same for neqo-glue.

Fixes #1962

codecov bot commented Mar 3, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.41%. Comparing base (483916d) to head (fea46ec).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2470      +/-   ##
==========================================
- Coverage   95.42%   95.41%   -0.02%     
==========================================
  Files         115      115              
  Lines       36996    36996              
  Branches    36996    36996              
==========================================
- Hits        35305    35301       -4     
- Misses       1687     1689       +2     
- Partials        4        6       +2     
Components Coverage Δ
neqo-common 97.17% <ø> (-0.36%) ⬇️
neqo-crypto 90.44% <ø> (ø)
neqo-http3 94.50% <ø> (ø)
neqo-qpack 96.29% <ø> (ø)
neqo-transport 96.24% <ø> (ø)
neqo-udp 95.29% <ø> (+0.58%) ⬆️


github-actions bot commented Mar 3, 2025

Failed Interop Tests

QUIC Interop Runner, client vs. server, differences relative to 4eed4f7.

neqo-latest as client

neqo-latest as server

All results

Succeeded Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server

Unsupported Interop Tests

QUIC Interop Runner, client vs. server

neqo-latest as client

neqo-latest as server


github-actions bot commented Mar 3, 2025

Benchmark results

Performance differences relative to 7d40924.

1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
       time:   [704.50 ms 708.15 ms 711.79 ms]
       thrpt:  [140.49 MiB/s 141.21 MiB/s 141.94 MiB/s]
change:
       time:   [-3.1952% -2.3455% -1.4922%] (p = 0.00 < 0.05)
       thrpt:  [+1.5148% +2.4019% +3.3007%]
1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
       time:   [347.08 ms 348.61 ms 350.17 ms]
       thrpt:  [28.558 Kelem/s 28.685 Kelem/s 28.812 Kelem/s]
change:
       time:   [-1.5978% -0.9139% -0.2819%] (p = 0.01 < 0.05)
       thrpt:  [+0.2827% +0.9223% +1.6238%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
       time:   [24.923 ms 25.082 ms 25.243 ms]
       thrpt:  [39.614  elem/s 39.870  elem/s 40.123  elem/s]
change:
       time:   [-1.5362% -0.6441% +0.2080%] (p = 0.17 > 0.05)
       thrpt:  [-0.2076% +0.6483% +1.5601%]
1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.
       time:   [1.7865 s 1.8073 s 1.8283 s]
       thrpt:  [54.695 MiB/s 55.330 MiB/s 55.975 MiB/s]
change:
       time:   [-8.4373% -6.9929% -5.5305%] (p = 0.00 < 0.05)
       thrpt:  [+5.8542% +7.5187% +9.2147%]

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

decode 4096 bytes, mask ff: No change in performance detected.
       time:   [12.003 µs 12.041 µs 12.088 µs]
       change: [-0.4803% +0.0742% +0.6016%] (p = 0.80 > 0.05)

Found 20 outliers among 100 measurements (20.00%)
1 (1.00%) low severe
4 (4.00%) low mild
2 (2.00%) high mild
13 (13.00%) high severe

decode 1048576 bytes, mask ff: No change in performance detected.
       time:   [2.9584 ms 2.9699 ms 2.9831 ms]
       change: [-0.5814% +0.0451% +0.6153%] (p = 0.90 > 0.05)

Found 14 outliers among 100 measurements (14.00%)
14 (14.00%) high severe

decode 4096 bytes, mask 7f: No change in performance detected.
       time:   [20.010 µs 20.075 µs 20.152 µs]
       change: [-0.6705% -0.1377% +0.3152%] (p = 0.60 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
11 (11.00%) high severe

decode 1048576 bytes, mask 7f: No change in performance detected.
       time:   [4.7927 ms 4.8042 ms 4.8173 ms]
       change: [-0.3041% +0.0455% +0.4013%] (p = 0.80 > 0.05)

Found 13 outliers among 100 measurements (13.00%)
13 (13.00%) high severe

decode 4096 bytes, mask 3f: No change in performance detected.
       time:   [6.3236 µs 6.3471 µs 6.3776 µs]
       change: [-1.3778% -0.1556% +0.7812%] (p = 0.81 > 0.05)

Found 17 outliers among 100 measurements (17.00%)
4 (4.00%) low severe
2 (2.00%) low mild
4 (4.00%) high mild
7 (7.00%) high severe

decode 1048576 bytes, mask 3f: No change in performance detected.
       time:   [2.1488 ms 2.1556 ms 2.1628 ms]
       change: [-0.4733% -0.0204% +0.4398%] (p = 0.89 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) high mild
7 (7.00%) high severe

1 streams of 1 bytes/multistream: Change within noise threshold.
       time:   [70.315 µs 70.529 µs 70.744 µs]
       change: [-4.2314% -2.2296% -0.7503%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

1000 streams of 1 bytes/multistream: No change in performance detected.
       time:   [25.305 ms 25.340 ms 25.376 ms]
       change: [-0.0838% +0.1249% +0.3177%] (p = 0.23 > 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

10000 streams of 1 bytes/multistream: No change in performance detected.
       time:   [1.6949 s 1.6964 s 1.6979 s]
       change: [-0.0881% +0.0342% +0.1581%] (p = 0.60 > 0.05)

Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe

1 streams of 1000 bytes/multistream: No change in performance detected.
       time:   [71.901 µs 72.548 µs 73.655 µs]
       change: [-2.3842% -0.4890% +1.6303%] (p = 0.68 > 0.05)

Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe

100 streams of 1000 bytes/multistream: Change within noise threshold.
       time:   [3.3560 ms 3.3628 ms 3.3700 ms]
       change: [+0.1566% +0.4503% +0.7181%] (p = 0.00 < 0.05)

Found 22 outliers among 100 measurements (22.00%)
22 (22.00%) high severe

1000 streams of 1000 bytes/multistream: No change in performance detected.
       time:   [143.07 ms 143.14 ms 143.21 ms]
       change: [-0.0044% +0.0709% +0.1437%] (p = 0.06 > 0.05)

Found 5 outliers among 100 measurements (5.00%)
5 (5.00%) high mild

coalesce_acked_from_zero 1+1 entries: No change in performance detected.
       time:   [94.989 ns 95.353 ns 95.716 ns]
       change: [-0.6889% +0.0080% +0.6229%] (p = 0.98 > 0.05)

Found 12 outliers among 100 measurements (12.00%)
10 (10.00%) high mild
2 (2.00%) high severe

coalesce_acked_from_zero 3+1 entries: No change in performance detected.
       time:   [112.77 ns 113.02 ns 113.29 ns]
       change: [-0.4282% -0.1160% +0.2350%] (p = 0.53 > 0.05)

Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low severe
2 (2.00%) low mild
6 (6.00%) high mild
7 (7.00%) high severe

coalesce_acked_from_zero 10+1 entries: No change in performance detected.
       time:   [112.42 ns 112.82 ns 113.34 ns]
       change: [-2.4664% -1.0381% -0.0463%] (p = 0.09 > 0.05)

Found 18 outliers among 100 measurements (18.00%)
5 (5.00%) low severe
4 (4.00%) low mild
3 (3.00%) high mild
6 (6.00%) high severe

coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
       time:   [94.095 ns 99.067 ns 109.59 ns]
       change: [-0.7234% +2.3056% +6.8372%] (p = 0.32 > 0.05)

Found 8 outliers among 100 measurements (8.00%)
3 (3.00%) high mild
5 (5.00%) high severe

RxStreamOrderer::inbound_frame(): Change within noise threshold.
       time:   [117.28 ms 117.34 ms 117.41 ms]
       change: [+0.7327% +0.8109% +0.8896%] (p = 0.00 < 0.05)

Found 17 outliers among 100 measurements (17.00%)
6 (6.00%) low mild
7 (7.00%) high mild
4 (4.00%) high severe

SentPackets::take_ranges: No change in performance detected.
       time:   [8.1784 µs 8.4295 µs 8.6634 µs]
       change: [-1.5008% +2.0679% +5.8287%] (p = 0.26 > 0.05)

Found 21 outliers among 100 measurements (21.00%)
2 (2.00%) low severe
17 (17.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

transfer/pacing-false/varying-seeds: Change within noise threshold.
       time:   [36.013 ms 36.084 ms 36.154 ms]
       change: [+1.2182% +1.4833% +1.7551%] (p = 0.00 < 0.05)

Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) low mild

transfer/pacing-true/varying-seeds: No change in performance detected.
       time:   [35.762 ms 35.822 ms 35.883 ms]
       change: [-0.1298% +0.0957% +0.3438%] (p = 0.44 > 0.05)
transfer/pacing-false/same-seed: Change within noise threshold.
       time:   [35.724 ms 35.784 ms 35.843 ms]
       change: [+1.0010% +1.2560% +1.4973%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild

transfer/pacing-true/same-seed: Change within noise threshold.
       time:   [36.054 ms 36.099 ms 36.145 ms]
       change: [+0.9224% +1.1088% +1.2971%] (p = 0.00 < 0.05)

Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild

Client/server transfer results

Performance differences relative to 7d40924.

Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.

Client Server CC Pacing Mean ± σ Min Max MiB/s ± σ Δ main (ms) Δ main (%)
neqo neqo reno on 409.6 ± 18.2 385.0 458.5 78.1 ± 1.8 -10.1 -2.4%
neqo neqo reno 463.5 ± 175.1 387.0 1147.4 69.0 ± 0.2 0.9 0.2%
neqo neqo cubic on 406.4 ± 28.7 384.3 542.9 78.7 ± 1.1 -6.4 -1.6%
neqo neqo cubic 407.7 ± 31.8 382.6 548.4 78.5 ± 1.0 -6.3 -1.5%
google neqo reno on 763.9 ± 86.0 567.6 892.1 41.9 ± 0.4 1.4 0.2%
google neqo reno 769.6 ± 93.6 567.4 994.2 41.6 ± 0.3 7.7 1.0%
google neqo cubic on 771.5 ± 95.1 583.4 1048.1 41.5 ± 0.3 9.7 1.3%
google neqo cubic 777.2 ± 83.9 572.2 976.8 41.2 ± 0.4 12.3 1.6%
google google 583.0 ± 51.1 550.3 757.7 54.9 ± 0.6 -2.0 -0.3%
neqo msquic reno on 277.8 ± 41.3 242.9 409.6 115.2 ± 0.8 -1.5 -0.5%
neqo msquic reno 267.2 ± 18.5 242.4 314.7 119.7 ± 1.7 -15.2 -5.4%
neqo msquic cubic on 269.0 ± 25.3 246.7 381.2 119.0 ± 1.3 1.7 0.6%
neqo msquic cubic 266.4 ± 17.2 245.7 314.9 120.1 ± 1.9 -8.1 -3.0%
msquic msquic 176.7 ± 27.7 147.6 287.5 181.1 ± 1.2 -10.3 -5.5%

⬇️ Download logs

mxinden (Collaborator) left a comment:

Instead of implementing this in neqo-bin, how about proposing it upstream quinn-udp:

https://github.com/quinn-rs/quinn/blob/8d6e48c20b71f7e915c13c33b66e2a09b6d59888/quinn-udp/src/unix.rs#L85-L86

I assume close to all QUIC implementations will want a large send and receive buffer.

larseggert (Collaborator, Author)

Instead of implementing this in neqo-bin, how about proposing it upstream quinn-udp:

Sounds good. Or maybe quinn-udp can offer functions to set those values.

I don't understand why the benches show a performance improvement, but the runs over loopback show a regression.

larseggert (Collaborator, Author)

@mxinden where would you suggest to add this in quinn-udp? As part of UdpSocketState or somewhere else?

larseggert (Collaborator, Author)

@mxinden alternatively, I'd suggest merging this to neqo now and switching over to any quinn-udp mechanism in a future PR.

mxinden (Collaborator) commented Mar 19, 2025

@larseggert how about something along the lines of:

diff --git a/quinn-udp/src/unix.rs b/quinn-udp/src/unix.rs
index c39941d5..0b5bbd21 100644
--- a/quinn-udp/src/unix.rs
+++ b/quinn-udp/src/unix.rs
@@ -257,6 +257,14 @@ impl UdpSocketState {
         self.may_fragment
     }
 
+    pub fn set_send_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
+    pub fn set_recv_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
     /// Returns true if we previously got an EINVAL error from `sendmsg` syscall.
     fn sendmsg_einval(&self) -> bool {
         self.sendmsg_einval.load(Ordering::Relaxed)
@@ -543,7 +551,7 @@ fn recv(io: SockRef<'_>, bufs: &mut [IoSliceMut<'_>], meta: &mut [RecvMeta]) ->
     Ok(1)
 }

larseggert (Collaborator, Author)

@larseggert how about something along the lines of

Do we also need the equivalent for Windows?

mxinden (Collaborator) commented Mar 19, 2025

Yes. I would say the more platforms the better.

larseggert (Collaborator, Author)

@mxinden let me know if quinn-rs/quinn#2179 is what you had in mind.

@larseggert larseggert marked this pull request as draft March 24, 2025 15:36
larseggert (Collaborator, Author) commented Mar 25, 2025

On my Mac, I now see:

0.003 DEBUG Increasing send buffer size from 9216 to 1048576
0.003 DEBUG Increasing receive buffer size from 786896 to 1048576

Note that the send buffer size is apparently based on the net.inet.udp.maxdgram sysctl? Weird. Need to check this on other platforms.

Linux (from a QNS run):

0.002 DEBUG Increasing send buffer size from 212992 to 1048576
0.002 DEBUG Default receive buffer size is 1048576, not changing

Windows (from CI):

0.000 WARN Increasing send buffer size from 131072 to 1048576
0.000 WARN Increasing receive buffer size from 131072 to 1048576
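
The increase-only-when-smaller behavior visible in these log lines can be captured as a small pure decision function. This is a sketch with illustrative names, not the PR's actual code:

```rust
// Sketch of the decision the logs above show: grow the buffer only if the
// OS default is smaller than the 1 MB target. Names are illustrative.
const ONE_MB: usize = 1 << 20; // 1048576, the target size in the logs

/// Returns Some(target) when the buffer should be grown, or None when the
/// OS default is already at least as large ("not changing" in the logs).
fn plan_increase(current: usize) -> Option<usize> {
    (current < ONE_MB).then_some(ONE_MB)
}

fn main() {
    assert_eq!(plan_increase(212_992), Some(1_048_576)); // Linux send default
    assert_eq!(plan_increase(1_048_576), None); // already 1 MB: leave alone
    println!("ok");
}
```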

https://lists.freebsd.org/pipermail/freebsd-questions/2005-February/075624.html says:

UDP is not buffered in the kernel like TCP is, so a sendspace sysctl
is not possible. When the interface queue becomes full (i.e. you are
sending data to the interface at a greater rate than it can put it on
the wire), you will see this error message, and your application needs
to be able to handle it.

This seems to indicate that we should not do this increase on macOS and BSDs?
@mxinden do we see ENOBUFS and are we handling that similar to EAGAIN?

@larseggert larseggert marked this pull request as ready for review March 25, 2025 12:32
larseggert (Collaborator, Author)

I also don't understand why loopback benchmark performance is going down.

mxinden (Collaborator) commented Mar 25, 2025

I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft.

https://stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html

mxinden (Collaborator) commented Mar 25, 2025

@mxinden do we see ENOBUFS and are we handling that similar to EAGAIN?

We currently don't. Something along the following lines should work. Want to try running a benchmark on your local Mac to see whether either of them triggers?

diff --git a/Cargo.lock b/Cargo.lock
index 64bfcc1f..d6b8b6a7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1028,6 +1028,7 @@ name = "neqo-udp"
 version = "0.12.2"
 dependencies = [
  "cfg_aliases",
+ "libc",
  "log",
  "neqo-common",
  "quinn-udp",
diff --git a/neqo-udp/Cargo.toml b/neqo-udp/Cargo.toml
index 4abd402a..8ebc2e83 100644
--- a/neqo-udp/Cargo.toml
+++ b/neqo-udp/Cargo.toml
@@ -19,6 +19,7 @@ workspace = true
 log = { workspace = true }
 neqo-common = { path = "./../neqo-common" }
 quinn-udp = { workspace = true }
+libc = "0.2"
 
 [build-dependencies]
 cfg_aliases = "0.2"
diff --git a/neqo-udp/src/lib.rs b/neqo-udp/src/lib.rs
index d498f5aa..0cfe0e74 100644
--- a/neqo-udp/src/lib.rs
+++ b/neqo-udp/src/lib.rs
@@ -74,7 +74,19 @@ pub fn send_inner(
         src_ip: None,
     };
 
-    state.try_send(socket, &transmit)?;
+    if let Err(e) = state.try_send(socket, &transmit) {
+        match e.raw_os_error() {
+            Some(libc::ENOBUFS) => {
+                todo!();
+            }
+            Some(libc::EAGAIN) => {
+                todo!();
+            }
+            Some(_) | None => {}
+        }
+
+        return Err(e);
+    }
 
     qtrace!(
         "sent {} bytes from {} to {}",

Related: Neither of them are currently exposed through std::io::ErrorKind. See rust-lang/rust#79965 for some discussion on adding these to std.
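
For illustration, the classification sketched in the diff above could look like the helper below. The ENOBUFS/EAGAIN values hard-coded here are Linux's, purely for demonstration; real code should use `libc::ENOBUFS` and `libc::EAGAIN`, since the numbers differ across platforms:

```rust
use std::io;

// Linux values, hard-coded for illustration only; use the libc crate's
// per-platform constants in real code.
const EAGAIN: i32 = 11;
const ENOBUFS: i32 = 105;

/// Hypothetical helper: classify a send error as transient, i.e. one where
/// the datagram can be dropped (QUIC tolerates loss) or the send retried.
/// Raw OS values are compared directly because these errors are not all
/// exposed through std::io::ErrorKind.
fn is_transient_send_error(e: &io::Error) -> bool {
    matches!(e.raw_os_error(), Some(EAGAIN) | Some(ENOBUFS))
}

fn main() {
    assert!(is_transient_send_error(&io::Error::from_raw_os_error(ENOBUFS)));
    assert!(is_transient_send_error(&io::Error::from_raw_os_error(EAGAIN)));
    assert!(!is_transient_send_error(&io::Error::from_raw_os_error(13))); // EACCES
    println!("ok");
}
```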

larseggert (Collaborator, Author)

I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft.

stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html

It's different, in that I doubt that we actually fill that larger buffer, and they do. (If we filled it, we should IMO also see a throughput increase.)

larseggert (Collaborator, Author)

OK, with the repaired bencher, the erratic performance is gone 🎉

I guess we'd want this one merged to see the full impact of #1868?

mxinden (Collaborator) commented Apr 2, 2025

On send buffer increase:

I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft.
stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html

It's different, in that I doubt that we actually fill that larger buffer, and they do. (If we filled it, we should IMO also see a throughput increase.)

Can you expand on why the above issue of source buffer bloat is not an issue here? In addition, if we never fill the larger buffer, why increase it in the first place?

Given that Neqo paces its sending, and that this pacing operates on the order of milliseconds, shouldn't pacing remove the need for a large OS UDP send buffer, assuming each pacing batch fits into it? Each pacing gap gives the OS enough time to empty the buffer.

Also on the matter of pacing, wouldn't filling a large send buffer undo our user space pacing effort?
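
To put rough numbers on the pacing argument (illustrative figures, not measurements from this PR): at 1 Gbit/s, a 1 ms pacing gap corresponds to about 125 KB per batch, far below a 1 MB send buffer:

```rust
// Back-of-envelope sizing for the pacing argument above. The rate and gap
// are illustrative assumptions, not measurements from this PR.
fn pacing_batch_bytes(rate_bits_per_sec: u64, gap_ms: u64) -> u64 {
    rate_bits_per_sec / 8 * gap_ms / 1000
}

fn main() {
    // At 1 Gbit/s with a 1 ms pacing gap, one batch is 125 000 bytes,
    // comfortably below a 1 MiB (1 048 576 byte) send buffer.
    let batch = pacing_batch_bytes(1_000_000_000, 1);
    assert_eq!(batch, 125_000);
    assert!(batch < (1u64 << 20));
    println!("ok");
}
```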

Now that we do PR_POLL_WRITE in Firefox, I am less worried about a small send buffer.

On receive buffer increase:

Increasing the receive buffer size in neqo-client and neqo-server sounds good to me.

Note that we already increase the receive buffer size in Firefox. Thus there are no changes required in neqo_glue. We could however now do the increase through quinn-udp instead of NSPR.

larseggert (Collaborator, Author)

If we want to experiment more, should we drop the increase to the send buffer for now?

(Also, I think SO_SNDBUF on BSD/Apple is not the same as on Linux, because on these platforms it actually seems to control net.inet.udp.maxdgram.)

mxinden (Collaborator) commented Apr 2, 2025

If we want to experiment more, should we drop the increase to the send buffer for now?

Sounds good to me.

@larseggert larseggert changed the title fix: Use 1MB socket TX and RX buffers fix: Use 1MB socket RX buffers Apr 2, 2025
larseggert and others added 2 commits April 2, 2025 16:17
Co-authored-by: Max Inden <[email protected]>
Signed-off-by: Lars Eggert <[email protected]>
@larseggert larseggert enabled auto-merge April 2, 2025 15:00
mxinden (Collaborator) commented Apr 2, 2025

@larseggert might need a call to cargo fmt?

@larseggert larseggert added this pull request to the merge queue Apr 2, 2025
Merged via the queue into mozilla:main with commit 9354a53 Apr 2, 2025
79 of 80 checks passed
@larseggert larseggert deleted the fix-1962 branch April 2, 2025 15:49
let recv_buf_before = state.recv_buffer_size((&socket).into())?;
if recv_buf_before < ONE_MB {
    // Same as Firefox.
    // <https://searchfox.org/mozilla-central/source/modules/libpref/init/StaticPrefList.yaml#13474-13478>
A Member left a review comment:

When citing searchfox, don't use the "latest" links, use the versioned links.

https://searchfox.org/mozilla-central/rev/fa5b44a4ea5c98b6a15f39638ea4cd04dc271f3d/modules/libpref/init/StaticPrefList.yaml#13474-13477 is what you are looking for here.

Successfully merging this pull request may close these issues.

Consider increasing OS socket buffer (SO_SNDBUF, SO_RCVBUF)