fix: Use 1MB socket RX buffers #2470
Conversation
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #2470      +/-   ##
==========================================
- Coverage   95.42%   95.41%   -0.02%
==========================================
  Files         115      115
  Lines       36996    36996
  Branches    36996    36996
==========================================
- Hits        35305    35301       -4
- Misses       1687     1689       +2
- Partials        4        6       +2
```
Failed Interop Tests
QUIC Interop Runner, client vs. server, differences relative to 4eed4f7.
- neqo-latest as client
- neqo-latest as server

Succeeded Interop Tests
QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server

Unsupported Interop Tests
QUIC Interop Runner, client vs. server.
- neqo-latest as client
- neqo-latest as server
Benchmark results
Performance differences relative to 7d40924.

- 1-conn/1-100mb-resp/mtu-1504 (aka. Download)/client: 💚 Performance has improved.
  time: [704.50 ms 708.15 ms 711.79 ms] thrpt: [140.49 MiB/s 141.21 MiB/s 141.94 MiB/s]
  change: time: [-3.1952% -2.3455% -1.4922%] (p = 0.00 < 0.05) thrpt: [+1.5148% +2.4019% +3.3007%]
- 1-conn/10_000-parallel-1b-resp/mtu-1504 (aka. RPS)/client: Change within noise threshold.
  time: [347.08 ms 348.61 ms 350.17 ms] thrpt: [28.558 Kelem/s 28.685 Kelem/s 28.812 Kelem/s]
  change: time: [-1.5978% -0.9139% -0.2819%] (p = 0.01 < 0.05) thrpt: [+0.2827% +0.9223% +1.6238%]
- 1-conn/1-1b-resp/mtu-1504 (aka. HPS)/client: No change in performance detected.
  time: [24.923 ms 25.082 ms 25.243 ms] thrpt: [39.614 elem/s 39.870 elem/s 40.123 elem/s]
  change: time: [-1.5362% -0.6441% +0.2080%] (p = 0.17 > 0.05) thrpt: [-0.2076% +0.6483% +1.5601%]
- 1-conn/1-100mb-req/mtu-1504 (aka. Upload)/client: 💚 Performance has improved.
  time: [1.7865 s 1.8073 s 1.8283 s] thrpt: [54.695 MiB/s 55.330 MiB/s 55.975 MiB/s]
  change: time: [-8.4373% -6.9929% -5.5305%] (p = 0.00 < 0.05) thrpt: [+5.8542% +7.5187% +9.2147%]
- decode 4096 bytes, mask ff: No change in performance detected.
  time: [12.003 µs 12.041 µs 12.088 µs] change: [-0.4803% +0.0742% +0.6016%] (p = 0.80 > 0.05)
- decode 1048576 bytes, mask ff: No change in performance detected.
  time: [2.9584 ms 2.9699 ms 2.9831 ms] change: [-0.5814% +0.0451% +0.6153%] (p = 0.90 > 0.05)
- decode 4096 bytes, mask 7f: No change in performance detected.
  time: [20.010 µs 20.075 µs 20.152 µs] change: [-0.6705% -0.1377% +0.3152%] (p = 0.60 > 0.05)
- decode 1048576 bytes, mask 7f: No change in performance detected.
  time: [4.7927 ms 4.8042 ms 4.8173 ms] change: [-0.3041% +0.0455% +0.4013%] (p = 0.80 > 0.05)
- decode 4096 bytes, mask 3f: No change in performance detected.
  time: [6.3236 µs 6.3471 µs 6.3776 µs] change: [-1.3778% -0.1556% +0.7812%] (p = 0.81 > 0.05)
- decode 1048576 bytes, mask 3f: No change in performance detected.
  time: [2.1488 ms 2.1556 ms 2.1628 ms] change: [-0.4733% -0.0204% +0.4398%] (p = 0.89 > 0.05)
- 1 streams of 1 bytes/multistream: Change within noise threshold.
  time: [70.315 µs 70.529 µs 70.744 µs] change: [-4.2314% -2.2296% -0.7503%] (p = 0.00 < 0.05)
- 1000 streams of 1 bytes/multistream: No change in performance detected.
  time: [25.305 ms 25.340 ms 25.376 ms] change: [-0.0838% +0.1249% +0.3177%] (p = 0.23 > 0.05)
- 10000 streams of 1 bytes/multistream: No change in performance detected.
  time: [1.6949 s 1.6964 s 1.6979 s] change: [-0.0881% +0.0342% +0.1581%] (p = 0.60 > 0.05)
- 1 streams of 1000 bytes/multistream: No change in performance detected.
  time: [71.901 µs 72.548 µs 73.655 µs] change: [-2.3842% -0.4890% +1.6303%] (p = 0.68 > 0.05)
- 100 streams of 1000 bytes/multistream: Change within noise threshold.
  time: [3.3560 ms 3.3628 ms 3.3700 ms] change: [+0.1566% +0.4503% +0.7181%] (p = 0.00 < 0.05)
- 1000 streams of 1000 bytes/multistream: No change in performance detected.
  time: [143.07 ms 143.14 ms 143.21 ms] change: [-0.0044% +0.0709% +0.1437%] (p = 0.06 > 0.05)
- coalesce_acked_from_zero 1+1 entries: No change in performance detected.
  time: [94.989 ns 95.353 ns 95.716 ns] change: [-0.6889% +0.0080% +0.6229%] (p = 0.98 > 0.05)
- coalesce_acked_from_zero 3+1 entries: No change in performance detected.
  time: [112.77 ns 113.02 ns 113.29 ns] change: [-0.4282% -0.1160% +0.2350%] (p = 0.53 > 0.05)
- coalesce_acked_from_zero 10+1 entries: No change in performance detected.
  time: [112.42 ns 112.82 ns 113.34 ns] change: [-2.4664% -1.0381% -0.0463%] (p = 0.09 > 0.05)
- coalesce_acked_from_zero 1000+1 entries: No change in performance detected.
  time: [94.095 ns 99.067 ns 109.59 ns] change: [-0.7234% +2.3056% +6.8372%] (p = 0.32 > 0.05)
- RxStreamOrderer::inbound_frame(): Change within noise threshold.
  time: [117.28 ms 117.34 ms 117.41 ms] change: [+0.7327% +0.8109% +0.8896%] (p = 0.00 < 0.05)
- SentPackets::take_ranges: No change in performance detected.
  time: [8.1784 µs 8.4295 µs 8.6634 µs] change: [-1.5008% +2.0679% +5.8287%] (p = 0.26 > 0.05)
- transfer/pacing-false/varying-seeds: Change within noise threshold.
  time: [36.013 ms 36.084 ms 36.154 ms] change: [+1.2182% +1.4833% +1.7551%] (p = 0.00 < 0.05)
- transfer/pacing-true/varying-seeds: No change in performance detected.
  time: [35.762 ms 35.822 ms 35.883 ms] change: [-0.1298% +0.0957% +0.3438%] (p = 0.44 > 0.05)
- transfer/pacing-false/same-seed: Change within noise threshold.
  time: [35.724 ms 35.784 ms 35.843 ms] change: [+1.0010% +1.2560% +1.4973%] (p = 0.00 < 0.05)
- transfer/pacing-true/same-seed: Change within noise threshold.
  time: [36.054 ms 36.099 ms 36.145 ms] change: [+0.9224% +1.1088% +1.2971%] (p = 0.00 < 0.05)

Client/server transfer results
Performance differences relative to 7d40924. Transfer of 33554432 bytes over loopback, 30 runs. All unit-less numbers are in milliseconds.
Instead of implementing this in neqo-bin, how about proposing it upstream in quinn-udp? I assume close to all QUIC implementations will want a large send and receive buffer.
Sounds good. Or maybe I don't understand why the benches show a performance improvement, but the runs over loopback show a regression.
@mxinden where would you suggest to add this in?
@mxinden alternatively, I'd suggest merging this to neqo now and switching over to any
@larseggert how about something along the lines of:

```diff
diff --git a/quinn-udp/src/unix.rs b/quinn-udp/src/unix.rs
index c39941d5..0b5bbd21 100644
--- a/quinn-udp/src/unix.rs
+++ b/quinn-udp/src/unix.rs
@@ -257,6 +257,14 @@ impl UdpSocketState {
         self.may_fragment
     }
 
+    pub fn set_send_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
+    pub fn set_recv_buffer_size(&self, bytes: usize) -> io::Result<()> {
+        todo!();
+    }
+
     /// Returns true if we previously got an EINVAL error from `sendmsg` syscall.
     fn sendmsg_einval(&self) -> bool {
         self.sendmsg_einval.load(Ordering::Relaxed)
@@ -543,7 +551,7 @@ fn recv(io: SockRef<'_>, bufs: &mut [IoSliceMut<'_>], meta: &mut [RecvMeta]) ->
     Ok(1)
 }
```
Do we also need the equivalent for Windows?
Yes. I would say the more platforms the better.
@mxinden let me know if quinn-rs/quinn#2179 is what you had in mind.
We need to do the same for `neqo-glue`. Fixes mozilla#1962
On my Mac, I now see:
Note that the send buffer size is apparently based on the Linux (from a QNS run):
Windows (from CI):
https://lists.freebsd.org/pipermail/freebsd-questions/2005-February/075624.html says:
This seems to indicate that we should not do this increase on macOS and BSDs?
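If that conclusion holds, one way to act on it would be a per-platform gate. A hypothetical sketch (the function name, the policy, and the exact platform list are assumptions, not neqo or quinn-udp API):

```rust
// Hypothetical gate: only request larger socket buffers on platforms where
// the increase is believed to help; leave BSD-derived stacks alone.
const ONE_MB: usize = 1 << 20;

/// Buffer size to request, or `None` to keep the OS default untouched.
fn desired_socket_buffer() -> Option<usize> {
    if cfg!(any(
        target_os = "macos",
        target_os = "freebsd",
        target_os = "netbsd",
        target_os = "openbsd"
    )) {
        // Per the FreeBSD list discussion above, large fixed buffers can be
        // counterproductive on macOS/BSDs, so skip the increase there.
        None
    } else {
        Some(ONE_MB)
    }
}

fn main() {
    match desired_socket_buffer() {
        Some(bytes) => println!("requesting {bytes}-byte socket buffers"),
        None => println!("keeping OS default socket buffers"),
    }
}
```

Using `cfg!(...)` rather than `#[cfg(...)]` keeps both branches compiling on every platform, which avoids bit-rot in the rarely-exercised arm.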
I also don't understand why loopback benchmark performance is going down.
I assume more buffer space is not always better. This reminds me of the Source Buffer Management draft: https://stuartcheshire.github.io/draft-cheshire-sbm/draft-cheshire-sbm.html
We currently don't. Something along the following lines should work. Want to try running a benchmark on your local Mac to see whether either of them triggers?

```diff
diff --git a/Cargo.lock b/Cargo.lock
index 64bfcc1f..d6b8b6a7 100644
--- a/Cargo.lock
+++ b/Cargo.lock
@@ -1028,6 +1028,7 @@ name = "neqo-udp"
 version = "0.12.2"
 dependencies = [
  "cfg_aliases",
+ "libc",
  "log",
  "neqo-common",
  "quinn-udp",
diff --git a/neqo-udp/Cargo.toml b/neqo-udp/Cargo.toml
index 4abd402a..8ebc2e83 100644
--- a/neqo-udp/Cargo.toml
+++ b/neqo-udp/Cargo.toml
@@ -19,6 +19,7 @@ workspace = true
 log = { workspace = true }
 neqo-common = { path = "./../neqo-common" }
 quinn-udp = { workspace = true }
+libc = "0.2"
 
 [build-dependencies]
 cfg_aliases = "0.2"
diff --git a/neqo-udp/src/lib.rs b/neqo-udp/src/lib.rs
index d498f5aa..0cfe0e74 100644
--- a/neqo-udp/src/lib.rs
+++ b/neqo-udp/src/lib.rs
@@ -74,7 +74,19 @@ pub fn send_inner(
         src_ip: None,
     };
 
-    state.try_send(socket, &transmit)?;
+    if let Err(e) = state.try_send(socket, &transmit) {
+        match e.raw_os_error() {
+            Some(libc::ENOBUFS) => {
+                todo!();
+            }
+            Some(libc::EAGAIN) => {
+                todo!();
+            }
+            Some(_) | None => {}
+        }
+
+        return Err(e);
+    }
 
     qtrace!(
         "sent {} bytes from {} to {}",
```

Related: Neither of them is currently exposed through
It's different, in that I doubt that we actually fill that larger buffer, and they do. (If we filled it, we should IMO also see a throughput increase.)
OK, with the fixed bencher the erratic performance seems to be fixed 🎉 I guess we'd want this one merged to see the full impact of #1868?
On send buffer increase:

Can you expand on why the above issue of source buffer bloat is not an issue here? In addition, if we never fill the larger buffer, why increase it in the first place?

Given that Neqo paces the sending of packets, and given that this pacing is on the order of milliseconds, assuming that each pacing batch fits into the send buffer, shouldn't that resolve the need for a large OS UDP send buffer size, as each pacing gap gives the OS enough time to empty the buffer?

Also on the matter of pacing, wouldn't filling a large send buffer undo our user-space pacing effort? Now that we do

On receive buffer increase:

Increasing the receive buffer size in

Note that we already increase the receive buffer size in Firefox. Thus there are no changes required in
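A back-of-envelope drain-time calculation (illustrative numbers, not from this PR) shows why a full 1 MB send buffer sits badly with millisecond-scale pacing:

```rust
// Back-of-envelope: how long does the kernel need to drain a full send
// buffer at a given line rate? Purely illustrative arithmetic.
fn drain_time_ms(buffer_bytes: f64, rate_bits_per_s: f64) -> f64 {
    buffer_bytes * 8.0 / rate_bits_per_s * 1000.0
}

fn main() {
    let one_mb = (1u32 << 20) as f64;
    // At 1 Gbit/s a full 1 MB buffer takes ~8.4 ms to drain -- several
    // pacing intervals' worth, so a full buffer would defeat user-space pacing.
    println!("{:.1} ms at 1 Gbit/s", drain_time_ms(one_mb, 1e9));
    // At 10 Gbit/s it is ~0.8 ms, i.e. roughly one pacing interval.
    println!("{:.1} ms at 10 Gbit/s", drain_time_ms(one_mb, 1e10));
}
```

In other words, the large buffer only hurts pacing if it actually fills; if pacing keeps it nearly empty, the extra headroom mainly absorbs bursts.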
If we want to experiment more, should we drop the increase to the send buffer for now? (Also, I think
Sounds good to me.
Co-authored-by: Max Inden <[email protected]> Signed-off-by: Lars Eggert <[email protected]>
@larseggert might need a call to
```rust
let recv_buf_before = state.recv_buffer_size((&socket).into())?;
if recv_buf_before < ONE_MB {
    // Same as Firefox.
    // <https://searchfox.org/mozilla-central/source/modules/libpref/init/StaticPrefList.yaml#13474-13478>
```
When citing searchfox, don't use the "latest" links, use the versioned links.
https://searchfox.org/mozilla-central/rev/fa5b44a4ea5c98b6a15f39638ea4cd04dc271f3d/modules/libpref/init/StaticPrefList.yaml#13474-13477 is what you are looking for here.
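The guard in the snippet under review amounts to an "only grow, never shrink" policy. As a stand-alone sketch of that policy (the helper name is an assumption, not the neqo-glue code; `current` would come from querying `SO_RCVBUF`):

```rust
const ONE_MB: usize = 1 << 20;

/// Size to request via setsockopt, or `None` if the OS default is already
/// at least as large. Hypothetical helper illustrating the review snippet.
fn recv_buffer_target(current: usize) -> Option<usize> {
    // Same 1 MB floor as Firefox (see the StaticPrefList.yaml link above);
    // never shrink a default that is already bigger.
    (current < ONE_MB).then_some(ONE_MB)
}

fn main() {
    assert_eq!(recv_buffer_target(212_992), Some(ONE_MB)); // typical Linux default
    assert_eq!(recv_buffer_target(4 * ONE_MB), None); // leave larger defaults alone
    println!("ok");
}
```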