HTTP/2: check window sizes, implement streaming #158
Conversation
Thank you @the-mikedavis! I've really wanted to get #88 tackled and the fact that we get request body streaming as a bonus is great! I'll need a few days, but I'll give this a proper review and also do some load testing so we can see how much better things are after this.
Hey @the-mikedavis! Sorry for the delay, I've had a very busy week, but I have been able to spend short periods of time here and there testing this out. I had a few huge tasks on my plate that have now been completed, so I should have a bit more free time going forward. So far, I have seen some noticeable improvements. For example, I put together this script:

destructure [finch_version_input, body_size_input], System.argv()

finch_dep =
  case finch_version_input do
    "pr" ->
      {:finch, github: "the-mikedavis/finch", ref: "window-sizes-and-streaming"}

    _ ->
      :finch
  end

body_size = if body_size_input, do: String.to_integer(body_size_input), else: 1_000
body = String.duplicate("x", body_size)

Mix.install([finch_dep])

Finch.start_link(name: MyFinch, pools: %{default: [protocol: :http2]})

req = Finch.build(:post, "https://nghttp2.org/httpbin/post", [], body)

Finch.request(req, MyFinch)
|> IO.inspect(label: :output)

It takes 2 args. If you pass "pr" as the first arg, this branch will be used; otherwise the hex version of Finch will be used. The second arg is the body size. We can clearly demonstrate the improvement of this PR by sending 1 byte more than the window size. Here it is with hex Finch:

$ check_window_size.exs main 1048577
output: {:error,
 %Mint.HTTPError{
   module: Mint.HTTP2,
   reason: {:exceeds_window_size, :request, 1048576}
 }}

And here is with this PR:

$ elixir check_window_size.exs pr 1048577
output: {:ok,
%Finch.Response{
body: "{\n \"args\": {}, \n \"data\": \"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" <> ...,
headers: [
{"date", "Mon, 08 Nov 2021 19:14:29 GMT"},
{"content-type", "application/json"},
{"content-length", "1048849"},
{"access-control-allow-origin", "*"},
{"access-control-allow-credentials", "true"},
{"x-backend-header-rtt", "3.783022"},
{"strict-transport-security", "max-age=31536000"},
{"server", "nghttpx"},
{"alt-svc", "h3=\":443\"; ma=3600, h3-29=\":443\"; ma=3600"},
{"via", "1.1 nghttpx"},
{"x-frame-options", "SAMEORIGIN"},
{"x-xss-protection", "1; mode=block"},
{"x-content-type-options", "nosniff"}
],
status: 200
}}

This is already a huge win! I have another test that is more of a benchmark-style test that sends many concurrent requests. I will clean it up and share it with you ASAP. In this test, it seems that Finch gets into some kind of endless loop with this PR, whereas the hex version simply has a high error rate (also not ideal 😬). It could be a problem with my setup or the test itself (I have used it exclusively with http1 in the past and had to add http2 support to test this), so I would like to try to figure out more about what is happening before I send you on a wild goose chase. I'll get back to you soon, but I did not want to leave you hanging for much longer, so here is my update. Thank you for your patience!
Awesome! No rush, I'm actually on vacation this week away from a computer, so it'll take me a bit to dig in for bugs and changes. I'm not in a big rush with this PR anyway, so no pressure! W.r.t. the infinite looping, I have some ideas about what might be causing behavior like that, but I'll await your stress test to be sure. Sounds like a fun bug to hunt down and squash :)
Hey @the-mikedavis, sorry for the delay! I had a huge task to finish up at my day job and then had family visiting from overseas for the past week. They are still here for a few more days, but I found some time to get this together. I just pushed https://github.com/sneako/httpc_bench/tree/h2 and added some instructions to the readme. You can toggle between Finch versions by commenting and uncommenting the relevant dependency. Please let me know if you have any questions or are struggling to get this running and/or reproducing the issue!
Ok, I've got it running locally and compared example outputs on main vs. this PR. This is pretty curious! I'll dig in a bit and see what I can find.
Great! Those results match what I was seeing as well.
Ok, so I think I spent most of the time so far looking at the wrong bug, but I may have found a (partial?) fix for #120 in elixir-mint/mint#339. The openresty h2 server has a kind of odd behavior where it sends a GOAWAY after every 100th request. IIRC the spec allows that, but it seems to be inflating the error rate for the benchmark tests because the connection will go connected -> read-only -> disconnected -> (wait) -> connected every 100 successful requests. As far as I can tell so far, that seems to be what's provoking the connections to get into a wonky state and appear to hang. Still digging in! 🕵️
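As a side note (an editorial sketch, not code from this PR or from Finch, and the module/function names below are made up), Mint exposes that read-only state directly, so a pool could check whether a connection that has received a GOAWAY is still usable for new requests:

# Hypothetical helper: after a GOAWAY, Mint keeps the connection open for
# reading (to drain in-flight responses) but closed for writing, so it
# should not be handed out for new requests.
defmodule ConnState do
  def writable?(conn), do: Mint.HTTP2.open?(conn, :write)
  def readable?(conn), do: Mint.HTTP2.open?(conn, :read)
end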
Great find @the-mikedavis! It looks like this behaviour is caused by this setting http://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_requests and "Prior to version 1.19.10, the default value was 100." I just tried increasing the value to 10,000 and the error rate looks much better, and neither version of Finch gets stuck anymore. Here are the results I get with this PR:
I think you are really on to something with elixir-mint/mint#339 |
Sweet, nice find! I'll keep poking around with the |
Ok, so I did a bit more stress testing, now against the config:

[
  timeout: 1000,
  path: "/wait/10",
  pool_sizes: [256, 192, 128, 96, 64, 32],
  concurrencies: [4096, 3064, 2048, 1024],
  output: :csv,
  port: 443,
  iterations: 500000,
  headers: [{"connection", "keep-alive"}],
  host: "openresty",
  clients: [HttpcBench.Client.Finch],
  pool_counts: [32, 16, 8, 4],
  pipelining: 1024,
  test_function: :post
]

where the notable difference is the

{:error, %Mint.HTTPError{module: Mint.HTTP2, reason: :too_many_concurrent_requests}}

error, which makes sense with some help from wireshark: it looks like the server is requesting a limit on concurrent streams that we're exceeding.

I don't really know anything about NimblePool or pooling in general (😅), but maybe this is something that can be solved by the pool setup? In any case, it seems to also be present when using finch on main, so I'd like to separate out that work from this PR if possible. Should I spin up an issue?

There's one more thing that I'd really like to test out before I feel happy about this PR: POSTing bodies double, +1, and -1 byte the size of the smallest window, just to make sure I have the edge cases correct. I'll see if I can knock that out right now! (edit: done! and I found an off-by-one 🎉)
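One hedged idea (not something this PR implements; the module and function names below are hypothetical): Mint exposes the server's advertised SETTINGS_MAX_CONCURRENT_STREAMS, so a pool could in principle cap the number of in-flight requests per connection instead of running into :too_many_concurrent_requests:

# Hypothetical sketch: only start another request on this connection while
# the number of in-flight requests is below the server's advertised limit.
defmodule StreamBudget do
  def can_start_request?(conn, in_flight) do
    max_streams = Mint.HTTP2.get_server_setting(conn, :max_concurrent_streams)
    in_flight < max_streams
  end
end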
Testing against the exact connection-level window size, I see on the last commit:

| bytes to send | number of DATA frames | sizes of each frame |
|----------------|-----------------------|---------------------|
| 65_536 | 2 | 65_536, 0 |
| 65_536 - 1 | 1 | 65_535 |
| 65_536 + 1 | 3 | 65_535, 2, 0 |
| 2 * 65_536 | 3 | 65_535, 65_537, 0 |
| 2 * 65_536 - 1 | 3 | 65_535, 65_536, 0 |
| 2 * 65_536 + 1 | 3 | 65_535, 65_538, 0 |

This looks like the formula for how many bytes to fit when overloaded is off-by-one. After applying the change in this commit, the numbers work out perfectly:

| bytes to send | number of DATA frames | sizes of each frame |
|----------------|-----------------------|---------------------|
| 65_536 | 2 | 65_536, 0 |
| 65_536 - 1 | 1 | 65_535 |
| 65_536 + 1 | 3 | 65_536, 1, 0 |
| 2 * 65_536 | 3 | 65_536, 65_536, 0 |
| 2 * 65_536 - 1 | 3 | 65_536, 65_535, 0 |
| 2 * 65_536 + 1 | 3 | 65_536, 65_537, 0 |
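To make the arithmetic in those tables concrete, here is a toy recursion (not the code in this PR; the names are invented) that splits a body across DATA frames of at most window bytes. The trailing 0-byte frame in the tables is the final empty frame and is omitted here:

# Toy sketch mirroring the corrected table: each DATA frame carries at most
# `window` bytes, and the remainder goes into the next frame.
defmodule FrameSplit do
  def frame_sizes(total, window) when total <= window, do: [total]
  def frame_sizes(total, window), do: [window | frame_sizes(total - window, window)]
end

FrameSplit.frame_sizes(65_536 + 1, 65_536)
#=> [65_536, 1]
# An off-by-one that only fills window - 1 bytes into the first frame would
# instead produce [65_535, 2], matching the first table.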
Amazing! I really appreciate your work here @the-mikedavis. I agree that this deserves its own issue.

RE: pooling, NimblePool is only used for HTTP/1 connections. For HTTP/2, we are treating each individual connection as a 'pool' and not using NimblePool at all. It sounds like we need to store the server's concurrent stream limit somewhere.

I'll do a final round of testing on this PR ASAP so that we can get this great work merged and released!
Thank you @the-mikedavis! This is a huge improvement to our H2 support ❤️
Released in |
I've added a failing test in #165 |
Ah, good catch. I see the problem; I'll push up a PR to fix it.
closes #88
HTTP/2 streaming is quite a different beast from HTTP/1. The main catch is flow control, which is really a backpressure mechanism in which the server limits the number of bytes you can send at both the per-connection and per-request-stream level. Once a window has been consumed, the server replenishes it with a WINDOW_UPDATE frame, allowing the client to send more data.
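As a hedged illustration of that mechanism using Mint's public API (this is not code from the PR, and the module/function names are invented), the most a client may send at any moment is bounded by both windows:

# Sketch: the next chunk can be at most the smaller of the connection-level
# and stream-level windows that Mint tracks for us (clamped at zero).
defmodule SendBudget do
  def max_sendable(conn, request_ref, body) when is_binary(body) do
    window =
      min(
        Mint.HTTP2.get_window_size(conn, :connection),
        Mint.HTTP2.get_window_size(conn, {:request, request_ref})
      )

    body |> byte_size() |> min(max(window, 0))
  end
end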
The approach we take here is to follow the Mint documentation's recommendation of switching from doing Mint.HTTP2.request(conn, method, path, headers, body) to Mint.HTTP2.request(conn, method, path, headers, :stream) for all non-nil bodies. Then we use Enumerable.continuation/0 to gradually unroll the input stream and send chunks with Mint.HTTP2.stream_request_body/3, suspending when the window needs to be replenished. When we receive a message from the server (with Mint.HTTP2.stream/2), it's possible that the window has been replenished for any active requests, so we continue the Enumerable.reduce/3, repeating until the stream is complete. Finally, we send Mint.HTTP2.stream_request_body(conn, ref, :eof).

There are really two things being solved here: checking window sizes and implementing {:stream, stream} bodies for HTTP/2. They're very similar implementation-wise, so I thought I'd try to knock out two birds with one stone in this PR.

Any feedback is welcome (especially on the control flow parts: if/with/case/etc.) 🙂
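For readers less familiar with the Enumerable protocol, here is a heavily simplified sketch of the suspend/resume pattern described above. It is not Finch's actual implementation: it assumes the stream yields binaries, suspends after every chunk rather than only when a window is exhausted, and skips window bookkeeping and error handling; the module and function names are invented.

# Simplified sketch: drive the body stream with manual backpressure by
# suspending after every chunk, sending it, then resuming the continuation.
defmodule StreamedBody do
  def send_all(conn, ref, body_stream) do
    reducer = fn chunk, _acc -> {:suspend, chunk} end
    step(conn, ref, Enumerable.reduce(body_stream, {:cont, nil}, reducer))
  end

  # A chunk was produced: send it, then resume the suspended enumeration.
  # (The real code would first check the window sizes and, if the chunk does
  # not fit, park the continuation until Mint.HTTP2.stream/2 reports that the
  # server has replenished the window.)
  defp step(conn, ref, {:suspended, chunk, continuation}) do
    {:ok, conn} = Mint.HTTP2.stream_request_body(conn, ref, chunk)
    step(conn, ref, continuation.({:cont, nil}))
  end

  # The stream is exhausted: signal the end of the request body.
  defp step(conn, ref, {:done, _acc}) do
    Mint.HTTP2.stream_request_body(conn, ref, :eof)
  end
end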