
burst deadline exceeded error in v1.38.0? #4522

Closed
ukai opened this issue Jun 8, 2021 · 7 comments

Comments

@ukai

ukai commented Jun 8, 2021

What version of gRPC are you using?

v1.38.0

What version of Go are you using (go version)?

go version go1.16.4 linux/amd64

What operating system (Linux, Windows, …) and version?

Linux

What did you do?

If possible, provide a recipe for reproducing the error.

I recently rolled gRPC forward from v1.37.1 to v1.38.0 in canary.

What did you expect to see?

no significant changes between v1.37.1 and v1.38.0

What did you see instead?

We observed bursty deadline exceeded errors in canary (for ~30 minutes).
The error rate in production is usually < 0.0004, but the error rate in canary at that time was ~0.004 (10x prod).
Most of the errors appear to be deadline exceeded errors.

We also see errors like:

"[transport]transport: loopyWriter.run returning. Err: write tcp 10.49.154.2:5050->10.49.120.4:36094: use of closed network connection"
@zasweq
Contributor

zasweq commented Jun 9, 2021

The team needs a bit more information. Please provide details about your environment, and please try v1.37.1 in canary again to see whether the delta is as pronounced.

@ukai
Author

ukai commented Jun 10, 2021

We're using a gRPC client in a GKE pod to communicate with remotebuildexecution.googleapis.com.

We didn't see such deadline exceeded error spikes with v1.37.1.
(v1.37.1 was used in prod, and we didn't see any difference when we reverted canary back to v1.37.1.)
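
A minimal sketch of the kind of client setup we use (the dial options shown are illustrative assumptions, not our exact configuration; per-RPC OAuth credentials are omitted):

```go
package main

import (
	"crypto/tls"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

func main() {
	// TLS using the system root CAs; real calls to this API would also need
	// per-RPC OAuth credentials, which are omitted in this sketch.
	creds := credentials.NewTLS(&tls.Config{})

	conn, err := grpc.Dial(
		"remotebuildexecution.googleapis.com:443",
		grpc.WithTransportCredentials(creds),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	// Service stubs are created on conn and called with per-RPC deadlines,
	// which is where the deadline exceeded errors above are reported.
}
```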

@github-actions

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Jun 16, 2021
@dfawley
Member

dfawley commented Jun 16, 2021

(Sorry, we didn't remove the label.)

There's very little to go on here; we will definitely need more information to make any progress.

You say this only happened for 30 minutes -- is this kind of burstiness periodically coming & going? Any other information you can provide about your environment (proxies or no proxies, etc), configuration of client & server, etc, would help. And if it's possible to distill it down to a minimal reproduction example you can share, that would be the best.
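
One way to capture more of that information is to turn up gRPC-Go's internal logging on the client. A sketch (the verbosity level here is an arbitrary choice):

```go
package main

import (
	"os"

	"google.golang.org/grpc/grpclog"
)

func init() {
	// Route gRPC-Go's internal info/warning/error logs to stderr with extra
	// verbosity, so transport events like the loopyWriter message above come
	// with more context. The same can be enabled via the
	// GRPC_GO_LOG_SEVERITY_LEVEL and GRPC_GO_LOG_VERBOSITY_LEVEL env vars.
	grpclog.SetLoggerV2(grpclog.NewLoggerV2WithVerbosity(os.Stderr, os.Stderr, os.Stderr, 2))
}

func main() {
	// The rest of the client runs as usual; only the logger setup changes.
}
```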

@dfawley
Member

dfawley commented Jun 23, 2021

The bot is supposed to send a friendly ping 7 days before closing. Reopening for another 7d.

@dfawley dfawley reopened this Jun 23, 2021
@dfawley dfawley removed the stale label Jun 23, 2021
@github-actions

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.

@github-actions github-actions bot added the stale label Jun 29, 2021
@ukai
Author

ukai commented Jun 30, 2021

I tried v1.38.0 in canary again, but didn't see similar symptoms for 2 days.
Instead, some high latency on the client side was observed in prod (grpc v1.37.1), so it might not be a regression in v1.38.0
(but perhaps a pre-existing issue from before v1.38.0?).

@ukai ukai closed this as completed Jun 30, 2021
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 28, 2021