
Pipe server goes nuts sometimes #3684

Closed
bboreham opened this issue Sep 16, 2019 · 2 comments · Fixed by #3784
Labels
performance Excessive resource usage and latency; usually a bug or chore

Comments

@bboreham
Collaborator

This is observed in multi-tenant mode; all replicas are spinning with high CPU consumption in multitenant.(*consulPipeRouter).watchAll(). Consul is also showing high CPU.

Logs at the start of the high-cpu period:

<app> ERRO: 2019/09/16 11:03:52.726035 Error CASing pipes/15107-pipe-7240779112489848809: Error: Pipe pipes/15107-pipe-7240779112489848809 has existing connection to 10.244.202.251:4444
<app> ERRO: 2019/09/16 11:03:52.727727 Error CASing pipes/15107-pipe-7240779112489848809: Error: Pipe pipes/15107-pipe-7240779112489848809 has existing connection to 10.244.202.251:4444
<app> ERRO: 2019/09/16 11:03:52.730799 Error CASing pipes/15107-pipe-7240779112489848809: Error: Pipe pipes/15107-pipe-7240779112489848809 has existing connection to 10.244.202.251:4444
<app> ERRO: 2019/09/16 11:03:52.732553 Error CASing pipes/15107-pipe-7240779112489848809: Error: Pipe pipes/15107-pipe-7240779112489848809 has existing connection to 10.244.202.251:4444
<app> ERRO: 2019/09/16 11:03:52.734154 Error CASing pipes/15107-pipe-7240779112489848809: Error: Pipe pipes/15107-pipe-7240779112489848809 has existing connection to 10.244.202.251:4444
<app> INFO: 2019/09/16 11:03:54.047200 Get pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:54.683783 Creating pipe pipes/15428-pipe-8723926611138769060
<app> INFO: 2019/09/16 11:03:54.768161 Release pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:55.193087 Deleting pipe pipes/15428-pipe-8723926611138769060
<app> INFO: 2019/09/16 11:03:56.174861 Get pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:57.005606 Creating pipe pipes/15428-pipe-8723926611138769060
<app> INFO: 2019/09/16 11:03:57.080772 Release pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:57.314012 Get pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:57.413836 Release pipes/15428-pipe-8723926611138769060:probe
<app> INFO: 2019/09/16 11:03:57.631618 Deleting pipe pipes/15428-pipe-8723926611138769060

The creating/deleting cycle continues, with different tenant IDs appearing from time to time. The 'Error CASing' messages reappear every so often, sometimes with 2-3 minutes between batches.

According to monitoring data we have experienced periods of high CPU in the pipe service for months.

@fbarl added the performance (Excessive resource usage and latency; usually a bug or chore) label on Sep 16, 2019
@bboreham
Collaborator Author

Possibly related: today a single pipe server had 10,000 goroutines.

Top of goroutine dump:

goroutine profile: total 8989
8958 @ 0x433320 0x442e7b 0x4ad297 0x4adb5c 0xb86e87 0x4615c1
#   0x4ad296    io.(*pipe).Read+0xe6                                /usr/local/go/src/io/pipe.go:50
#   0x4adb5b    io.(*PipeReader).Read+0x4b                          /usr/local/go/src/io/pipe.go:127
#   0xb86e86    github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket.func2+0x86    /go/src/github.com/weaveworks/scope/common/xfer/pipes.go:141

5 @ 0x433320 0x442e7b 0xb84198 0x16f255f 0x16f2ab4 0x6ea7c4 0x7a99db 0x988b6c 0x6ea7c4 0x988cfe 0x6ea7c4 0x989189 0x6ea7c4 0x6edc14 0x6e95b5 0x4615c1
#   0xb84197    github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket+0x297    /go/src/github.com/weaveworks/scope/common/xfer/pipes.go:160
#   0x16f255e   github.com/weaveworks/scope/app.handlePipeWs.func1+0x50e    /go/src/github.com/weaveworks/scope/app/pipes.go:69
#   0x16f2ab3   github.com/weaveworks/scope/app.requestContextDecorator.func1+0xb3    /go/src/github.com/weaveworks/scope/app/router.go:46
#   0x6ea7c3    net/http.HandlerFunc.ServeHTTP+0x43    /usr/local/go/src/net/http/server.go:2007
#   0x7a99da    github.com/weaveworks/scope/vendor/github.com/gorilla/mux.(*Router).ServeHTTP+0xda    /go/src/github.com/weaveworks/scope/vendor/github.com/gorilla/mux/mux.go:107
#   0x988b6b    github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Tracer.Wrap.func1+0x8b    /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/http_tracing.go:25
#   0x6ea7c3    net/http.HandlerFunc.ServeHTTP+0x43    /usr/local/go/src/net/http/server.go:2007
#   0x988cfd    github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Instrument.Wrap.func1+0x16d    /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/instrument.go:49
#   0x6ea7c3    net/http.HandlerFunc.ServeHTTP+0x43    /usr/local/go/src/net/http/server.go:2007
#   0x989188    github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware.Log.Wrap.func1+0x2a8    /go/src/github.com/weaveworks/scope/vendor/github.com/weaveworks/common/middleware/logging.go:36
#   0x6ea7c3    net/http.HandlerFunc.ServeHTTP+0x43    /usr/local/go/src/net/http/server.go:2007
#   0x6edc13    net/http.serverHandler.ServeHTTP+0xa3    /usr/local/go/src/net/http/server.go:2802
#   0x6e95b4    net/http.(*conn).serve+0x874    /usr/local/go/src/net/http/server.go:1890

3 @ 0x433320 0x42e0ba 0x42d685 0x4ced65 0x4cfcdf 0x4cfcc1 0x5b334f 0x5c7578 0x565c2a 0xb7e968 0xb7eaba 0xb7f866 0xb7fd8f 0xb85cbf 0xb86ccd 0x4615c1
#   0x42d684    internal/poll.runtime_pollWait+0x54    /usr/local/go/src/runtime/netpoll.go:184
#   0x4ced64    internal/poll.(*pollDesc).wait+0x44    /usr/local/go/src/internal/poll/fd_poll_runtime.go:87
#   0x4cfcde    internal/poll.(*pollDesc).waitRead+0x1ce    /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#   0x4cfcc0    internal/poll.(*FD).Read+0x1b0    /usr/local/go/src/internal/poll/fd_unix.go:169
#   0x5b334e    net.(*netFD).Read+0x4e    /usr/local/go/src/net/fd_unix.go:202
#   0x5c7577    net.(*conn).Read+0x67    /usr/local/go/src/net/net.go:184
#   0x565c29    bufio.(*Reader).Read+0x269    /usr/local/go/src/bufio/bufio.go:226
#   0xb7e967    github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).readFull+0x77    /go/src/github.com/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:617
#   0xb7eab9    github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).advanceFrame+0x89    /go/src/github.com/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:641
#   0xb7f865    github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).NextReader+0x55    /go/src/github.com/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:780
#   0xb7fd8e    github.com/weaveworks/scope/vendor/github.com/gorilla/websocket.(*Conn).ReadMessage+0x2e    /go/src/github.com/weaveworks/scope/vendor/github.com/gorilla/websocket/conn.go:852
#   0xb85cbe    github.com/weaveworks/scope/common/xfer.(*pingingWebsocket).ReadMessage+0xae    /go/src/github.com/weaveworks/scope/common/xfer/websocket.go:109
#   0xb86ccc    github.com/weaveworks/scope/common/xfer.(*pipe).CopyToWebsocket.func1+0x3c    /go/src/github.com/weaveworks/scope/common/xfer/pipes.go:120

Tail of logs:

<app> INFO: 2019/12/16 08:24:57.876959 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:24:57.960942 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.200017 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.443435 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.700782 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:58.970222 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:24:59.286143 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:03.267505 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:04.841362 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:04.924074 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:06.006844 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:08.756865 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:10.479201 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:10.552417 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:11.241154 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:14.396069 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:14.866810 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:14.949131 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.180797 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.272817 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:15.344041 Deleting pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:16.470852 Get pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:17.273131 Creating pipe pipes/14815-pipe-661242507181636894
<app> INFO: 2019/12/16 08:25:17.355450 Release pipes/14815-pipe-661242507181636894:probe
<app> INFO: 2019/12/16 08:25:17.854341 Deleting pipe pipes/14815-pipe-661242507181636894

@bboreham
Collaborator Author

bboreham commented May 6, 2020

I tracked this down to the handling of EOF on the non-websocket side of the conversation in the probe.

This can be reproduced easily by clicking the 'describe' button on a Kubernetes object.
