perf(probe): publish delta reports to reduce data size #3677

bboreham · 2019-09-15T16:34:32Z

Similar to video compression, which uses key-frames and differences between them: every N publishes we send a full report, but in between we only send what has changed.

Fairly simple approach in the probe - hold on to the last full report, and for the deltas remove anything that would be merged in from the full report.

On the receiving side, the app already merges a set of reports together to produce the final output for rendering, so provided N is smaller than that set we don't need to do anything different.

Deltas don't need to represent nodes that have disappeared - an earlier full report will have that node, so it would be merged into the final output anyway.
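The key-frame scheme described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Report` is reduced to a flat string map, and `unmerge` and `deltaPublisher` are hypothetical names invented for the example. As in the description, deltas are computed against the last full report rather than the previous publish.

```go
package main

import "fmt"

// Report is a deliberately tiny stand-in for a probe report: node ID -> value.
// The real report type is far richer; this is an assumption for illustration.
type Report map[string]string

// unmerge returns the entries of cur that are new or changed relative to full.
// Anything that merging full would already supply is left out.
func unmerge(cur, full Report) Report {
	delta := Report{}
	for id, v := range cur {
		if old, ok := full[id]; !ok || old != v {
			delta[id] = v
		}
	}
	return delta
}

// deltaPublisher is a hypothetical helper that sends a full report on every
// n-th publish and a delta against the last full report in between.
type deltaPublisher struct {
	n        int // full report every n publishes
	tick     int
	lastFull Report
}

// next returns what to send for the current report and whether it is full.
func (p *deltaPublisher) next(cur Report) (Report, bool) {
	full := p.lastFull == nil || p.tick%p.n == 0
	p.tick++
	if full {
		p.lastFull = cur
		return cur, true
	}
	return unmerge(cur, p.lastFull), false
}

func main() {
	p := &deltaPublisher{n: 3}
	reports := []Report{
		{"a": "1", "b": "1"},
		{"a": "1", "b": "2"},
		{"a": "1", "b": "2", "c": "1"},
		{"a": "2", "b": "2", "c": "1"},
	}
	for i, r := range reports {
		out, full := p.next(r)
		fmt.Printf("publish %d: full=%v entries=%d\n", i, full, len(out))
	}
}
```

With `n` set to 3, the first and fourth publishes carry every entry, while the second and third carry only the entries that differ from the first.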

Results from a small test showed a drop of around 40% in bytes/second sent by probes - process and container metrics are still updated on every publish.
CPU usage in the app dropped about 35%.
Probe memory and CPU usage were about the same as before - most of the effort goes into collecting the data, not publishing it.

fbarl

Thanks for taking some time to write this PR, great idea!

Nice to haves before you merge:

- Tests, as this PR might potentially break some core behavior - perhaps just one report-level UnsafeUnMerge test with bigger data would do though
- See if we could unmerge previous reports instead of last full reports
- See if we could replace EqualUnderMerge with some sort of ContainedIn criterion across different data structures

Otherwise looks great! 💯
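One way to frame the report-level test suggested above is as a round-trip property: merging the last full report with a delta should give the same result as merging it with the report the delta was derived from. The sketch below assumes a simplified `Nodes` map with last-writer-wins merging; `merge`, `unmerge` and `roundTrips` are hypothetical helpers, not the `UnsafeUnMerge` or `EqualUnderMerge` functions mentioned in this review, whose signatures are not shown here.

```go
package main

import (
	"fmt"
	"reflect"
)

// Nodes is a simplified stand-in for a topology: node ID -> value.
type Nodes map[string]string

// merge returns a copy of a overlaid with b; on conflict b wins.
// Last-writer-wins is an assumption made for this sketch.
func merge(a, b Nodes) Nodes {
	out := Nodes{}
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		out[k] = v
	}
	return out
}

// unmerge keeps only the entries of cur that merging full would not supply.
func unmerge(cur, full Nodes) Nodes {
	delta := Nodes{}
	for k, v := range cur {
		if old, ok := full[k]; !ok || old != v {
			delta[k] = v
		}
	}
	return delta
}

// roundTrips checks that full+delta merges to the same result as full+cur.
func roundTrips(full, cur Nodes) bool {
	return reflect.DeepEqual(merge(full, unmerge(cur, full)), merge(full, cur))
}

func main() {
	full := Nodes{"a": "1", "b": "1", "gone": "1"}
	cur := Nodes{"a": "1", "b": "2", "new": "1"}
	fmt.Println(roundTrips(full, cur))
}
```

The `gone` node is absent from `cur` but still present in both merged results, which is why a delta does not need to encode disappearances in this model.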

probe/probe.go

fbarl · 2019-09-16T09:12:35Z

prog/main.go

@@ -297,6 +298,7 @@ func setupFlags(flags *flags) {
 	flag.StringVar(&flags.probe.httpListen, "probe.http.listen", "", "listen address for HTTP profiling and instrumentation server")
 	flag.DurationVar(&flags.probe.publishInterval, "probe.publish.interval", 3*time.Second, "publish (output) interval")
 	flag.DurationVar(&flags.probe.spyInterval, "probe.spy.interval", time.Second, "spy (scan) interval")
+	flag.IntVar(&flags.probe.ticksPerFullReport, "probe.full-report-every", 3, "publish full report every N times, deltas in between")


I imagine 3 is a conservative start to see how it works and then we'd increase it over time?

Yes, but with a 15 second window and 3-second publish interval we can't go past 4.

> Yes, but with a 15 second window and 3-second publish interval we can't go past 4.

Do you think this is something worth stressing in the arg description? e.g.

Note: Make sure that N < ui.poll.interval / probe.publish.interval for reporting to work properly.

By the way, that makes me think that there might be some value in being able to pass ui.poll.interval as an actual argument to the UI :)

It's -app.window that is important here.

report/latest_map_generated.go

report/node.go

report/report.go



bboreham · 2019-09-18T08:03:07Z

I have rebased, added comments as discussed, and a test.

fbarl

> I have rebased, added comments as discussed, and a test.

Thanks @bboreham! It all looks good to me now. I just left one more comment about documenting the constraint on probe.full-report-every, which would be nice to have before hitting the merge button.

bboreham requested a review from qiell as a code owner September 15, 2019 16:34

bboreham force-pushed the delta-reports branch from 8858933 to 82bb24c Compare September 15, 2019 16:36

bboreham mentioned this pull request Sep 15, 2019

perf(probe): add 'omitempty' tag to Topology.Nodes #3678

Merged

qiell requested a review from fbarl September 16, 2019 07:53

fbarl reviewed Sep 16, 2019

bboreham commented Sep 16, 2019

bboreham force-pushed the delta-reports branch from 82bb24c to f65a87a Compare September 17, 2019 16:58

bboreham requested a review from satyamz as a code owner September 17, 2019 16:58

bboreham added 4 commits September 18, 2019 08:00

Refactor: pull Publish() call up to publishLoop()

eff5a1f

chore: allow Report.DNS field to be nil

951629a

Primarily to help when writing tests; may give a tiny performance benefit.

test: add TestReportUnMerge()

da030d1

Testing the new delta-report internals

bboreham force-pushed the delta-reports branch from f65a87a to da030d1 Compare September 18, 2019 08:01

fbarl approved these changes Sep 18, 2019

help: add note on constraint to -full-report-every argument

395282b

bboreham merged commit e9f0f90 into master Sep 18, 2019

bboreham deleted the delta-reports branch September 18, 2019 11:10

This was referenced Oct 13, 2019

Send incremental reports probe->app #985

Closed

Replace Scope metrics timeseries with Prometheus metrics #3710

Open

Fix accidental rename of DNS to 'nodes' #3713

Merged
