Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(probe): publish delta reports to reduce data size #3677

Merged
merged 5 commits into from
Sep 18, 2019
Merged

Conversation

bboreham
Copy link
Collaborator

Similar to video compression which uses key-frames and differences between them: every N publishes we send a full report, but inbetween we only send what has changed.

Fairly simple approach in the probe - hold on to the last full report, and for the deltas remove anything that would be merged in from the full report.

On the receiving side in the app it already merges a set of reports together to produce the final output for rendering, so provided N is smaller than that set we don't need to do anything different.

Deltas don't need to represent nodes that have disappeared - an earlier full node will have that node so it would be merged into the final output anyway.

Results from a small test showed around 40% drop in bytes/second sent by probes - process and container metrics are still updated on every publish.
CPU usage in the app dropped about 35%.
Probes memory and CPU was about the same as before - most of the effort goes into collecting the data not publishing it.

Copy link
Contributor

@fbarl fbarl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking some time to write this PR, great idea!

Nice to haves before you merge:

  • Tests as this PR might potentially break some core behavior - perhaps just one report-level UnsafeUnMerge test with bigger data would do though
  • See if we could unmerge previous reports instead of last full reports
  • See if we could replace EqualUnderMerge with some sort of ContainedIn criterion across different data structures

Otherwise looks great! 💯

prog/main.go Outdated
@@ -297,6 +298,7 @@ func setupFlags(flags *flags) {
flag.StringVar(&flags.probe.httpListen, "probe.http.listen", "", "listen address for HTTP profiling and instrumentation server")
flag.DurationVar(&flags.probe.publishInterval, "probe.publish.interval", 3*time.Second, "publish (output) interval")
flag.DurationVar(&flags.probe.spyInterval, "probe.spy.interval", time.Second, "spy (scan) interval")
flag.IntVar(&flags.probe.ticksPerFullReport, "probe.full-report-every", 3, "publish full report every N times, deltas in between")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine 3 is a conservative start to see how it works and then we'd increase it over time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with a 15 second window and 3-second publish interval we can't go past 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but with a 15 second window and 3-second publish interval we can't go past 4.

Do you think this is something worth stressing out it the arg description? e.g.

Note: Make sure that N < ui.poll.interval / probe.publish.interval for reporting to work properly.

By the way, that makes me think that there might be some value in being able to pass ui.poll.interval as an actual argument to the UI :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's -app.window that is important here.

Similar to video compression which uses key-frames and differences
between them: every N publishes we send a full report, but inbetween
we only send what has changed.

Fairly simple approach in the probe - hold on to the last full report,
and for the deltas remove anything that would be merged in from the
full report.

On the receiving side in the app it already merges a set of reports
together to produce the final output for rendering, so provided N is
smaller than that set we don't need to do anything different.

Deltas don't need to represent nodes that have disappeared - an
earlier full node will have that node so it would be merged into the
final output anyway.
Primarily to help when writing tests; may give a tiny performance benefit.
Testing the new delta-report internals
@bboreham
Copy link
Collaborator Author

I have rebased, added comments as discussed, and a test.

Copy link
Contributor

@fbarl fbarl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have rebased, added comments as discussed, and a test.

Thanks @bboreham! It all looks good to me now, I just left one more comment about documenting the constraint on probe.full-report.every that would be nice to have before hitting the merge button.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants