-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(probe): publish delta reports to reduce data size #3677
Conversation
8858933
to
82bb24c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking some time to write this PR, great idea!
Nice to haves before you merge:
- Tests as this PR might potentially break some core behavior - perhaps just one report-level
UnsafeUnMerge
test with bigger data would do though - See if we could unmerge previous reports instead of last full reports
- See if we could replace
EqualUnderMerge
with some sort ofContainedIn
criterion across different data structures
Otherwise looks great! 💯
prog/main.go
Outdated
@@ -297,6 +298,7 @@ func setupFlags(flags *flags) { | |||
flag.StringVar(&flags.probe.httpListen, "probe.http.listen", "", "listen address for HTTP profiling and instrumentation server") | |||
flag.DurationVar(&flags.probe.publishInterval, "probe.publish.interval", 3*time.Second, "publish (output) interval") | |||
flag.DurationVar(&flags.probe.spyInterval, "probe.spy.interval", time.Second, "spy (scan) interval") | |||
flag.IntVar(&flags.probe.ticksPerFullReport, "probe.full-report-every", 3, "publish full report every N times, deltas in between") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine 3
is a conservative start to see how it works and then we'd increase it over time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but with a 15 second window and 3-second publish interval we can't go past 4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but with a 15 second window and 3-second publish interval we can't go past 4.
Do you think this is something worth stressing out it the arg description? e.g.
Note: Make sure that N < ui.poll.interval / probe.publish.interval for reporting to work properly.
By the way, that makes me think that there might be some value in being able to pass ui.poll.interval
as an actual argument to the UI :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's -app.window
that is important here.
82bb24c
to
f65a87a
Compare
Similar to video compression which uses key-frames and differences between them: every N publishes we send a full report, but inbetween we only send what has changed. Fairly simple approach in the probe - hold on to the last full report, and for the deltas remove anything that would be merged in from the full report. On the receiving side in the app it already merges a set of reports together to produce the final output for rendering, so provided N is smaller than that set we don't need to do anything different. Deltas don't need to represent nodes that have disappeared - an earlier full node will have that node so it would be merged into the final output anyway.
Primarily to help when writing tests; may give a tiny performance benefit.
Testing the new delta-report internals
f65a87a
to
da030d1
Compare
I have rebased, added comments as discussed, and a test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have rebased, added comments as discussed, and a test.
Thanks @bboreham! It all looks good to me now, I just left one more comment about documenting the constraint on probe.full-report.every
that would be nice to have before hitting the merge button.
Similar to video compression which uses key-frames and differences between them: every N publishes we send a full report, but inbetween we only send what has changed.
Fairly simple approach in the probe - hold on to the last full report, and for the deltas remove anything that would be merged in from the full report.
On the receiving side in the app it already merges a set of reports together to produce the final output for rendering, so provided N is smaller than that set we don't need to do anything different.
Deltas don't need to represent nodes that have disappeared - an earlier full node will have that node so it would be merged into the final output anyway.
Results from a small test showed around 40% drop in bytes/second sent by probes - process and container metrics are still updated on every publish.
CPU usage in the app dropped about 35%.
Probes memory and CPU was about the same as before - most of the effort goes into collecting the data not publishing it.