Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Faster aggregation for the MetricsEventSource under concurrency #71563

Open
Tracked by #97522
noahfalk opened this issue Jul 1, 2022 · 0 comments
Open
Tracked by #97522

Comments

@noahfalk
Copy link
Member

noahfalk commented Jul 1, 2022

Currently the MetricsEventSource uses locks or interlocked instructions to synchronize against multiple threads recording data points and this introduces contention. The goal is to eliminate the overhead of this contention as much as possible, ideally to the point that multi-threaded and single-threaded scenarios look identical. Its likely that reducing contention will require increasing memory footprint, decreasing measurement precision, or both. Depending on the difference in behavior we can decide whether it is acceptable to make it the new default or if we need settings that folks opt into.

This issue shows the problem for counters. Some work was done in .NET 8 to improve it, but I think we are still wasting 7/8ths of each cache line so there is definite room to do better within the same memory footprint.

For histograms quick benchmarking with 10 threads each issuing 100M updates shows this performance:

no listener - 1 sec
dotnet-counters listening - 40 sec
dotnet-counters listening + histogram dimension added with current thread id - 6 secs

Adding a thread id dimension causes the aggregator to track unique histograms per thread which eliminates the lock contention revealing the sizable perf overhead. I'm thinking to resolve this we should move to per-thread histogram buckets, but we probably need some compensating changes to ensure memory usage doesn't go really high. Right now histograms are using a minimum of 32KB each for first level bucketing array and then I think each 2nd level array 2^14 entries (64KB) to maintain a < 1:10,000 error bound. We could split the first level 4096 entries into two levels of 64 entries each and lower the default error bounds to something like 1:100 which should bring the footprint down to a few KB per thread.

@noahfalk noahfalk added this to the 8.0.0 milestone Jul 1, 2022
@tommcdon tommcdon modified the milestones: 8.0.0, 9.0.0 Jul 19, 2023
@noahfalk noahfalk changed the title [Feature Request] Faster histogram aggregation for the MetricsEventSource [Feature Request] Faster aggregation for the MetricsEventSource under concurrency Jan 11, 2024
@tarekgh tarekgh modified the milestones: 9.0.0, 10.0.0 Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants