Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A small performance optimization #10

Closed
sumerc opened this issue Dec 9, 2020 · 3 comments
Closed

A small performance optimization #10

sumerc opened this issue Dec 9, 2020 · 3 comments

Comments

@sumerc
Copy link

sumerc commented Dec 9, 2020

Hey Felix,

I was playing with runtime package/profilers a bit for learning and while trying out different things, I happen to see following code is used to retrieve all Goroutines:

for {
	n, ok := runtime.GoroutineProfile(p.stacks)
	if !ok {
		p.stacks = make([]runtime.StackRecord, int(float64(n)*1.1))
	} else {
		return p.stacks[0:n]
	}
}

However it turns out following is a bit faster as runtime.NumGoroutine() just returns len(gcount()) whereas GoroutineProfile seems to do bunch of other things.

p.stacks = make([]runtime.StackRecord, int(float64(runtime.NumGoroutine())*1.1))
for {
   ...
}

I have benchmarked following function:

func GoroutineProfile(useNumG bool) {
	var stacks []runtime.StackRecord

	if useNumG {
		nGoroutines := runtime.NumGoroutine()
		stacks = make([]runtime.StackRecord, int(float64(nGoroutines)*1.1))
	}

	for {
		n, ok := runtime.GoroutineProfile(stacks)
		if !ok {
                        // this can happen regardless of our approach
			stacks = make([]runtime.StackRecord, int(float64(n)*1.1))
		} else {
			break
		}
	}
}

And the results are:

go test -bench=. -benchmem                                                                                                   ~/Desktop/p/go-tryouts  
goos: linux
goarch: amd64
BenchmarkGoroutineProfile/Using_runtime.NumGoroutine()-12                 111543              9944 ns/op             768 B/op          1 allocs/op
BenchmarkGoroutineProfile/Not_using_runtime.NumGoroutine()-12              66451             16140 ns/op             768 B/op          1 allocs/op

While not too significant, it is not bad for a single line of change.

Would you be interested for a PR for this? Is there any other problem with the approach?

@felixge
Copy link
Owner

felixge commented Dec 9, 2020

Hey @sumerc thank you so much for taking an interest in this and reaching out. I love geeking out about this stuff : ).

As far as your change is concerned, I'd love to see a PR for it. I can totally see how it's faster for the function you've shown, but the code inside of fgprof is slightly different. The profiler struct is written in a way that the costs of discovering the ideal size of the stacks slice is amortized by only doing it once in the beginning. So your optimization would only help with the very first profiler.GoroutineProfile() call.

Anyway, if it's a small change I'd still be happy to merge it. And I'm also happy to just discuss ideas on how to make things faster in general!

@sumerc
Copy link
Author

sumerc commented Dec 9, 2020

Anyway, if it's a small change I'd still be happy to merge it. And I'm also happy to just discuss ideas on how to make things faster in general!

Me too! Thanks for such inclusive/positive comment :)

The profiler struct is written in a way that the costs of discovering the ideal size of the stacks slice is amortized by only doing it once in the beginning

I understood. p.stacks is only initialized once. Well, I also confirmed you are right. The change did not change(too much) the output go test -bench=BenchmarkProfilerGoroutines but in anycase, I will submit the PR. Up to you to accept/reject :)

Thanks again!

@felixge
Copy link
Owner

felixge commented Aug 27, 2022

Closing this since #11 was closed.

@felixge felixge closed this as completed Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants