Investigate performance of threaded matrix multiply kernel #248
Comments
Things to look out for
An initial profiling result with the test case described above. Using current
My first approach to reduce the inefficiency would be to move the threading to the main loop (CONQUEST-release/src/multiply_module.f90, line 227 at commit 6bf8f4a) and wrap the MPI communications in …
It should be possible to declare the parallel region in CONQUEST-release/src/multiply_module.f90 (line 226 at commit 6bf8f4a) and keep the !$omp do worksharing constructs as orphaned constructs where they are in the multiply_kernel.
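A minimal, self-contained sketch of that structure, assuming toy routine and variable names rather than the actual CONQUEST symbols: the `!$omp parallel` region is opened once around the main loop in the caller, the communication step is restricted to one thread, and the `!$omp do` stays as an orphaned construct inside the kernel.

```fortran
! Toy illustration only: names and array shapes are placeholders, not CONQUEST code.
module toy_kernel
  implicit none
contains
  subroutine multiply_kernel(a, b, c)
    real, intent(in)    :: a(:), b(:)
    real, intent(inout) :: c(:)
    integer :: i
    ! Orphaned worksharing construct: it binds to the parallel region opened
    ! in the caller, so no new parallel region is created per kernel call.
    !$omp do schedule(dynamic)
    do i = 1, size(c)
       c(i) = c(i) + a(i) * b(i)
    end do
    !$omp end do        ! implicit barrier before the next communication step
  end subroutine multiply_kernel
end module toy_kernel

program main_loop_threading
  use toy_kernel
  implicit none
  integer, parameter :: n = 100000, nparts = 4
  real :: a(n), b(n), c(n)
  integer :: kpart
  c = 0.0
  ! One parallel region around the main loop (cf. multiply_module.f90 near line 226).
  !$omp parallel default(shared) private(kpart)
  do kpart = 1, nparts
     !$omp single
     ! Stand-in for the MPI fetch of remote matrix data: one thread only.
     call random_number(a)
     call random_number(b)
     !$omp end single   ! implicit barrier: data is ready before all threads compute
     call multiply_kernel(a, b, c)
  end do
  !$omp end parallel
  print *, 'checksum = ', sum(c)
end program main_loop_threading
```

Built with, for example, `gfortran -fopenmp`, this shows the orphaned `!$omp do` binding to the enclosing region; the implicit barriers at `end single` and `end do` are exactly the synchronisation points that the overlap work discussed below would aim to hide.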
We've tried to implement this in …
Conclusions
- Performance of multiply kernels
- Reducing OMP overhead
- Longer matrix range
Next steps
Next we need to get rid of the OMP barriers by overlapping communication with computation. This is addressed in #265.
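As a rough illustration of the overlap idea (a sketch only, not necessarily how #265 does it): with double buffering, the master thread can fetch the data for the next partition while the remaining threads compute on the current one. The `fetch` routine here is a stand-in for the MPI receive.

```fortran
! Toy double-buffering sketch: fetch(next) overlaps with compute(current).
program overlap_sketch
  implicit none
  integer, parameter :: n = 100000, nparts = 8
  real :: buf(n, 2), c(n)
  integer :: kpart, i, cur, nxt
  c = 0.0
  !$omp parallel default(shared) private(kpart, i, cur, nxt)
  !$omp single
  call fetch(buf(:, 1))                ! data for the first partition
  !$omp end single                     ! implicit barrier
  do kpart = 1, nparts
     cur = mod(kpart - 1, 2) + 1       ! buffer holding the current partition
     nxt = 3 - cur                     ! buffer to prefetch into
     !$omp master
     ! Master fetches the next partition while the other threads start the
     ! compute loop below; "end master" implies no barrier, so the two overlap.
     if (kpart < nparts) call fetch(buf(:, nxt))
     !$omp end master
     !$omp do schedule(dynamic)
     do i = 1, n
        c(i) = c(i) + buf(i, cur)**2
     end do
     !$omp end do   ! the one remaining barrier: compute and prefetch both done
  end do
  !$omp end parallel
  print *, 'checksum = ', sum(c)
contains
  subroutine fetch(x)
    real, intent(out) :: x(:)
    call random_number(x)              ! stand-in for the MPI receive
  end subroutine fetch
end program overlap_sketch
```

The per-iteration synchronisation is reduced to the single `end do` barrier, which now also guarantees that the prefetch for the next partition has finished before any thread touches it.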
Once we have closed #195 and #244, we can look into the performance of these threading improvements together with the previously threaded matrix multiply kernels.
The multiply kernel can be selected with the MULT_KERN option in the Makefile. The best place to start is ompGemm, but it is worth looking at the other options too. A good test case (see the setup sketch after the list) is:
- Use Si.ion from test 002 in the testsuite
- Use Conquest_input from test 002 in the testsuite, changing the Grid cutoff to 200
- Use Coords.dat from the input used in Thread loops over blocks #195
--> This is the matrix_multiply performance test in Add input configurations used for profiling #262
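For reference, a sketch of how the kernel selection might be written; the exact variable placement and the assumption that each MULT_KERN value maps onto a multiply_kernel_*.f90 source file should be checked against the repository.

```make
# Select the multiply kernel at build time (ompGemm is the suggested starting
# point from this issue). Assumed layout: the value picks which
# multiply_kernel_*.f90 variant in src/ is compiled into multiply_kernel.
MULT_KERN = ompGemm
```

The test case above (Si.ion and Conquest_input from test 002 with the Grid cutoff raised to 200, plus the Coords.dat from #195) can then be run as the matrix_multiply performance test referenced in #262.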
- Investigate performance of other multiply kernels #268
- Think about strategies for reducing omp overhead
- Test longer matrix ranges in matrix multiply #269