OpenMPI runtime tuning (rankfile) #184
In addition, when I monitor the usage with htop, running BAGEL directly shows activity on many more cores, while running mpirun -np 1 BAGEL keeps only one core active.
We wrote in the manual that we strongly discourage the use of OpenMPI; at least in the past, OpenMPI has had bugs or issues related to threading. Please use Intel MPI instead (it's free), or MVAPICH, though the latter sometimes requires some careful settings with MKL's threading. I have not observed such behavior.
Thanks for your reply, Dr. Shiozaki. However, we managed to resolve the issue, and I document it here so that future readers benefit.

If the program is run as 'mpirun -np 1 BAGEL', OpenMPI reserves only one core for the MPI process. That single core is then overbooked with BAGEL_NUM_THREADS threads. The problem can be alleviated by using a rankfile, which specifies how slots are booked for MPI processes.

For example, suppose I want to run just one MPI process and use the hyperthreading functionality to fully exploit all 56 threads. numactl -H gives me the node layout, and from it we build our rankfile. In this rankfile we book all the cores on both sockets, so the mpirun command now has access to all the physical cores. If we then set BAGEL_NUM_THREADS/MKL_NUM_THREADS=56, 56 threads are launched for this MPI process, fully taking advantage of all hyperthreaded cores. We run this by passing the rankfile to mpirun.
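The numactl -H output, the rankfile, and the exact run command from this comment were not preserved in the thread. As an illustration only (the hostname localhost and the input file name input.json are assumptions), a rankfile that books all 14 cores on each of the two sockets for a single rank, together with the corresponding run command, might look like this:

# rankfile_1rank (illustrative sketch, not the original file):
# bind MPI rank 0 to cores 0-13 on socket 0 and cores 0-13 on socket 1
cat > rankfile_1rank <<'EOF'
rank 0=localhost slot=0:0-13,1:0-13
EOF

# one MPI rank bound to all 28 physical cores; BAGEL/MKL then spawn 56 threads
export BAGEL_NUM_THREADS=56
export MKL_NUM_THREADS=56
mpirun -np 1 --rankfile rankfile_1rank BAGEL input.json > output.log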
To run two MPI processes, one on each socket/NUMA node, the corresponding rankfile assigns one socket per rank and is passed to mpirun in the same way.
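The original two-rank rankfile was likewise not preserved; a minimal sketch under the same assumptions (localhost hostname, input.json input file, 14 physical cores per socket) might be:

# rankfile_2ranks (illustrative sketch): one rank per socket
cat > rankfile_2ranks <<'EOF'
rank 0=localhost slot=0:0-13
rank 1=localhost slot=1:0-13
EOF

# with hyperthreading there are 28 hardware threads per socket, so 28 threads per rank (assumed split)
export BAGEL_NUM_THREADS=28
export MKL_NUM_THREADS=28
mpirun -np 2 --rankfile rankfile_2ranks BAGEL input.json > output.log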
Thanks - good to know that worked out for you. Will leave this open so others may see it.
Dear developers,
I am observing quite different times for a sample 'casscf+xmspt2' input when running BAGEL in parallel with just BAGEL <input_file> versus mpirun -np 1 BAGEL <input_file>. The node I am running on has two sockets with 14 cores on each socket and hyperthreading enabled (56 logical cores reported by lscpu). With either way of running, the output reports:
But when running without mpirun (i.e. just BAGEL), the times for {MOLECULE, CASSCF, SMITH} are {0.29, 9.88, 41.77}, while if the program is run as mpirun -np 1 BAGEL the times are {1.65, 35.14, 38.81}. These increases and the variability in the MOLECULE and especially the CASSCF times are consistent across multiple runs. Is this expected behaviour? In addition, what is the correct way to run BAGEL for maximum parallel performance?
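For reference, the two ways of launching being compared would look roughly like this (input.json is an assumed input file name; the thread counts follow the 56-thread node described above):

# direct launch, no MPI launcher
export BAGEL_NUM_THREADS=56
export MKL_NUM_THREADS=56
BAGEL input.json > direct.log

# the same run launched through OpenMPI with a single rank
mpirun -np 1 BAGEL input.json > mpirun.log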
BAGEL was compiled with GCC 8.3.1 / MKL / OpenMPI 4.0.1, CFLAGS=-DNDEBUG -O3 -mavx2, and Boost 1.71.0.