A simple example of running MPI programs on Kubernetes. It contains example programs for:
- Word count of the NOW corpus
- Average rating calculation for each Netflix movie
See also: Implementation in Hadoop Docker
To change the number of workers, edit the `replicas` value in `kubernetes/statefulset.yaml`. The default is 4 (including the master).
# Check if docker image is available
docker pull ynshung/mpi-docker:latest
# Generate ssh keys (optional)
./generate-ssh-secret.sh
# Apply all Kubernetes configurations
kubectl apply -f kubernetes/
# Check the status of the pods
kubectl get pods
# Make sure all pods are Running, and that the IP addresses you copy for the next command belong only to mpi-worker-N pods
# Get the IP addresses of the pods, copy them temporarily
kubectl get pods -o custom-columns="IP:.status.podIP" --no-headers
# Enter the master pod
kubectl exec -it mpi-worker-0 -- bash
# Create a hostfile
echo "<paste copied ip addresses here>" > hostfile
# (Optional) You may remove the master IP address from the hostfile
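Copying the IP addresses by hand is easy to get wrong when other pods share the namespace. The sketch below shows one way the filtering could be automated. It assumes the pod listing was produced with an extra name column, for example `kubectl get pods -o custom-columns="NAME:.metadata.name,IP:.status.podIP" --no-headers`; the helper `build_hostfile` and the sample addresses are hypothetical and not part of this repository.

```python
def build_hostfile(pod_lines, prefix="mpi-worker-", skip_master=False):
    """Return hostfile text with one pod IP per line.

    pod_lines: lines of the form "<pod name> <pod IP>" (assumed format).
    prefix: only pods whose name starts with this prefix are kept.
    skip_master: drop the pod named "<prefix>0", which this README treats as the master.
    """
    ips = []
    for line in pod_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, ip = parts
        if not name.startswith(prefix) or ip == "<none>":
            continue  # ignore unrelated pods and pods without an IP yet
        if skip_master and name == prefix + "0":
            continue
        ips.append(ip)
    return ("\n".join(ips) + "\n") if ips else ""


# Made-up sample listing, for illustration only.
sample = [
    "mpi-worker-0   10.42.0.11",
    "mpi-worker-1   10.42.0.12",
    "some-other-pod 10.42.0.99",
    "mpi-worker-2   <none>",
]
print(build_hostfile(sample), end="")
```

The returned text can be written to `hostfile` as is. Passing `skip_master=True` corresponds to the optional step of removing the master IP address.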
The script counts the occurrences of each word in a directory of text files containing a sample of the NOW corpus, saves the results to a CSV file, and tracks the execution time.
# Run the MPI program for word count (with 3 workers)
mpirun --allow-run-as-root \
--hostfile hostfile \
-np 3 python3 /app/word_count_mpi.py
# Run word count without MPI
python3 /app/word_count.py
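To illustrate the idea behind the parallel word count, here is a minimal sketch of the split, count, and merge pattern, emulated in a single process so it runs without MPI. It is not the code in `/app/word_count_mpi.py`; the function names, the tokenizing regular expression, and the sample texts are assumptions made for this example. In an mpi4py program, the chunks would typically be distributed with `comm.scatter` and the partial counts collected with `comm.gather`.

```python
import re
from collections import Counter


def split_chunks(items, n):
    """Split items into n roughly equal lists (round-robin)."""
    return [items[i::n] for i in range(n)]


def count_words(texts):
    """Count lowercase word occurrences across a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def merge_counts(partials):
    """Sum per-worker counters into one total, as the root rank would."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up documents standing in for the corpus files.
docs = ["the cat sat", "the dog sat", "a cat ran"]
chunks = split_chunks(docs, 3)                # one chunk per emulated worker
partials = [count_words(c) for c in chunks]   # each worker counts its own chunk
total = merge_counts(partials)                # merged result on the master
print(total["the"], total["cat"], total["ran"])
```

Because word counts are additive, merging the partial counters gives the same result as counting all documents in one pass, which is why the work can be divided freely among workers.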
The script calculates the average rating of each Netflix movie, sorts the movies by that average, and saves the results to a CSV file while tracking the execution time.
# Run the MPI program for Netflix ratings (with 3 workers)
mpirun --allow-run-as-root \
--hostfile hostfile \
-np 3 python3 /app/netflix_mpi.py
# Run calculation without MPI
python3 /app/netflix.py
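The following sketch shows one way per-movie averages can be computed across several workers. It is not the code in `/app/netflix_mpi.py`: the `(movie_id, rating)` input format, the function names, the sample data, and the descending sort order are all assumptions for illustration. Each worker reports a running sum and count per movie rather than an average, since averaging the workers' averages would give wrong results whenever the chunks hold different numbers of ratings for a movie.

```python
from collections import defaultdict


def partial_sums(ratings):
    """Return {movie_id: [sum_of_ratings, count]} for one worker's chunk."""
    acc = defaultdict(lambda: [0.0, 0])
    for movie_id, rating in ratings:
        acc[movie_id][0] += rating
        acc[movie_id][1] += 1
    return dict(acc)


def merge_partials(partials):
    """Combine per-worker sums and counts, then compute each movie's average."""
    total = defaultdict(lambda: [0.0, 0])
    for part in partials:
        for movie_id, (rating_sum, count) in part.items():
            total[movie_id][0] += rating_sum
            total[movie_id][1] += count
    return {m: s / n for m, (s, n) in total.items()}


def sort_by_average(averages):
    """Sort (movie_id, average) pairs from highest to lowest average."""
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Made-up (movie_id, rating) pairs, for illustration only.
ratings = [(1, 5), (2, 3), (1, 4), (2, 1), (3, 4)]
chunks = [ratings[i::2] for i in range(2)]    # two emulated workers
averages = merge_partials(partial_sums(c) for c in chunks)
ranked = sort_by_average(averages)
print(ranked)
```

The merge step is the part that would run on the master after the partial results are gathered from the workers.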
The results are saved in the `output` directory in the master pod. You can copy the results to your local machine with the following command:
kubectl cp -n default mpi-worker-0:/app/output output