Slurm Docker Cluster

Slurm Docker Cluster is a multi-container Slurm cluster designed for rapid deployment using Docker Compose. This repository simplifies the process of setting up a robust Slurm environment for development, testing, or lightweight usage.

🏁 Getting Started

To get up and running with Slurm in Docker, make sure you have Docker and Docker Compose installed.

Clone the repository:

git clone https://github.com/giovtorres/slurm-docker-cluster.git
cd slurm-docker-cluster

📦 Containers and Volumes

This setup consists of the following containers:

  • mysql: Stores job and cluster data.
  • slurmdbd: Manages the Slurm database.
  • slurmctld: The Slurm controller responsible for job and resource management.
  • c1, c2: Compute nodes (running slurmd).

Persistent Volumes:

  • etc_munge: Mounted to /etc/munge
  • etc_slurm: Mounted to /etc/slurm
  • slurm_jobdir: Mounted to /data
  • var_lib_mysql: Mounted to /var/lib/mysql
  • var_log_slurm: Mounted to /var/log/slurm

🛠️ Building the Docker Image

The Slurm version and the image tag are set in a .env file, which Docker Compose picks up automatically.

Update the SLURM_TAG and IMAGE_TAG found in the .env file and build the image:

docker compose build
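As an illustration, a .env file could look like the sketch below. Both values are taken from the docker build example in this README; any other tag is an assumption to check against the available Slurm release tags before building.

```shell
# .env (read by Docker Compose from the project directory)
# Slurm release tag to build
SLURM_TAG=slurm-21-08-6-1
# Tag applied to the resulting slurm-docker-cluster image
IMAGE_TAG=21.08.6
```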

Alternatively, you can build the Slurm Docker image directly with docker build by passing SLURM_TAG as a build argument and tagging the image with a version (IMAGE_TAG):

docker build --build-arg SLURM_TAG="slurm-21-08-6-1" -t slurm-docker-cluster:21.08.6 .

🚀 Starting the Cluster

Once the image is built, deploy the cluster with the default version of Slurm using Docker Compose:

docker compose up -d

To run a specific version and override the value configured in .env, set IMAGE_TAG on the command line:

IMAGE_TAG=21.08.6 docker compose up -d

This will start up all containers in detached mode. You can monitor their status using:

docker compose ps

📝 Register the Cluster

After the containers are up and running, register the cluster with SlurmDBD:

./register_cluster.sh

Tip: Wait a few seconds for the daemons to initialize before running the registration script to avoid connection errors like: sacctmgr: error: Problem talking to the database: Connection refused.
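Instead of guessing how long to wait, the registration can be retried until it succeeds. The helper below is a minimal sketch, not part of this repository: the function name retry and its arguments are hypothetical, and it assumes register_cluster.sh exits with a non-zero status when the database connection is refused.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
retry() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command exits with status 0.
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        # Sleep between attempts, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example, run from the repository root:
#   retry 10 3 ./register_cluster.sh
```

The example call would try the script up to 10 times, pausing 3 seconds between attempts.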

For real-time cluster logs, use:

docker compose logs -f

🖥️ Accessing the Cluster

To interact with the Slurm controller, open a shell inside the slurmctld container:

docker exec -it slurmctld bash

Now you can run any Slurm command from inside the container:

[root@slurmctld /]# sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
normal*      up 5-00:00:00      2   idle c[1-2]
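For scripted checks, the same output can be parsed without opening an interactive shell. The sketch below assumes the default sinfo column layout shown above (NODES in the fourth column, STATE in the fifth); the function name idle_nodes is hypothetical.

```shell
# Sum the NODES column for every row whose STATE is "idle".
# Reads default sinfo output (header line plus data rows) on stdin.
idle_nodes() {
    awk 'NR > 1 && $5 == "idle" { total += $4 } END { print total + 0 }'
}

# Example, run from the host:
#   docker exec slurmctld sinfo | idle_nodes
```

With the sinfo output shown above, the function prints 2.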

🧑‍💻 Submitting Jobs

The cluster mounts the slurm_jobdir volume across all nodes, making job files accessible from the /data directory. To submit a job:

[root@slurmctld /]# cd /data/
[root@slurmctld data]# sbatch --wrap="hostname"
Submitted batch job 2

Check the output of the job:

[root@slurmctld data]# cat slurm-2.out
c1
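For anything longer than a one-liner, a batch script is usually easier to maintain than --wrap. The following is a sketch under assumptions: the file name hello.sbatch and the job name are illustrative, and the node count assumes both compute nodes (c1 and c2) are idle.

```shell
# Write a small batch script; "hello.sbatch" is an illustrative file name.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --output=hello-%j.out
#SBATCH --nodes=2
#SBATCH --ntasks=2

# srun runs hostname once per task (two tasks requested above).
srun hostname
EOF
```

From /data inside the slurmctld container, the script would be submitted with sbatch hello.sbatch. The %j in the output pattern is replaced by the job ID, so the results land in a file such as hello-3.out for job 3.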

🔄 Cluster Management

Stopping and Restarting:

Stop the cluster without removing the containers:

docker compose stop

Restart it later:

docker compose start

Deleting the Cluster:

To completely remove the containers and their associated volumes (this also deletes the MySQL data in var_lib_mysql and the job files in /data):

docker compose down -v

⚙️ Advanced Configuration

You can modify Slurm configurations (slurm.conf, slurmdbd.conf) on the fly without rebuilding the containers. Just run:

./update_slurmfiles.sh slurm.conf slurmdbd.conf
docker compose restart

This makes it easy to add/remove nodes or test new configuration settings dynamically.

🤝 Contributing

Contributions are welcome! If you want to add features, fix bugs, or improve documentation:

  1. Fork this repo.
  2. Create a new branch: git checkout -b feature/your-feature.
  3. Submit a pull request.

📄 License

This project is licensed under the MIT License.