
ferrophile/gpu-monitor


gpu-monitor

A Python tool for polling remote GPU servers.

This tool repeatedly checks, using nvidia-smi on the remote host, whether GPUs are available on the specified server. Once the specified requirements are met, it stops polling and issues a notification.
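To illustrate the polling step, the sketch below parses the CSV that nvidia-smi prints for a per-GPU query. The `--query-gpu` fields and `--format=csv,noheader,nounits` are real nvidia-smi options. The names `GpuStatus`, `QUERY`, `NVIDIA_SMI_CMD` and `parse_nvidia_smi` are hypothetical, and this is not necessarily how gpu-monitor reads the output.

```python
import csv
import io
from dataclasses import dataclass

# Fields requested from nvidia-smi; "nounits" strips the "MiB" and "%" suffixes.
QUERY = "index,memory.total,memory.used,memory.free,utilization.gpu"
NVIDIA_SMI_CMD = [
    "nvidia-smi",
    f"--query-gpu={QUERY}",
    "--format=csv,noheader,nounits",
]


@dataclass
class GpuStatus:
    index: int
    memory_total: int  # MiB
    memory_used: int   # MiB
    memory_free: int   # MiB
    utilization: int   # percent


def parse_nvidia_smi(output: str) -> list[GpuStatus]:
    """Parse the CSV printed by NVIDIA_SMI_CMD into one record per GPU."""
    gpus = []
    for row in csv.reader(io.StringIO(output)):
        if not row:
            continue  # skip blank lines
        values = [int(field.strip()) for field in row]
        gpus.append(GpuStatus(*values))
    return gpus


# Example output for a two-GPU machine (GPU 1 is busy).
sample = "0, 24576, 0, 24576, 0\n1, 24576, 20000, 4576, 97\n"
for gpu in parse_nvidia_smi(sample):
    print(gpu)
```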

Requirements

pip install -r requirements.txt
sudo apt install ffmpeg

Getting started

Basic usage

Notify when 4 fully idle GPUs are available:

python gpu-monitor.py -u <user@host> --min_gpus 4
# or, equivalently:
python gpu-monitor.py -u <user> -d <host> --min_gpus 4
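The README does not define "fully idle" precisely. The sketch below assumes one plausible definition: 0% utilization and essentially no memory in use. `Gpu`, `IDLE_MEMORY_MIB`, `is_fully_idle` and `enough_idle_gpus` are hypothetical names, and the 10 MiB threshold is an illustrative assumption.

```python
from typing import List, NamedTuple


class Gpu(NamedTuple):
    index: int
    memory_used: int  # MiB
    utilization: int  # percent


# Hypothetical threshold: a GPU with a few MiB held by the driver still counts as idle.
IDLE_MEMORY_MIB = 10


def is_fully_idle(gpu: Gpu) -> bool:
    """Assumed definition: no compute load and (almost) no memory in use."""
    return gpu.utilization == 0 and gpu.memory_used <= IDLE_MEMORY_MIB


def enough_idle_gpus(gpus: List[Gpu], min_gpus: int) -> bool:
    """True when at least min_gpus GPUs are fully idle."""
    return sum(is_fully_idle(g) for g in gpus) >= min_gpus


gpus = [Gpu(0, 0, 0), Gpu(1, 3, 0), Gpu(2, 15000, 88), Gpu(3, 0, 0), Gpu(4, 0, 0)]
print(enough_idle_gpus(gpus, min_gpus=4))  # True: GPUs 0, 1, 3 and 4 are idle
```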

If you do not want to be prompted for a password every time, you can specify an authorized SSH key (typically ~/.ssh/id_rsa.pub):

python gpu-monitor.py -u <user@host> -k <path/to/pubkey> --min_gpus 4
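If the tool shells out to the OpenSSH client (an assumption; it may use an SSH library instead), the remote query could be built as below. Note that OpenSSH's `-i` option expects the private key file, whose `.pub` counterpart must be listed in `~/.ssh/authorized_keys` on the server. `REMOTE_CMD` and `build_ssh_command` are hypothetical names.

```python
import os
import shlex
from typing import List, Optional

# Remote command: one CSV line per GPU (index, free MiB, utilization %).
REMOTE_CMD = ("nvidia-smi --query-gpu=index,memory.free,utilization.gpu "
              "--format=csv,noheader,nounits")


def build_ssh_command(target: str, key_path: Optional[str] = None) -> List[str]:
    """Build an OpenSSH argv that runs REMOTE_CMD on target (user@host)."""
    cmd = ["ssh"]
    if key_path is not None:
        # -i takes the identity file, i.e. the private key.
        # BatchMode=yes makes ssh fail instead of prompting for a password.
        cmd += ["-i", os.path.expanduser(key_path), "-o", "BatchMode=yes"]
    cmd += [target, REMOTE_CMD]
    return cmd


print(shlex.join(build_ssh_command("alice@gpu-server", "~/.ssh/id_rsa")))
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. Without a key, ssh falls back to its usual interactive password prompt.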

Notify when there are 4 GPUs with at least 5000 MiB of free memory each:

python gpu-monitor.py -u <user@host> --min_gpus 4 --min_ram 5000
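A minimal sketch of the `--min_gpus`/`--min_ram` check, assuming the free memory of each GPU (in MiB, as reported by `nvidia-smi --query-gpu=memory.free`) is already available as a list. The function names are hypothetical.

```python
from typing import List


def count_gpus_with_free_memory(free_mib: List[int], min_ram: int) -> int:
    """Count GPUs whose free memory (MiB) is at least min_ram."""
    return sum(1 for free in free_mib if free >= min_ram)


def requirement_met(free_mib: List[int], min_gpus: int, min_ram: int) -> bool:
    """True when at least min_gpus GPUs each have min_ram MiB free."""
    return count_gpus_with_free_memory(free_mib, min_ram) >= min_gpus


free = [24000, 5000, 4999, 12000, 8000]
print(requirement_met(free, min_gpus=4, min_ram=5000))  # True: 4 GPUs qualify
```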

Check the server every 30 seconds:

python gpu-monitor.py -u <user@host> --step 30 --min_gpus 4
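The polling behaviour controlled by `--step` can be sketched as a loop that calls a check function every `step` seconds and stops as soon as it succeeds. `poll_until` and its `max_polls` safety limit are hypothetical additions for illustration.

```python
import time
from typing import Callable, Optional


def poll_until(check: Callable[[], bool], step: float,
               max_polls: Optional[int] = None) -> int:
    """Call check() every `step` seconds until it returns True.

    Returns the number of polls made. If max_polls is given and exhausted,
    raises TimeoutError instead of polling forever.
    """
    polls = 0
    while True:
        polls += 1
        if check():
            return polls
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"requirement not met after {polls} polls")
        time.sleep(step)


# Simulated server that becomes available on the third check.
responses = iter([False, False, True])
print(poll_until(lambda: next(responses), step=0))  # 3
```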

Run python gpu-monitor.py -h for a description of every parameter.

Notifications

When the specified requirements are met, a notification is issued. For example:

[Screenshot: example notification]
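The README does not say how the notification is delivered. As one possible approach on Linux desktops, the sketch below calls `notify-send` (from libnotify) when it is installed and otherwise prints to the terminal. `notify` and `format_message` are hypothetical helpers, not the tool's own code.

```python
import shutil
import subprocess
from typing import List


def format_message(host: str, free_gpus: List[int]) -> str:
    """Summarize which GPUs became available on host."""
    ids = ", ".join(str(i) for i in free_gpus)
    return f"{len(free_gpus)} GPU(s) available on {host}: {ids}"


def notify(title: str, message: str) -> str:
    """Show a desktop notification via notify-send if available.

    Returns the channel used ("notify-send" or "stdout").
    """
    if shutil.which("notify-send"):
        subprocess.run(["notify-send", title, message], check=False)
        return "notify-send"
    print(f"{title}: {message}")  # fallback when no notifier is installed
    return "stdout"
```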

Additionally, you can specify a sound file to play when the GPUs become available:

python gpu-monitor.py -u <user@host> --min_gpus 4 --alert_sound <path/to/sound/file>
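Since ffmpeg is listed as a requirement, one way to play the alert sound is `ffplay`, which Debian/Ubuntu ship in the `ffmpeg` package. This is an assumption about the mechanism, not the tool's confirmed implementation. `ffplay_command` and `play_alert` are hypothetical names.

```python
import shutil
import subprocess
from typing import List


def ffplay_command(sound_path: str) -> List[str]:
    """ffplay flags: no video window, exit when playback ends, no console log."""
    return ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet", sound_path]


def play_alert(sound_path: str) -> bool:
    """Play sound_path once; return False if ffplay is not installed."""
    if shutil.which("ffplay") is None:
        return False
    subprocess.run(ffplay_command(sound_path), check=False)
    return True
```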

Run the following to check that notifications work properly:

python gpu-monitor.py --alert_sound <path/to/sound/file> --debug

About

GPU monitor tool for remote servers
