GPU Monitoring
Beszel can monitor GPU usage, temperature, and power draw.
AMD GPUs
Work in progress
AMD has deprecated rocm-smi in favor of amd-smi. The agent works with rocm-smi on Linux, but hasn't been updated to work with amd-smi yet.
Beszel uses rocm-smi to monitor AMD GPUs. This must be available on the system, and you must use the binary agent (not the Docker agent).
Make sure rocm-smi is accessible
Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary in /opt/rocm/bin. If that directory isn't in the PATH of the user running beszel-agent, symlink the binary into /usr/local/bin:
sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
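If you script your setup, the symlink step can be made safe to rerun. The sketch below is a hypothetical helper, `link_tool`, not something shipped with Beszel; its default paths are the ones from the symlink command above, and it assumes a POSIX shell run as root.

```shell
# Hypothetical helper (not part of Beszel): symlink a binary into a
# directory that is normally on PATH, skipping the step if a file or
# link with that name already exists there.
link_tool() {
  tool="$1"                        # binary name, e.g. rocm-smi
  src_dir="${2:-/opt/rocm/bin}"    # where rocm-smi-lib installs it
  dest_dir="${3:-/usr/local/bin}"  # target directory on PATH

  # Leave an existing file or link (even a dangling one) untouched.
  if [ -e "$dest_dir/$tool" ] || [ -L "$dest_dir/$tool" ]; then
    echo "$dest_dir/$tool already exists, leaving it alone"
    return 0
  fi
  # Refuse to create a link to something that isn't there.
  if [ ! -x "$src_dir/$tool" ]; then
    echo "error: $src_dir/$tool not found or not executable" >&2
    return 1
  fi
  ln -s "$src_dir/$tool" "$dest_dir/$tool"
}

# Example (as root): link_tool rocm-smi
```

Checking the destination directly, instead of using `command -v`, avoids depending on root's PATH, which may differ from the PATH of the user running beszel-agent.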
Nvidia GPUs
Docker agent
Make sure NVIDIA Container Toolkit is installed on the host system.
Use the henrygd/beszel-agent-nvidia image and add the following deploy block to your docker-compose.yml.
beszel-agent:
  image: henrygd/beszel-agent-nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - utility
Binary agent
You must have nvidia-smi available on the system.
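A quick way to confirm this is to ask nvidia-smi for a few GPU readings. The sketch below is only a sanity check under a POSIX shell; `check_nvidia_smi` is a hypothetical name, and while the `--query-gpu` fields are standard nvidia-smi fields, they are not necessarily the ones the agent itself requests.

```shell
# Sanity check (not part of Beszel): confirm nvidia-smi exists and can
# talk to the driver by printing name, utilization, temperature, and power.
check_nvidia_smi() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found on PATH" >&2
    return 1
  fi
  nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.draw \
    --format=csv,noheader
}

# Usage: check_nvidia_smi
```

Running it as the same user that runs beszel-agent gives the most meaningful result.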
If it doesn't work, you may need to allow the agent access to your GPU devices in its systemd service configuration. See discussion #563 for more information.
[Service]
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
# If you have multiple GPUs, make sure to allow all of them
DeviceAllow=/dev/nvidia1 rw
DeviceAllow=/dev/nvidia2 rw
After editing the service, reload systemd and restart the agent:
sudo systemctl daemon-reload
sudo systemctl restart beszel-agent
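On hosts with several GPUs it is easy to miss a device node. The sketch below, a hypothetical helper named `gen_device_allow`, prints a [Service] section with a DeviceAllow line for /dev/nvidiactl and for every /dev/nvidiaN node it finds. It is an assumption-laden convenience, not the method from the discussion; the optional argument only exists so it can be pointed at a test directory.

```shell
# Hypothetical helper (not part of Beszel): emit DeviceAllow entries for
# /dev/nvidiactl and each /dev/nvidiaN node. Printed paths always use /dev,
# even when scanning another directory.
gen_device_allow() {
  dev_dir="${1:-/dev}"
  echo "[Service]"
  if [ -e "$dev_dir/nvidiactl" ]; then
    echo "DeviceAllow=/dev/nvidiactl rw"
  fi
  # nvidia[0-9]* matches nvidia0, nvidia1, ... but not nvidia-uvm or nvidia-modeset
  for dev in "$dev_dir"/nvidia[0-9]*; do
    [ -e "$dev" ] || continue   # the glob matched nothing
    echo "DeviceAllow=/dev/${dev##*/} rw"
  done
}

# One way to apply it is a systemd drop-in (the file name is made up):
#   sudo mkdir -p /etc/systemd/system/beszel-agent.service.d
#   gen_device_allow | sudo tee /etc/systemd/system/beszel-agent.service.d/gpu-devices.conf
#   sudo systemctl daemon-reload && sudo systemctl restart beszel-agent
```

A drop-in keeps the change separate from the main unit file; since DeviceAllow may be specified more than once, the entries add to whatever the unit already allows.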
Nvidia Jetson
You must use the binary agent and have tegrastats installed.
The henrygd/beszel-agent-nvidia image likely doesn't work, but I can't test it to confirm. Let me know one way or the other if you try it 😃.
Intel GPUs
Support for Intel is new, and wrinkles are still being ironed out.
Note that only one GPU per system is supported. We may add support for multiple GPUs in the future.
Docker agent
Use the henrygd/beszel-agent-intel image with the additional options below.
beszel-agent:
  image: henrygd/beszel-agent-intel
  cap_add:
    - CAP_PERFMON
  devices:
    - /dev/dri/card0:/dev/dri/card0
Use ls /dev/dri to find the name of your GPU:
ls /dev/dri
by-path card0 renderD128
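If you prefer not to copy the name by hand, the sketch below prints a compose `devices:` entry for each cardN node it finds. `dri_device_entries` is a hypothetical helper, not part of Beszel. Because only one GPU per system is supported, it warns when several cards are present so you can choose which one to pass through.

```shell
# Hypothetical helper (not part of Beszel): print a docker-compose
# `devices:` entry for each /dev/dri/cardN node.
dri_device_entries() {
  dri_dir="${1:-/dev/dri}"
  count=0
  for card in "$dri_dir"/card[0-9]*; do
    [ -e "$card" ] || continue   # the glob matched nothing
    name=${card##*/}
    echo "- /dev/dri/$name:/dev/dri/$name"
    count=$((count + 1))
  done
  if [ "$count" -gt 1 ]; then
    echo "warning: $count cards found; keep only the one Beszel should monitor" >&2
  fi
  [ "$count" -gt 0 ]   # non-zero exit when no card node exists
}
```

Paste the printed line under `devices:` in the compose file shown earlier.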
You may need to set a lower value for the perf_event_paranoid kernel parameter on the host. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
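Before changing it, you can check the current value. perf_event_paranoid is a global kernel setting rather than a per-container one, so this check, like the sysctl command above, belongs on the host. `perf_paranoid_ok` is a hypothetical helper; the second argument exists only so it can read a test file instead of /proc.

```shell
# Hypothetical helper (not part of Beszel): succeed if the current
# kernel.perf_event_paranoid value is at or below a target (default 2).
perf_paranoid_ok() {
  max="${1:-2}"
  file="${2:-/proc/sys/kernel/perf_event_paranoid}"
  value=$(cat "$file") || return 2   # could not read the setting
  echo "perf_event_paranoid=$value (want <= $max)"
  [ "$value" -le "$max" ]
}

# Usage: perf_paranoid_ok || sudo sysctl kernel.perf_event_paranoid=2
```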
If none of the above works, try adding CAP_SYS_ADMIN and CAP_DAC_OVERRIDE to cap_add in addition to CAP_PERFMON.
Binary agent
You must have intel_gpu_top installed. This is typically part of the intel-gpu-tools package.
sudo apt install intel-gpu-tools   # Debian / Ubuntu
sudo pacman -S intel-gpu-tools     # Arch
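For setup scripts that target both distributions, the sketch below picks between the two commands above by reading /etc/os-release. `intel_tools_install_cmd` is a hypothetical helper; it only knows about Debian-like and Arch-like systems and prints the command instead of running it.

```shell
# Hypothetical helper (not part of Beszel): print the intel-gpu-tools
# install command for Debian-like or Arch-like systems.
intel_tools_install_cmd() {
  os_release="${1:-/etc/os-release}"
  # Source os-release in a subshell so ID/ID_LIKE don't leak into the caller.
  ids=$( unset ID ID_LIKE; . "$os_release" && echo "$ID $ID_LIKE" ) || return 1
  case " $ids " in
    *" debian "*|*" ubuntu "*) echo "sudo apt install intel-gpu-tools" ;;
    *" arch "*)               echo "sudo pacman -S intel-gpu-tools" ;;
    *) echo "unsupported distribution: $ids" >&2; return 1 ;;
  esac
}
```

Matching on ID_LIKE as well as ID lets derivatives such as Ubuntu (ID_LIKE=debian) or Manjaro (ID_LIKE=arch) fall into the right branch.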
If you're not running the agent as root, you'll need to set the cap_perfmon capability on the intel_gpu_top binary:
sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
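To confirm the capability stuck, you can inspect the file with getcap from libcap. The sketch below wraps that in a hypothetical helper, `has_perfmon_cap`; since getcap's output format differs between libcap versions, it only looks for the capability name.

```shell
# Hypothetical helper (not part of Beszel): succeed if the binary has
# cap_perfmon set, as reported by getcap.
has_perfmon_cap() {
  bin="${1:-/usr/bin/intel_gpu_top}"
  getcap "$bin" 2>/dev/null | grep -q 'cap_perfmon'
}

# Usage: has_perfmon_cap || sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
```

File capabilities are tied to the file itself, so a package upgrade that replaces intel_gpu_top may drop them; rerunning this check after upgrades is a cheap safeguard.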
If that doesn't work, you may need to set a lower value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
To make this change persistent across reboots, add it to the sysctl configuration:
echo "kernel.perf_event_paranoid=2" | sudo tee /etc/sysctl.d/99-intel-gpu-beszel.conf
sudo sysctl --system