GPU Monitoring
Beszel can monitor GPU usage, temperature, and power draw.
AMD GPUs
Work in progress
AMD has deprecated rocm-smi in favor of amd-smi. The agent works with rocm-smi on Linux, but hasn't been updated to work with amd-smi yet.
Beszel uses rocm-smi to monitor AMD GPUs. This must be available on the system, and you must use the binary agent (not the Docker agent).
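A quick way to confirm that rocm-smi can be found is to run a small check as the same user that runs beszel-agent. The sketch below is only an illustration: `find_rocm_smi` is a hypothetical helper name, and the `/opt/rocm/bin` fallback is an assumption based on the common install location.

```shell
# Hypothetical helper: print the path to rocm-smi if it can be located.
# Looks in PATH first, then in a fallback directory (assumed: /opt/rocm/bin).
find_rocm_smi() {
    fallback_dir="${1:-/opt/rocm/bin}"
    if command -v rocm-smi >/dev/null 2>&1; then
        # Found through PATH; print the resolved location.
        command -v rocm-smi
    elif [ -x "$fallback_dir/rocm-smi" ]; then
        # Installed, but not reachable through PATH.
        echo "$fallback_dir/rocm-smi"
    else
        return 1
    fi
}
```

If `find_rocm_smi` prints a path under the fallback directory rather than one from PATH, the binary is installed but the agent will probably not find it until PATH is adjusted or a symlink is created.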
Make sure rocm-smi is accessible
Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary in /opt/rocm/bin. If this directory isn't in the PATH of the user running beszel-agent, symlink the binary into /usr/local/bin:
sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
Nvidia GPUs
Docker agent
Make sure NVIDIA Container Toolkit is installed on the host system.
Use henrygd/beszel-agent-nvidia and add the following deploy block to your docker-compose.yml.
beszel-agent:
  image: henrygd/beszel-agent-nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - utility
Binary agent
You must have nvidia-smi available on the system.
If it doesn't work, you may need to allow access to your devices in the service configuration. See discussion #563 for more information.
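Which device nodes exist varies from machine to machine. As a rough sketch, the helper below prints one DeviceAllow line for each existing path it is given, so the list can be generated rather than typed by hand. `device_allow_lines` is a hypothetical name, and the `/dev/nvidia[0-9]*` glob is an assumption about how the numbered GPU nodes are named on your host.

```shell
# Hypothetical helper: print one systemd DeviceAllow line per existing path.
device_allow_lines() {
    for dev in "$@"; do
        # Skip paths that do not exist, including glob patterns that matched nothing.
        [ -e "$dev" ] || continue
        printf 'DeviceAllow=%s rw\n' "$dev"
    done
}

# Example: the control device plus every numbered GPU node on this host.
device_allow_lines /dev/nvidiactl /dev/nvidia[0-9]*
```

The output can be pasted under the [Service] section of the agent's unit or drop-in file. On a machine with no Nvidia device nodes the example prints nothing.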
[Service]
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
# If you have multiple GPUs, make sure to allow all of them
DeviceAllow=/dev/nvidia1 rw
DeviceAllow=/dev/nvidia2 rw
Then reload systemd and restart the agent:
systemctl daemon-reload
systemctl restart beszel-agent
Nvidia Jetson
You must use the binary agent and have tegrastats installed.
The henrygd/beszel-agent-nvidia image likely doesn't work, but I can't test it to confirm. Let me know one way or the other if you try it 😃.
Intel GPUs
Support for Intel is new and wrinkles are still being ironed out.
Note that only one GPU per system is supported. We may add support for multiple GPUs in the future.
Docker agent
Use the henrygd/beszel-agent-intel image with the additional options below.
beszel-agent:
  image: henrygd/beszel-agent-intel
  cap_add:
    - CAP_PERFMON
  devices:
    - /dev/dri/card0:/dev/dri/card0
Use ls /dev/dri to find the name of your GPU:
ls /dev/dri
by-path card0 renderD128
You may need to set a lower value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
If none of the above works, try adding CAP_SYS_ADMIN and CAP_DAC_OVERRIDE in addition to CAP_PERFMON.
Binary agent
You must have intel_gpu_top installed. This is typically part of the intel-gpu-tools package.
# Debian / Ubuntu
sudo apt install intel-gpu-tools
# Arch
sudo pacman -S intel-gpu-tools
Assuming you're not running the agent as root, you'll need to set the cap_perfmon capability on the intel_gpu_top binary.
sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
If that doesn't work, you may need to set a lower value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
To make this change persistent across reboots, add it to the sysctl configuration:
echo "kernel.perf_event_paranoid=2" | sudo tee /etc/sysctl.d/99-intel-gpu-beszel.conf
sudo sysctl --system
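To check whether the setting took effect, the current value can be read back from /proc. The sketch below is illustrative only: `paranoid_ok` is a hypothetical helper, and the default limit of 2 simply mirrors the value used in this guide.

```shell
# Hypothetical helper: succeed if a perf_event_paranoid value is at or below
# the given limit (default 2, the value used in this guide).
paranoid_ok() {
    value="$1"
    limit="${2:-2}"
    [ "$value" -le "$limit" ]
}

# Read the live setting; the file may be missing on non-Linux systems
# or unreadable in restricted containers.
current=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null || echo unknown)
case "$current" in
    unknown)
        echo "perf_event_paranoid: not readable on this system" ;;
    *)
        if paranoid_ok "$current"; then
            echo "perf_event_paranoid=$current (ok)"
        else
            echo "perf_event_paranoid=$current (higher than 2)"
        fi ;;
esac
```

If the reported value is still higher than 2 after a reboot, another file in /etc/sysctl.d may be overriding the setting.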