GPU Monitoring
Beszel can monitor GPU usage, temperature, and power draw.
AMD GPUs
Work in progress
AMD has deprecated rocm-smi in favor of amd-smi. The agent works with rocm-smi on Linux, but hasn't been updated to work with amd-smi yet.
Beszel uses rocm-smi to monitor AMD GPUs. This must be available on the system, and you must use the binary agent (not the Docker agent).
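As a quick sanity check, the sketch below reports whether rocm-smi resolves on the current user's PATH. The helper name have_cmd is made up for this example and is not part of Beszel. Run it as the user that runs beszel-agent, since that user's PATH is the one that matters.

```shell
# Hypothetical helper: succeed when a command resolves on the current PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd rocm-smi; then
  echo "rocm-smi found at $(command -v rocm-smi)"
else
  echo "rocm-smi not found in PATH"
fi
```

If the command is reported as missing even though the package is installed, the binary is probably outside the PATH, which is the case the next section covers.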
Make sure rocm-smi is accessible
Installing rocm-smi-lib on Arch and Debian places the rocm-smi binary in /opt/rocm/bin. If that directory isn't in the PATH of the user running beszel-agent, symlink the binary into /usr/local/bin:
sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
Nvidia GPUs
Power usage warning
nvidia-smi prevents GPUs from entering RTD3 power saving mode, which may cause increased power consumption on laptops.
We are working on a solution. See issue #1522 for more information.
Docker agent
Make sure NVIDIA Container Toolkit is installed on the host system.
Use henrygd/beszel-agent-nvidia and add the following deploy block to your docker-compose.yml.
beszel-agent:
  image: henrygd/beszel-agent-nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - utility
Binary agent
You must have nvidia-smi available on the system.
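To confirm that nvidia-smi is reachable and returns data, a check along these lines can help. It is only a sketch: the helper name check_nvidia_smi is invented for this example, and the queried fields are a reasonable guess at what is useful to look at, not necessarily what the agent itself requests.

```shell
# Hypothetical helper: print one CSV line per GPU, or explain why it cannot.
check_nvidia_smi() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found in PATH"
    return 1
  fi
  # Query name, utilization, temperature, and power draw for every GPU.
  nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,power.draw \
    --format=csv,noheader
}

check_nvidia_smi || echo "GPU query failed; check the driver installation"
```

Running this as the same user as the agent shows whether a failure comes from nvidia-smi itself or from the service environment.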
If it doesn't work, you may need to allow access to your devices in the service configuration. See discussion #563 for more information.
[Service]
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
# If you have multiple GPUs, make sure to allow all of them
DeviceAllow=/dev/nvidia1 rw
DeviceAllow=/dev/nvidia2 rw
After editing the service configuration, reload systemd and restart the agent:
systemctl daemon-reload
systemctl restart beszel-agent
Nvidia Jetson
The binary agent should work automatically with no additional configuration.
Docker agent
The Docker agent requires a custom image and a bind mount for tegrastats.
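Before setting up the bind mount, it may be worth confirming that tegrastats exists at the host path used in this section. The sketch below assumes the path /usr/bin/tegrastats from the Compose example, and the helper name is_executable_file is made up for illustration. With the short volume syntax, Docker typically creates a missing host path as a directory, which would leave the container without a usable binary.

```shell
# Hypothetical helper: succeed when the path is a regular, executable file.
is_executable_file() {
  [ -f "$1" ] && [ -x "$1" ]
}

if is_executable_file /usr/bin/tegrastats; then
  echo "tegrastats found; it can be bind mounted into the container"
else
  echo "tegrastats not found at /usr/bin/tegrastats; adjust the host path"
fi
```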
1. Create a custom Dockerfile
Create a Dockerfile in the same directory as your docker-compose.yml:
FROM frolvlad/alpine-glibc:latest
COPY --from=henrygd/beszel-agent:latest /agent /agent
RUN chmod +x /agent
ENTRYPOINT ["/agent"]
2. Update Docker Compose
Update your docker-compose.yml to use your custom image, and bind mount tegrastats:
beszel-agent:
  image: henrygd/beszel-agent
  build: .
  volumes:
    - /usr/bin/tegrastats:/usr/bin/tegrastats:ro
See discussion #1600 for more information.
Intel GPUs
Note that only one GPU per system is supported. We may add support for multiple GPUs in the future.
Docker agent
Use the henrygd/beszel-agent-intel image with the additional options below.
beszel-agent:
  image: henrygd/beszel-agent-intel
  cap_add:
    - CAP_PERFMON
  devices:
    - /dev/dri/card0:/dev/dri/card0
Use ls /dev/dri to find the name of your GPU:
ls /dev/dri
by-path  card0  renderD128
Binary agent
You must have intel_gpu_top installed. This is typically part of the intel-gpu-tools package.
# Debian / Ubuntu
sudo apt install intel-gpu-tools
# Arch
sudo pacman -S intel-gpu-tools
Assuming you're not running the agent as root, you'll need to set the cap_perfmon capability on the intel_gpu_top binary.
sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
If running the agent as a systemd service, add the CAP_PERFMON ambient capability to the beszel-agent service so that non-root services can still access performance counters:
[Service]
AmbientCapabilities=CAP_PERFMON
This is required because file-based capabilities set with setcap on intel_gpu_top are not inherited by child processes when the service is run as a non-root user. See issue #1480 for additional context.
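Both settings can be inspected after the fact. The following is a rough sketch, assuming getcap (from libcap) and systemctl are available, and that the unit is named beszel-agent as elsewhere on this page. The helper mentions_perfmon is invented for the example, and the exact output format of both tools may vary between versions, so it only looks for the capability name.

```shell
# Hypothetical helper: succeed when stdin mentions cap_perfmon (any case).
mentions_perfmon() {
  grep -qi 'cap_perfmon'
}

# File capability on the binary.
if command -v getcap >/dev/null 2>&1; then
  if getcap /usr/bin/intel_gpu_top 2>/dev/null | mentions_perfmon; then
    echo "file capability: cap_perfmon is set"
  else
    echo "file capability: cap_perfmon is missing"
  fi
fi

# Ambient capability on the service, as systemd reports it.
if command -v systemctl >/dev/null 2>&1; then
  if systemctl show beszel-agent --property=AmbientCapabilities 2>/dev/null | mentions_perfmon; then
    echo "service: CAP_PERFMON is ambient"
  else
    echo "service: CAP_PERFMON is not ambient"
  fi
fi
```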
Troubleshooting
To independently test the intel_gpu_top command:
# docker
docker exec -it beszel-agent intel_gpu_top -s 3000 -l
# binary
sudo -u beszel intel_gpu_top -s 3000 -l
Specify the device name
On some systems you need to specify the device name for intel_gpu_top. Use the INTEL_GPU_DEVICE environment variable to set the -d value.
INTEL_GPU_DEVICE=drm:/dev/dri/card0
This is equivalent to running intel_gpu_top -s 3000 -l -d drm:/dev/dri/card0.
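The mapping from the environment variable to the command line can be pictured with a small sketch. This is an illustration only, not the agent's actual code: the function name intel_gpu_top_cmd is hypothetical, and it simply prints the command that would correspond to the current value of INTEL_GPU_DEVICE.

```shell
# Hypothetical helper: print the intel_gpu_top command line that matches
# the current INTEL_GPU_DEVICE setting (an illustration, not agent code).
intel_gpu_top_cmd() {
  if [ -n "${INTEL_GPU_DEVICE:-}" ]; then
    echo "intel_gpu_top -s 3000 -l -d $INTEL_GPU_DEVICE"
  else
    echo "intel_gpu_top -s 3000 -l"
  fi
}

intel_gpu_top_cmd
```

Copying the printed command into a terminal is a convenient way to test a device name before setting it for the agent.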
Lower the perf_event_paranoid kernel parameter
You may need to lower the value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.
sudo sysctl kernel.perf_event_paranoid=2
To make this change persistent across reboots, add it to the sysctl configuration:
echo "kernel.perf_event_paranoid=2" | sudo tee /etc/sysctl.d/99-intel-gpu-beszel.conf
sudo sysctl --system
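To see whether the change is needed, or whether it took effect, the current value can be read from /proc. The sketch below assumes 2 is the target value, as in the commands above. The helper name paranoid_too_high is made up for this example.

```shell
# Hypothetical helper: succeed when the given value is higher than 2,
# the value used in this section. Non-numeric input counts as "not higher".
paranoid_too_high() {
  [ "$1" -gt 2 ] 2>/dev/null
}

param=/proc/sys/kernel/perf_event_paranoid
if [ -r "$param" ]; then
  current=$(cat "$param")
  if paranoid_too_high "$current"; then
    echo "perf_event_paranoid is $current; consider lowering it to 2"
  else
    echo "perf_event_paranoid is $current; no change needed"
  fi
else
  echo "cannot read $param"
fi
```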