
GPU Monitoring

Beszel can monitor GPU usage, temperature, and power draw.

AMD GPUs

Work in progress

AMD has deprecated rocm-smi in favor of amd-smi. The agent works with rocm-smi on Linux, but hasn't been updated to work with amd-smi yet.

Beszel uses rocm-smi to monitor AMD GPUs. This must be available on the system, and you must use the binary agent (not the Docker agent).

Make sure rocm-smi is accessible

Installing rocm-smi-lib on Arch or Debian places the rocm-smi binary in /opt/rocm/bin. If that directory isn't in the PATH of the user running beszel-agent, symlink the binary into /usr/local/bin:

bash
sudo ln -s /opt/rocm/bin/rocm-smi /usr/local/bin/rocm-smi
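If you script this step, the sketch below creates the same symlink but only when the source binary exists and nothing is already at the destination. It assumes bash, and link_rocm_smi is a hypothetical helper name. The default paths are the ones from this section, and both can be overridden.

```bash
#!/usr/bin/env bash
# Sketch: link_rocm_smi is a hypothetical helper, not part of Beszel.
# Creates the /usr/local/bin symlink described above, but only if the
# rocm-smi binary exists and the destination name is not already taken.
link_rocm_smi() {
  local src="${1:-/opt/rocm/bin/rocm-smi}"   # where rocm-smi-lib installs it
  local dest_dir="${2:-/usr/local/bin}"      # target directory for the link
  if [ ! -x "$src" ]; then
    echo "rocm-smi not found or not executable at $src" >&2
    return 1
  fi
  # -L also catches a dangling symlink, which -e alone would miss
  if [ -e "$dest_dir/rocm-smi" ] || [ -L "$dest_dir/rocm-smi" ]; then
    echo "$dest_dir/rocm-smi already exists, leaving it alone"
    return 0
  fi
  ln -s "$src" "$dest_dir/rocm-smi"
}

# Example (as root, after sourcing this file):
#   link_rocm_smi
# Then check that the agent's user can resolve it (assumes the user is "beszel"):
#   sudo -u beszel sh -c 'command -v rocm-smi'
```

Checking for an existing entry first keeps the helper safe to re-run.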

Nvidia GPUs

Power usage warning

nvidia-smi prevents GPUs from entering RTD3 power saving mode, which may cause increased power consumption on laptops.

We are working on a solution. See issue #1522 for more information.

Docker agent

Make sure the NVIDIA Container Toolkit is installed on the host system.

Use the henrygd/beszel-agent-nvidia image and add the following deploy block to your docker-compose.yml:

yaml
beszel-agent:
  image: henrygd/beszel-agent-nvidia
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities:
              - utility

Binary agent

You must have nvidia-smi available on the system.

If GPU data doesn't appear, you may need to allow access to the GPU devices in the agent's systemd service configuration. See discussion #563 for more information.

ini
[Service]
DeviceAllow=/dev/nvidiactl rw
DeviceAllow=/dev/nvidia0 rw
# If you have multiple GPUs, make sure to allow all of them
DeviceAllow=/dev/nvidia1 rw
DeviceAllow=/dev/nvidia2 rw

Then reload systemd and restart the agent:

bash
sudo systemctl daemon-reload
sudo systemctl restart beszel-agent
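Rather than listing each /dev/nvidiaN node by hand, you could generate the [Service] block from the devices that actually exist. The sketch below assumes bash. nvidia_device_allow is a hypothetical helper, and the drop-in file name nvidia-devices.conf is an arbitrary choice; the beszel-agent.service.d directory is systemd's standard drop-in location, so the packaged unit file stays untouched.

```bash
#!/usr/bin/env bash
# Sketch: nvidia_device_allow is a hypothetical helper, not part of Beszel.
# Prints a [Service] section allowing /dev/nvidiactl plus every numbered
# /dev/nvidiaN node that exists (nvidia0, nvidia1, ...). Other nodes such as
# /dev/nvidia-uvm are not matched, mirroring the list above.
nvidia_device_allow() {
  local dev_dir="${1:-/dev}"
  local node
  echo "[Service]"
  echo "DeviceAllow=${dev_dir}/nvidiactl rw"
  for node in "$dev_dir"/nvidia[0-9]*; do
    [ -e "$node" ] || continue   # no match: the glob stays unexpanded
    echo "DeviceAllow=${node} rw"
  done
}

# Example: write the snippet as a drop-in, then reload and restart.
#   sudo mkdir -p /etc/systemd/system/beszel-agent.service.d
#   nvidia_device_allow |
#     sudo tee /etc/systemd/system/beszel-agent.service.d/nvidia-devices.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart beszel-agent
```

DeviceAllow= may be specified multiple times, so entries in a drop-in are added to any already present in the main unit.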

Nvidia Jetson

The binary agent should work automatically with no additional configuration.

Docker agent

The Docker agent requires a custom image and a bind mount for tegrastats.

1. Create a custom Dockerfile

Create a Dockerfile in the same directory as your docker-compose.yml:

dockerfile
FROM frolvlad/alpine-glibc:latest

COPY --from=henrygd/beszel-agent:latest /agent /agent
RUN chmod +x /agent

ENTRYPOINT ["/agent"]

2. Update Docker Compose

Update your docker-compose.yml to use your custom image, and bind mount tegrastats:

yaml
beszel-agent:
  image: henrygd/beszel-agent
  build: .
  volumes:
    - /usr/bin/tegrastats:/usr/bin/tegrastats:ro

See discussion #1600 for more information.

Intel GPUs

Only one Intel GPU per system is currently supported. Support for multiple GPUs may be added in the future.

Docker agent

Use the henrygd/beszel-agent-intel image with the additional options below.

yaml
beszel-agent:
  image: henrygd/beszel-agent-intel
  cap_add:
    - CAP_PERFMON
  devices:
    - /dev/dri/card0:/dev/dri/card0

Use ls /dev/dri to find the name of your GPU, then adjust the devices entry above to match:

bash
ls /dev/dri
# Example output:
# by-path  card0  renderD128
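If you are unsure which node to map, the sketch below prints a compose devices entry for each DRM card node it finds. It assumes bash, and drm_card_mappings is a hypothetical helper. Because only one GPU per system is supported, keep a single entry in your docker-compose.yml.

```bash
#!/usr/bin/env bash
# Sketch: drm_card_mappings is a hypothetical helper, not part of Beszel.
# Prints one "- host:container" devices entry per cardN node, skipping
# render nodes (renderD128) and the by-path directory.
drm_card_mappings() {
  local dri_dir="${1:-/dev/dri}"
  local card
  for card in "$dri_dir"/card[0-9]*; do
    [ -e "$card" ] || continue   # no match: the glob stays unexpanded
    echo "- ${card}:${card}"
  done
}

# Example:
#   drm_card_mappings
#   # on the system shown above this would print: - /dev/dri/card0:/dev/dri/card0
```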

Binary agent

You must have intel_gpu_top installed. This is typically part of the intel-gpu-tools package.

bash
# Debian / Ubuntu
sudo apt install intel-gpu-tools
# Arch Linux
sudo pacman -S intel-gpu-tools

If you're not running the agent as root, you'll need to set the cap_perfmon capability on the intel_gpu_top binary:

bash
sudo setcap cap_perfmon=ep /usr/bin/intel_gpu_top
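To confirm the capability was applied, you can inspect the binary with getcap (from libcap). Depending on the libcap version, its output looks like `cap_perfmon=ep` or the older `= cap_perfmon+ep`, so the hypothetical has_perfmon helper below accepts either form. This is a sketch that assumes bash and a grep supporting -E.

```bash
#!/usr/bin/env bash
# Sketch: has_perfmon is a hypothetical helper, not a Beszel or libcap command.
# Reads getcap output on stdin and succeeds if cap_perfmon carries at least
# the effective and permitted flags ("=ep", "+ep", or with "i" as "=eip").
has_perfmon() {
  grep -Eq 'cap_perfmon[^ ]*[=+]ei?p'
}

# Example:
#   getcap /usr/bin/intel_gpu_top | has_perfmon && echo "cap_perfmon is set"
```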

If you run the agent as a systemd service, add the CAP_PERFMON ambient capability to the beszel-agent service so that the non-root agent can still access performance counters:

ini
[Service]
AmbientCapabilities=CAP_PERFMON

This is required because file-based capabilities set with setcap on intel_gpu_top are not inherited by child processes when the service is run as a non-root user. See issue #1480 for additional context.
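One way to apply and verify this is sketched below, assuming bash and a systemd drop-in file whose name, perfmon.conf, is an arbitrary choice. The ambient_has_perfmon helper is hypothetical; it parses the output of `systemctl show -p AmbientCapabilities`, assuming systemd lists capability names in lowercase, separated by spaces.

```bash
#!/usr/bin/env bash
# Sketch: ambient_has_perfmon is a hypothetical helper, not part of Beszel.
# Reads `systemctl show <unit> -p AmbientCapabilities` output on stdin and
# succeeds if cap_perfmon is one of the listed names.
ambient_has_perfmon() {
  grep -Eq '^AmbientCapabilities=(.* )?cap_perfmon( |$)'
}

# Example: add the capability through a drop-in instead of editing the unit.
#   sudo mkdir -p /etc/systemd/system/beszel-agent.service.d
#   printf '[Service]\nAmbientCapabilities=CAP_PERFMON\n' |
#     sudo tee /etc/systemd/system/beszel-agent.service.d/perfmon.conf
#   sudo systemctl daemon-reload && sudo systemctl restart beszel-agent
#   systemctl show beszel-agent -p AmbientCapabilities | ambient_has_perfmon &&
#     echo "CAP_PERFMON is in the ambient set"
```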

Troubleshooting

To test the intel_gpu_top command independently of the agent:

bash
# docker
docker exec -it beszel-agent intel_gpu_top -s 3000 -l
# binary
sudo -u beszel intel_gpu_top -s 3000 -l

Specify the device name

On some systems you need to specify the device for intel_gpu_top explicitly. Use the INTEL_GPU_DEVICE environment variable to set the -d value:

dotenv
INTEL_GPU_DEVICE=drm:/dev/dri/card0

This is equivalent to running intel_gpu_top -s 3000 -l -d drm:/dev/dri/card0.
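The equivalence above can be sketched as a helper that builds the same command line, adding -d only when INTEL_GPU_DEVICE is set. This assumes bash; intel_gpu_top_cmd is hypothetical and only mirrors the documented behavior, it is not taken from the agent's source. The printed command is handy for the manual tests in the Troubleshooting section.

```bash
#!/usr/bin/env bash
# Sketch: intel_gpu_top_cmd is a hypothetical helper mirroring the documented
# equivalence; it is not the agent's actual implementation.
# Prints the intel_gpu_top invocation, appending -d <device> only when
# INTEL_GPU_DEVICE is set and non-empty.
intel_gpu_top_cmd() {
  local -a cmd=(intel_gpu_top -s 3000 -l)
  if [ -n "${INTEL_GPU_DEVICE:-}" ]; then
    cmd+=(-d "$INTEL_GPU_DEVICE")
  fi
  printf '%s\n' "${cmd[*]}"
}

# Example:
#   INTEL_GPU_DEVICE=drm:/dev/dri/card0 intel_gpu_top_cmd
#   # -> intel_gpu_top -s 3000 -l -d drm:/dev/dri/card0
```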

Lower the perf_event_paranoid kernel parameter

You may need to lower the value for the perf_event_paranoid kernel parameter. See issue #1150 or #1203 for more information.

bash
sudo sysctl kernel.perf_event_paranoid=2

To make this change persistent across reboots, add it to the sysctl configuration:

bash
echo "kernel.perf_event_paranoid=2" | sudo tee /etc/sysctl.d/99-intel-gpu-beszel.conf
sudo sysctl --system
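To check the current value before and after the change, a minimal sketch can compare it with the target of 2 used above. It assumes bash; paranoid_ok is a hypothetical helper, and /proc/sys/kernel/perf_event_paranoid is the procfs file backing this sysctl.

```bash
#!/usr/bin/env bash
# Sketch: paranoid_ok is a hypothetical helper, not part of Beszel.
# Succeeds if kernel.perf_event_paranoid is less than or equal to the target
# (default 2, matching the sysctl commands above). Returns 2 if unreadable.
paranoid_ok() {
  local file="${1:-/proc/sys/kernel/perf_event_paranoid}"
  local target="${2:-2}"
  local value
  value=$(cat "$file" 2>/dev/null) || return 2
  [ "$value" -le "$target" ]
}

# Example:
#   if paranoid_ok; then echo "perf_event_paranoid is low enough"
#   else echo "current value: $(cat /proc/sys/kernel/perf_event_paranoid)"; fi
```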

Released under the MIT License