My ML group has a few GPU servers. I wanted to check utilization without SSHing into each machine. The standard answer is Grafana + Prometheus + exporters, but that felt like overkill for checking if GPUs are busy.
I built GPU Hot as a simpler alternative. This post is about why that made sense.
The Grafana Problem
Grafana is excellent for production monitoring at scale. But for a small team with a few GPU boxes, you're looking at:
- Installing Prometheus
- Installing node exporters on each server
- Installing GPU exporters
- Writing Prometheus configs
- Setting up Grafana dashboards
- Maintaining all of this
For my use case (checking GPU utilization while walking to get coffee), that was too much infrastructure.
What I Actually Needed
A web page that shows which GPUs are in use, their temperature and memory usage, and what processes are running, updating in real time so I can see when a training job finishes.
That's it. No alerting, no long-term storage, no complex queries.
The Setup
One Docker command per server:
docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312 and you see your GPUs updating every 0.5 seconds.
For multiple servers, run the container on each GPU box, then start a hub:
# On each GPU server
docker run -d --gpus all -p 1312:1312 \
-e NODE_NAME=$(hostname) \
ghcr.io/psalias2006/gpu-hot:latest
# On your laptop (no GPU needed)
docker run -d -p 1312:1312 \
-e GPU_HOT_MODE=hub \
-e NODE_URLS=http://server1:1312,http://server2:1312 \
ghcr.io/psalias2006/gpu-hot:latest
Open http://localhost:1312 and you see all GPUs from all servers in one dashboard. Total setup time: under 5 minutes.
How It Works
The core is straightforward:
NVML for metrics: Python's NVML bindings give direct access to GPU data. Querying NVML directly is faster than shelling out to nvidia-smi and parsing its text output, and it returns structured data.
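As a rough illustration (a sketch, not GPU Hot's actual code), reading these metrics with the pynvml bindings looks roughly like this:
# Sketch: polling GPU metrics with pynvml (illustrative, not GPU Hot's actual code)
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)   # .gpu and .memory, in percent
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)           # .used / .total, in bytes
    temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)   # running processes (pid, memory)
    print(f"GPU {i}: {util.gpu}% util, {mem.used / 1e9:.1f}/{mem.total / 1e9:.1f} GB, "
          f"{temp}C, {len(procs)} processes")
pynvml.nvmlShutdown()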
FastAPI + WebSockets: Async WebSockets push metrics to the browser. No polling, sub-second updates. The server collects metrics and broadcasts them to all connected clients.
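A stripped-down version of the push loop might look like the sketch below. It assumes a hypothetical collect_gpu_metrics() helper (e.g. the pynvml loop above returning a dict) and polls per connection; the real server collects once and broadcasts to all clients.
# Sketch: pushing metrics to the browser over a WebSocket with FastAPI (illustrative only)
import asyncio
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/metrics")
async def metrics_ws(websocket: WebSocket):
    await websocket.accept()
    try:
        while True:
            metrics = collect_gpu_metrics()    # hypothetical helper returning a dict of GPU stats
            await websocket.send_json(metrics)
            await asyncio.sleep(0.5)           # matches the 0.5-second update interval
    except WebSocketDisconnect:
        pass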
Hub mode: Each node runs the same container and exposes metrics via WebSocket. The hub connects to all nodes, aggregates their data, and serves it through a single dashboard.
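The aggregation side can be as simple as one WebSocket client per node feeding a shared dictionary. A minimal sketch, assuming each node exposes a /ws/metrics endpoint like the one above and using the websockets library:
# Sketch: a hub subscribing to each node's metrics stream (illustrative only)
import asyncio
import json
import websockets

NODE_URLS = ["ws://server1:1312/ws/metrics", "ws://server2:1312/ws/metrics"]
latest = {}   # node URL -> most recent metrics payload, served by the hub's own dashboard

async def watch_node(url):
    async for ws in websockets.connect(url):   # reconnects automatically if a node drops
        try:
            async for message in ws:
                latest[url] = json.loads(message)
        except websockets.ConnectionClosed:
            continue

async def main():
    await asyncio.gather(*(watch_node(url) for url in NODE_URLS))

asyncio.run(main())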
Frontend: Vanilla JavaScript with Chart.js. No build step, no framework, just HTML/CSS/JS.
Docker: Packages everything. Users don't need to install Python, NVML bindings, or manage dependencies. The NVIDIA Container Toolkit handles GPU access.
When This Approach Works
This pattern works well when:
- You have a small number of machines (1-20)
- You need real-time visibility, not historical analysis
- Your team is small enough that everyone can check one dashboard
- You don't need alerting or complex queries
It doesn't replace proper monitoring for production services. But for development infrastructure in a small team, it's sufficient and much simpler to maintain.
Trade-offs
What you lose compared to Grafana:
- No persistent storage (metrics are only kept in memory for the current session)
- No alerting
- No complex queries or correlations
- No authentication (we run this on an internal network)
What you gain:
- Zero configuration
- Sub-second updates
- No maintenance
- One command deployment
For this use case, the trade-off made sense. This isn't for monitoring production services. It's for checking if GPUs are free before starting a training run.
Takeaway
Not every monitoring problem needs the full observability stack. For small teams with straightforward needs, a purpose-built tool can be simpler to deploy and maintain than configuring enterprise solutions.
Try the interactive demo to see it in action.