Fix/system monitor #875
Enhance GPU monitoring by integrating NVIDIA and AMD detection, updating collection methods, and adding support for nvidia-ml-py package
b051716 to 33366a8
Pull request overview
Improves the ROS2 system_monitor package to collect and publish more robust system workload metrics (notably GPU stats) across different hardware backends, and ensures the required Workload message is generated in bitbots_msgs.
Changes:
- Add Workload.msg to bitbots_msgs interface generation.
- Refactor GPU monitoring to auto-detect NVIDIA (NVML), Jetson (sysfs), and AMD (pyamdgpuinfo) backends; tighten type consistency in collectors.
- Adjust sampling behavior (CPU smoothing + lower default update frequency) and add nvidia-ml-py to the Pixi environment.
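The backend auto-detection described above could look roughly like the following. This is a minimal sketch, not the code in gpu.py: the probe functions, the probe order, the `detect_gpu_backend` name, and the Jetson sysfs path are all assumptions made for illustration. Because nvidia-ml-py is installed in every environment, the NVIDIA probe checks that NVML actually initializes and reports a device rather than only checking that the module imports.

```python
import os
from typing import Callable, Optional, Sequence, Tuple

# Assumed sysfs location of the Jetson GPU load file; it may differ between L4T releases.
JETSON_LOAD_PATH = "/sys/devices/gpu.0/load"


def probe_nvidia() -> bool:
    """True if NVML initializes and reports at least one device."""
    try:
        import pynvml  # module shipped by the nvidia-ml-py package

        pynvml.nvmlInit()
        try:
            return pynvml.nvmlDeviceGetCount() > 0
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # Missing module, missing driver, or no device: treat as "not NVIDIA".
        return False


def probe_jetson() -> bool:
    """True if the (assumed) Jetson load file exists."""
    return os.path.isfile(JETSON_LOAD_PATH)


def probe_amd() -> bool:
    """True if pyamdgpuinfo finds at least one AMD GPU."""
    try:
        import pyamdgpuinfo

        return pyamdgpuinfo.detect_gpus() > 0
    except Exception:
        return False


Probe = Tuple[str, Callable[[], bool]]

# Hypothetical probe order: the first backend whose probe succeeds wins.
DEFAULT_PROBES: Sequence[Probe] = (
    ("nvidia", probe_nvidia),
    ("jetson", probe_jetson),
    ("amd", probe_amd),
)


def detect_gpu_backend(probes: Sequence[Probe] = DEFAULT_PROBES) -> Optional[str]:
    """Return the name of the first available backend, or None if none is found."""
    for name, probe in probes:
        if probe():
            return name
    return None
```

Passing the probes as a parameter keeps the selection logic testable on machines without any GPU, for example `detect_gpu_backend([("amd", lambda: True)])`.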
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/bitbots_msgs/CMakeLists.txt | Adds Workload.msg to rosidl-generated interfaces so downstream nodes can publish/subscribe it. |
| src/bitbots_misc/system_monitor/system_monitor/network_interfaces.py | Adds return type annotations for interface collection helpers. |
| src/bitbots_misc/system_monitor/system_monitor/monitor.py | Updates GPU collector call signature and aligns default “disabled” tuple types; minor comment grammar fix. |
| src/bitbots_misc/system_monitor/system_monitor/memory.py | Adds a typed return annotation for memory stats collection. |
| src/bitbots_misc/system_monitor/system_monitor/gpu.py | Replaces single-backend AMD logic with auto-detected NVIDIA/Jetson/AMD backends and improved error handling/logging. |
| src/bitbots_misc/system_monitor/system_monitor/cpus.py | Adds EMA smoothing for CPU usage values and updates return/type annotations. |
| src/bitbots_misc/system_monitor/config/config.yaml | Lowers default update frequency from 10 Hz to 2 Hz. |
| pixi.toml | Adds nvidia-ml-py dependency for NVML-based monitoring. |
| pixi.lock | Locks nvidia-ml-py into all environments. |
| .vscode/settings.json | Adds dictionary words related to new GPU monitoring terms. |
    # smooth short-term sampling noise with exponential moving average
    prev = _prev_usage[cpu_num]
    if prev == 0.0:
        smoothed = float(round(raw_usage, 2))
    else:
        smoothed = float(round((raw_usage * _EMA_ALPHA) + (prev * (1.0 - _EMA_ALPHA)), 2))
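For reference, the excerpt above applies an exponential moving average: each new value is `alpha * raw + (1 - alpha) * previous`. The standalone sketch below shows the same update with `None` instead of `0.0` as the "no previous sample" marker, so that a genuine 0.0 reading is still smoothed. The names `ema_update` and `smooth_series` and the value 0.3 are hypothetical; the PR's actual `_EMA_ALPHA` value is not shown here.

```python
from typing import List, Optional

EMA_ALPHA = 0.3  # hypothetical smoothing factor, not the PR's value


def ema_update(prev: Optional[float], raw_usage: float, alpha: float = EMA_ALPHA) -> float:
    """Return the smoothed usage; the first sample (prev is None) passes through."""
    if prev is None:
        return round(raw_usage, 2)
    return round(raw_usage * alpha + prev * (1.0 - alpha), 2)


def smooth_series(samples: List[float], alpha: float = EMA_ALPHA) -> List[float]:
    """Apply the EMA over a list of raw samples, oldest first."""
    prev: Optional[float] = None
    smoothed = []
    for sample in samples:
        prev = ema_update(prev, sample, alpha)
        smoothed.append(prev)
    return smoothed
```

With `alpha=0.5`, the series `[0.0, 100.0]` smooths to `[0.0, 50.0]`; a check of the form `prev == 0.0` would instead pass the second sample through unsmoothed.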
@copilot please implement the suggestions.
Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>
Implemented the requested suggestions in bf629a3 (plus preceding commits in this update): GPU load is now consistently normalized to fraction semantics across backends, NVML init/handle are cached instead of per-sample init/shutdown, CPU EMA initialization no longer relies on …
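A rough sketch of two of the points in this reply, normalizing load to a fraction and caching the NVML handle, is given below. It is an illustration under stated assumptions, not the code from bf629a3: `to_fraction` and `NvmlSampler` are hypothetical names, and the full-scale values (100 for NVML's percentage, 1000 for a per-mille Jetson sysfs reading) are assumptions about what each backend reports.

```python
from typing import Any, Optional


def to_fraction(raw: float, full_scale: float) -> float:
    """Map a backend-specific reading onto [0.0, 1.0], clamping out-of-range values."""
    return min(1.0, max(0.0, raw / full_scale))


class NvmlSampler:
    """Initializes NVML once and reuses the device handle for every sample."""

    def __init__(self, device_index: int = 0) -> None:
        self._device_index = device_index
        self._nvml: Optional[Any] = None
        self._handle: Optional[Any] = None

    def _ensure_init(self) -> None:
        if self._handle is not None:
            return
        import pynvml  # module shipped by the nvidia-ml-py package

        pynvml.nvmlInit()
        self._nvml = pynvml
        self._handle = pynvml.nvmlDeviceGetHandleByIndex(self._device_index)

    def load(self) -> float:
        """GPU load as a fraction; NVML reports utilization as a percentage."""
        self._ensure_init()
        util = self._nvml.nvmlDeviceGetUtilizationRates(self._handle)
        return to_fraction(util.gpu, 100.0)

    def close(self) -> None:
        """Shut NVML down once, e.g. when the node is destroyed."""
        if self._nvml is not None:
            self._nvml.nvmlShutdown()
            self._nvml = None
            self._handle = None
```

A node would create one `NvmlSampler` at startup, call `load()` on every timer tick, and call `close()` on shutdown, which avoids the cost of an init/shutdown pair per sample.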
This has been tested on the robot and works fine. Please review @ChlukasX
Summary
Proposed changes
Related issues
Checklist
pixi run build