Skip to content

Fix failing SKRL multi-gpu training error#6035

Merged
ooctipus merged 2 commits into
isaac-sim:developfrom
StafaH:mh/qa_fix_skrl
Jun 9, 2026
Merged

Fix failing SKRL multi-gpu training error#6035
ooctipus merged 2 commits into
isaac-sim:developfrom
StafaH:mh/qa_fix_skrl

Conversation

@StafaH

@StafaH StafaH commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Description

Fixes a skrl error where jax experimental is not imported and must be imported manually

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added bug Something isn't working documentation Improvements or additions to documentation asset New asset feature or request isaac-sim Related to Isaac Sim team isaac-mimic Related to Isaac Mimic team infrastructure labels Jun 8, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 IsaacLab Review Bot — PR #6035

Title: Fix failing SKRL multi-gpu training error
Author: @StafaH
Actual change: 1 commit (5eb949a2) — 2 files touched


Summary

This PR fixes a JAX import error in SKRL's multi-GPU/multi-host training path. On recent JAX versions, jax.experimental.multihost_utils must be explicitly imported before skrl's distributed models reference it. The fix adds a preemptive import of this submodule in the SkrlVecEnvWrapper function when the JAX ML framework is selected.

Files Changed (actual fix only)

File Change
source/isaaclab_rl/isaaclab_rl/skrl.py Added import jax.experimental.multihost_utils preload
source/isaaclab_rl/changelog.d/fix-skrl-jax-multihost-import.rst Changelog fragment

Analysis

Correctness: ✅ Looks correct

  • The fix adds import jax.experimental.multihost_utils # noqa: F401 before importing skrl.envs.wrappers.jax.wrap_env
  • This is the right placement — preloading the submodule before skrl attempts to use it ensures the module is available in sys.modules
  • The # noqa: F401 comment correctly suppresses the "imported but unused" lint warning since this is a side-effect import

Observations:

  1. Minimal and targeted — The fix is a single import line, narrowly scoped to the JAX code path. No risk to torch/warp paths.

  2. Root cause is upstream — The actual bug is in skrl's distributed models referencing jax.experimental.multihost_utils without importing it. This workaround is a reasonable IsaacLab-side fix while waiting for an upstream skrl fix.

  3. No version guard — The comment says "broken on recent JAX" but there's no version check. This is fine since the import is harmless on older JAX versions where the module already loads transitively.

  4. Changelog fragment is well-written — Clearly describes the issue and references the relevant function.

Suggestions (non-blocking)

  • Consider adding a brief comment or link to the skrl issue/version where this broke, to help future maintainers know when this workaround can be removed.
  • The PR description checklist is entirely unchecked — minor housekeeping gap.

Verdict

✅ LGTM — Clean, minimal, well-placed fix for a real multi-GPU training failure. No concerns.


0 blocking findings · 1 suggestion

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 IsaacLab Review Bot — PR #6035 (Updated)

Title: Fix failing SKRL multi-gpu training error
Author: @StafaH
Branch: mh/qa_fix_skrldevelop


Incremental Review Update (commit 02325a8)

New commit: 02325a8Merge branch 'develop' into mh/qa_fix_skrl

This is a merge commit to sync the PR branch with develop. No new feature changes or code related to the SKRL fix were introduced.

Previous review findings remain unchanged — the core JAX import fix and supporting changes in commit 5eb949a are still valid.


Original Review Summary (commit 5eb949a)

This PR addresses a JAX import error in SKRL multi-GPU training:

  1. Core fix: Pre-imports jax.experimental.multihost_utils before skrl's JAX wrapper to prevent ModuleNotFoundError
  2. New train_multigpu.py launcher: Unified CLI for multi-GPU training across rl_games, rsl_rl, and skrl
  3. Warp framework support: Adds "warp" as a valid ml_framework option
  4. SKRL config modernization: Renames state_preprocessorobservation_preprocessor
  5. Bug fix: Adds missing raise keyword before ValueError
  6. Test modernization & lazy imports

Outstanding Items from Previous Review

  1. ⚠️ Medium: Add changelog coverage for the state_preprocessorobservation_preprocessor rename
  2. 💡 Suggestion: Consider adding a quick existence check for TRAIN_SCRIPT path in train_multigpu.py
  3. 📝 PR hygiene: PR checklist is unchecked; ensure merge conflicts are resolved

Reviewed commit: 02325a8 (merge commit from develop)
Original review: 5eb949a

@StafaH StafaH changed the base branch from main to develop June 8, 2026 22:50
@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 8, 2026
@ooctipus ooctipus merged commit 4a04964 into isaac-sim:develop Jun 9, 2026
37 of 38 checks passed
AntoineRichard pushed a commit that referenced this pull request Jun 9, 2026
## Summary
- Cherry-pick #6035 from develop into release/3.0.0-beta2
- Preload jax.experimental.multihost_utils before importing skrl's JAX
wrapper
- Add the isaaclab_rl changelog fragment

## Validation
- git diff --check refs/remotes/upstream/release/3.0.0-beta2...HEAD
- python3 -m py_compile source/isaaclab_rl/isaaclab_rl/skrl.py

Co-authored-by: Mustafa H <34825877+StafaH@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

asset New asset feature or request bug Something isn't working documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team isaac-mimic Related to Isaac Mimic team isaac-sim Related to Isaac Sim team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants