Kamino task presets#6028
Conversation
There was a problem hiding this comment.
Code Review: Kamino Task Presets
This PR adds Kamino solver integration as opt-in physics presets across velocity, reach, cabinet, shadow-hand, and humanoid tasks. The core changes include a new ClosedLoopView for closed kinematic chains, torque routing via route_torque_to, and significant rework of the Kamino reset logic.
Summary
The architecture is well-conceived — making Kamino an opt-in preset that doesn't change default behavior is the right approach. The ClosedLoopView duck-typing pattern and the _get_root_view_articulation_ids defensive accessor are good defensive designs. However, there are several issues worth addressing around robustness, potential performance regressions, and missing safety checks.
Findings
1. 🔴 Performance regression: unconditional write_data_to_sim + sim.forward() + scene.update() on every reset step (all backends)
File: source/isaaclab/isaaclab/envs/manager_based_rl_env.py
The comment says "for others this is a no-op", but self.scene.write_data_to_sim(), self.sim.forward(), and self.scene.update(dt=self.physics_dt) are not no-ops for PhysX or MuJoCo-Warp. These calls flush buffers, run forward kinematics, and update all scene entities. This effectively doubles the per-step scene update cost when resets occur (which is frequent in RL training).
Consider gating this behind a check for maximal-coordinate solvers:
if getattr(self.sim, 'is_maximal_coordinate_solver', False):
self.scene.write_data_to_sim()
self.sim.forward()
self.scene.update(dt=self.physics_dt)Or add a backend-specific hook that is a true no-op for non-Kamino solvers.
2. 🟡 _forward_kamino uses .numpy() inside a hot path — GPU→CPU sync stall
File: source/isaaclab_newton/isaaclab_newton/physics/kamino_manager.py (_forward_kamino method)
Multiple .numpy() calls appear inside _forward_kamino:
_model.joint_type.numpy()[0]_model.joint_child.numpy()[: max(_joints_per_world, 0)]
These force GPU→CPU synchronization on every reset. Since the model topology is static, consider caching _first_joint_is_free, _per_world_children, and related derived values at solver construction time (in _build_solver) rather than recomputing them per call.
3. 🟡 Integer division assumption in ClosedLoopView can silently produce incorrect strides
File: source/isaaclab_newton/isaaclab_newton/assets/articulation/closed_loop_view.py (constructor)
The view computes per-world counts via integer division:
self._bodies_per_world = model.body_count // nw
self._joints_per_world = model.joint_count // nwIf model.body_count is not evenly divisible by world_count (e.g., due to shared kinematic anchors or asymmetric world construction), this silently produces incorrect strides, leading to memory access violations or corrupted data. Add a divisibility assertion:
assert model.body_count % nw == 0, f"body_count {model.body_count} not divisible by world_count {nw}"4. 🟡 world_reset_mask dtype changed from wp.int32 to wp.bool — verify Kamino solver API compatibility
File: source/isaaclab_newton/isaaclab_newton/physics/newton_manager.py
The _world_reset_mask dtype was changed from wp.int32 to wp.bool. The Kamino solver.reset(world_mask=...) API must accept boolean masks — if it expects int32 (as it did before this PR), this will cause a silent type mismatch or runtime error. Please confirm the upstream SolverKamino.reset() signature accepts wp.bool arrays, or add an explicit cast before passing to the solver.
5. 🟡 _joint_act_target_sim buffer is never zeroed between steps
File: source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py (_apply_actuator_model)
The new _joint_act_target_sim buffer is allocated with wp.zeros_like(...) at creation, but unlike _joint_effort_target_sim, it's never explicitly zeroed at the start of each write_data_to_sim call. If an actuator switches route_torque_to between steps, or if only some actuators route to joint_act, stale values from a previous step will persist for unwritten indices. Consider zeroing both target buffers at the top of _apply_actuator_model.
6. 🟡 _prim_has_closed_kinematic_loops silently swallows errors during USD traversal
File: source/isaaclab_newton/isaaclab_newton/cloner/newton_replicate.py
The broad except Exception: continue block:
try:
joint = UsdPhysics.Joint(descendant)
targets = joint.GetBody1Rel().GetTargets()
except Exception:
continueThis swallows any exception — including RuntimeError from invalid stage state, AttributeError from API changes, or memory errors. If the USD stage is in an inconsistent state, this function will silently return False (no loops detected), causing the code to attempt tree-structured articulation on a closed-loop robot, likely producing a cryptic downstream crash. Consider narrowing the catch or at minimum logging a warning:
except (ValueError, Tf.ErrorException) as exc:
logger.debug("Skipping joint %s during loop detection: %s", descendant.GetPath(), exc)
continue7. 🟡 NewtonKaminoManager.forward() runs full eval_fk on all worlds when no reset is pending
File: source/isaaclab_newton/isaaclab_newton/physics/kamino_manager.py (forward method)
When _kamino_needs_reset is False, the method falls through to:
eval_fk(cls._model, cls._state_0.joint_q, cls._state_0.joint_qd, cls._state_0, None)Passing None as the mask means FK is recomputed for all articulations/worlds unconditionally. Since forward() is called from the new post-reset path in manager_based_rl_env.py (and potentially from other call sites), this could be expensive for large world counts. Consider either: (a) passing an empty mask to make it a no-op when not needed, or (b) adding an early return when no FK invalidation has occurred.
8. 🟡 Inconsistent max_contacts_per_world across task presets — missing rationale
Files: Various task config files
Some presets specify max_contacts_per_world=64 (a1, cassie, g1, h1, cabinet, reach=32) while others leave it as None (anymal_b, anymal_c, go1, go2, spot, humanoid, shadow_hand). The PR description doesn't explain the selection criteria. For contact-rich tasks like shadow_hand (24 fingers + object) and humanoid (many body links), the absence of a cap could cause GPU OOM at scale, which is exactly what this parameter was designed to prevent.
Minor Observations
- The reach table init-pose change (
pos=(0.5, 0.15, -0.5)→pos=(0.5, 0, -0.5)) modifies thenewton_mjwarppreset, not just the new kamino preset. Confirm this is intentional and doesn't break existing MuJoCo-Warp reach training. _prim_has_closed_kinematic_loopsonly detects cycles via multi-parent children (Body1Rel). Exotic topologies where the cycle manifests differently may be missed — worth a brief comment noting this limitation.- The changelog files are well-written and properly categorized. 👍
Verdict
The PR is architecturally sound and the opt-in preset pattern is the correct integration strategy. The main concern is the potential performance regression for all backends from the unconditional post-reset refresh (Finding #1). The GPU→CPU sync in the hot path (Finding #2), the missing buffer zeroing (Finding #5), and the silent error swallowing in loop detection (Finding #6) should also be addressed before merge. The remaining items are lower priority but worth considering.
| self.scene.write_data_to_sim() | ||
| self.sim.forward() | ||
| self.scene.update(dt=self.physics_dt) | ||
|
|
There was a problem hiding this comment.
🔴 Performance concern: These three calls (write_data_to_sim, sim.forward, scene.update) are not no-ops for PhysX/MuJoCo-Warp backends — they flush buffers, run FK, and update all scene entities. This can double the per-step scene-update cost when resets occur (which is common in RL). Consider gating behind a solver-type check or adding a backend hook that's truly a no-op for non-maximal-coordinate solvers.
| at high ``num_envs``. It is only correct when the written ``joint_q`` equals | ||
| the model default (all zeros). | ||
|
|
||
| Args: |
There was a problem hiding this comment.
🟡 GPU→CPU sync in hot path: _model.joint_type.numpy()[0] and _model.joint_child.numpy() force device synchronization on every reset call. Since model topology is static after construction, consider caching these derived values (_first_joint_is_free, _per_world_children, etc.) at solver construction time in _build_solver().
| self._joints_per_world = model.joint_count // nw | ||
| self._joint_dofs_per_world = getattr(model, "joint_dof_count", model.joint_count) // nw | ||
| self._joint_coords_per_world = getattr(model, "joint_coord_count", model.joint_count) // nw | ||
| self._shapes_per_world = model.shape_count // nw |
There was a problem hiding this comment.
🟡 Silent corruption risk: If model.body_count is not evenly divisible by world_count, the integer division silently produces incorrect per-world counts leading to wrong strides. Consider adding divisibility assertions here.
| gear_ratio = actuator.gear_ratio | ||
| else: | ||
| gear_ratio = None | ||
| target_torque_buf = ( |
There was a problem hiding this comment.
🟡 Stale buffer values: _joint_act_target_sim is never zeroed between steps. If only some actuators route to joint_act, indices belonging to other actuators retain stale values from prior steps. Consider zeroing this buffer (and _joint_effort_target_sim) at the top of _apply_actuator_model.
Greptile SummaryThis PR exposes the Kamino PADMM solver as an opt-in
Confidence Score: 3/5The task preset additions are additive and safe; the deeper infrastructure changes in the RL env step loop touch a path that runs every training step for every backend, warranting a second look before merging. The source/isaaclab/isaaclab/envs/manager_based_rl_env.py — the extra scene.update call; source/isaaclab_newton/isaaclab_newton/assets/articulation/closed_loop_view.py — the articulation_ids allocation pattern Important Files Changed
Sequence DiagramsequenceDiagram
participant Env as ManagerBasedRLEnv
participant Scene as Scene
participant Sim as sim (forward/step)
participant KM as NewtonKaminoManager
Note over Env: RL training step (with resets)
Env->>Sim: sim.step()
Sim->>KM: step() — physics advance
KM-->>Sim: "(no reset, _kamino_needs_reset=False)"
Env->>Env: compute rewards / terminations
Env->>Env: _reset_idx(reset_env_ids)
Note over KM: invalidate_fk() sets _kamino_needs_reset=True, ORs _world_reset_mask
Env->>Scene: write_data_to_sim()
Env->>Sim: sim.forward()
Sim->>KM: forward()
KM->>KM: _forward_kamino(world_mask)
Note over KM: solver.reset(), sync state_1, eval_fk
KM->>KM: clear _kamino_needs_reset and masks
Env->>Scene: scene.update(dt)
Note over Scene: runs for ALL backends, not just Kamino
Env->>Env: compute observations
|
| self.scene.write_data_to_sim() | ||
| self.sim.forward() | ||
| self.scene.update(dt=self.physics_dt) |
There was a problem hiding this comment.
Extra
scene.update() runs for all backends, not just Kamino
scene.update(dt=self.physics_dt) is unconditionally called inside the reset block for every backend. The comment says it's "a no-op for other backends", but this isn't accurate: sensors and scene entities that use the dt argument (e.g. contact sensors that maintain contact history or apply velocity-based filtering) will advance their internal time-integrators a second time within the same physics step for non-reset environments. The PhysX path's existing scene.update (from sim.step()) has already run; calling it again here would update non-reset envs' sensor state twice per step whenever any env resets — exactly the steps with the highest training throughput pressure.
Wrapping the extra calls in a Kamino-specific guard would avoid this for other backends.
Description
This is part of the Kamino integration into IsaacLab work.
Stacked on #6026 and #6027
After the latest changes in newton-physics/newton#2822, we can expose the Kamino solver as an opt-in physics preset on a set of existing tasks. Default backends are unchanged.
newton_kaminopreset to the velocity (cassie, g1, go2, h1, spot, a1, anymal_b, anymal_c, go1), reach, cabinet, shadow-hand, and humanoid task configs.