Newton kamino closed loop#6027
Conversation
There was a problem hiding this comment.
Code Review: Kamino Closed-Loop Support
Thanks for this substantial addition bringing closed kinematic chain support to the Newton/Kamino backend. The architecture is well-motivated — closed-loop robots (cyclic joint graphs) genuinely cannot use tree-structured ArticulationView, so a separate ClosedLoopView with strided global-array access is the right approach. Below are findings ordered by severity.
🔴 High Severity
1. manager_based_rl_env.py: Unconditional write_data_to_sim + sim.forward() + scene.update() on every reset is a performance regression for all backends
File: source/isaaclab/isaaclab/envs/manager_based_rl_env.py (lines added after _reset_idx)
The comment says "no-op for other backends," but write_data_to_sim() and scene.update(dt=...) are not free calls — they flush buffers and iterate over all assets/sensors. For PhysX users (the majority), this adds 3 function calls per step that previously didn't exist, even when reset_env_ids is empty (the if guard only checks len(reset_env_ids) > 0).
Suggested fix: Gate the refresh behind the solver backend:
if reset_env_ids is not None and len(reset_env_ids) > 0:
self._reset_idx(reset_env_ids)
# Only needed for maximal-coordinate solvers (Kamino)
if getattr(self.sim, '_needs_post_reset_refresh', False):
self.scene.write_data_to_sim()
self.sim.forward()
self.scene.update(dt=self.physics_dt)Or at minimum, move it inside the len(reset_env_ids) > 0 check if it isn't already (verify the indentation scope).
2. _forward_kamino: Mixed Warp/Torch device without explicit device pinning
File: source/isaaclab_newton/isaaclab_newton/physics/kamino_manager.py
The reset path does wp.to_torch(cls._state_0.joint_q).reshape(...) → .contiguous() → wp.from_torch(...). If cls._state_0.joint_q resides on a non-default CUDA device (e.g. cuda:1), wp.from_torch may not preserve the device context correctly depending on Warp version. Additionally, .contiguous() allocates a new tensor on every reset call — for a hot path in multi-env training this creates GC pressure.
Suggested fix:
- Pre-allocate the base_q/base_u tensors in
start_simulationand reuse them with an in-place copy. - Explicitly pass
device=cls._state_0.joint_q.devicetowp.from_torch.
🟡 Medium Severity
3. ClosedLoopView._extract_names assumes label iterable is subscriptable
File: source/isaaclab_newton/isaaclab_newton/assets/articulation/closed_loop_view.py, line ~73
@staticmethod
def _extract_names(labels, count: int) -> list[str]:
return [lbl.rsplit("/", 1)[-1] for lbl in labels[:count]]If labels is a generator or a Warp array (not a Python list), slicing with [:count] will fail silently or raise. The type hint is missing, and model.joint_label / model.body_label types should be validated.
Suggested fix: Add a type annotation and defensive conversion:
@staticmethod
def _extract_names(labels: list[str] | Any, count: int) -> list[str]:
label_list = list(labels) if not isinstance(labels, list) else labels
return [lbl.rsplit("/", 1)[-1] for lbl in label_list[:count]]4. _prim_has_closed_kinematic_loops: Bare except Exception swallows schema errors
File: source/isaaclab_newton/isaaclab_newton/cloner/newton_replicate.py, lines 28-31
try:
joint = UsdPhysics.Joint(descendant)
targets = joint.GetBody1Rel().GetTargets()
except Exception:
continueThis catches all exceptions including AttributeError, TypeError, or USD schema version mismatches that would indicate a real bug. If a joint prim exists but GetBody1Rel() returns None (possible with malformed USD), this silently skips the joint instead of flagging it.
Suggested fix: Narrow to the expected failure modes:
except (Tf.ErrorException, AttributeError):
continue5. _get_root_view_articulation_ids uses getattr fallback masking potential bugs
File: source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py, new method
art_ids = getattr(self._root_view, "articulation_ids", None)
if art_ids is None or art_ids.ndim < 2 or art_ids.shape[1] == 0:
return NoneIf self._root_view is accidentally None (e.g. initialization ordering bug), this silently returns None instead of raising, which would then cause invalidate_fk to take the else (full-reset) path — a correctness bug that manifests as all environments resetting simultaneously with no error message.
Suggested fix: Assert self._root_view is not None or access .articulation_ids directly (let AttributeError propagate if the view is unset).
🟢 Low Severity / Nits
6. _joint_act_target_sim is never zeroed between steps
File: source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py
_joint_effort_target_sim is zeroed at the start of _apply_actuator_model (or by the caller). The new _joint_act_target_sim buffer is written per-actuator but if no actuator uses route_torque_to="joint_act", the buffer retains stale values from a previous step. Verify it is zeroed identically to _joint_effort_target_sim before the actuator loop.
7. Missing __all__ export for ClosedLoopView
The new closed_loop_view.py module doesn't define __all__ and isn't re-exported from the package __init__.py. While it's currently only imported locally, adding it to __all__ and the package init would allow downstream code (e.g. custom envs) to inspect or subclass it.
Summary
| Severity | Count |
|---|---|
| 🔴 High | 2 |
| 🟡 Medium | 3 |
| 🟢 Low | 2 |
The core architectural decision (ClosedLoopView as a duck-typed ArticulationView replacement) is sound. The main concerns are (1) the unconditional performance hit in the base RL env for non-Kamino users, and (2) allocation patterns in the hot reset path. The closed-loop detection heuristic and reset state synchronization logic are well thought out.
No blocking issues — approving with the above suggestions for improvement.
Greptile SummaryThis PR adds closed kinematic chain support to the Newton/Kamino backend, introducing
Confidence Score: 3/5The core Newton/Kamino changes are internally consistent, but the post-reset scene refresh in the base RL env loop runs for every backend and the ClosedLoopView initialization path bypasses actuator config validation. Two paths introduce risk beyond the Kamino-specific code. The unconditional
Important Files Changed
Sequence DiagramsequenceDiagram
participant RLEnv as ManagerBasedRLEnv
participant Scene as Scene
participant Sim as SimulationContext
participant KM as NewtonKaminoManager
participant Artic as Articulation
participant CLV as ClosedLoopView
RLEnv->>RLEnv: detect reset_env_ids
RLEnv->>Artic: _reset_idx(reset_env_ids)
Artic->>KM: "invalidate_fk(env_ids=...)"
KM->>KM: "_kamino_needs_reset = True"
KM->>KM: scatter world_reset_mask
RLEnv->>Scene: write_data_to_sim()
RLEnv->>Sim: forward()
Sim->>KM: forward()
KM->>KM: _forward_kamino(world_mask)
KM->>KM: solver.reset(state, joint_q, base_q, ...)
KM->>KM: snap joint_q_prev for reset worlds
KM->>KM: eval_fk(fk_reset_mask)
KM->>KM: sync state_1 to state_0
RLEnv->>Scene: "scene.update(dt=physics_dt)"
Scene->>CLV: get_link_transforms(state)
CLV->>CLV: _get_strided_view(body_q)
Scene-->>RLEnv: updated buffers
RLEnv->>RLEnv: observation_manager.compute()
Reviews (1): Last reviewed commit: "Add Kamino closed-loop articulation supp..." | Re-trigger Greptile |
| # Refresh kinematics and derived buffers for the just-reset envs so the | ||
| # observations / terminations below reflect the reset pose. Maximal-coordinate | ||
| # solvers (e.g. Kamino) apply reset FK one step late; for others this is a no-op. | ||
| self.scene.write_data_to_sim() | ||
| self.sim.forward() | ||
| self.scene.update(dt=self.physics_dt) |
There was a problem hiding this comment.
Unconditional
scene.update() after reset runs for every backend
The comment says this is a "no-op for other backends," but scene.update(dt=self.physics_dt) iterates all scene entities regardless of the solver. For PhysX or MuJoCo-Warp environments, this inserts an extra entity-update pass on every reset step. Sensors that accumulate per-call state (e.g., contact-history buffers that shift on each update()) could produce incorrect readings if a second call happens before the frame is complete. A backend guard (checking whether the active physics manager is NewtonKaminoManager) would scope this to where it is actually needed.
| Args: | ||
| world_mask: Per-world mask indicating which worlds to reset. | ||
| Shape ``(num_worlds,)``, dtype ``wp.int32``. If None, resets all worlds. |
There was a problem hiding this comment.
Stale dtype annotation in the docstring —
world_mask was changed from wp.int32 to wp.bool in newton_manager.py, but the docstring here still says wp.int32.
| Args: | |
| world_mask: Per-world mask indicating which worlds to reset. | |
| Shape ``(num_worlds,)``, dtype ``wp.int32``. If None, resets all worlds. | |
| Args: | |
| world_mask: Per-world mask indicating which worlds to reset. | |
| Shape ``(num_worlds,)``, dtype ``wp.bool``. If None, resets all worlds. |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| @property | ||
| def articulation_ids(self) -> wp.array: | ||
| """Empty ``(world_count, 0)`` int32 array (closed-loop assets have no tree articulations). | ||
|
|
||
| Sizing buffers from ``articulation_ids`` (e.g. | ||
| ``ArticulationData._create_jacobian_buffers``) yields zero rows, while | ||
| :meth:`Articulation._get_root_view_articulation_ids` treats the empty | ||
| array as "no articulations" so reset scoping falls back to ``env_ids`` / | ||
| ``env_mask``. | ||
| """ | ||
| return wp.zeros((self._world_count, 0), dtype=wp.int32, device=self.device) |
There was a problem hiding this comment.
articulation_ids allocates a new GPU array on every access
wp.zeros((self._world_count, 0), ...) is called each time the property is read. _get_root_view_articulation_ids() accesses this on every reset call, so repeated resets incur repeated GPU allocations. A zero-column array is small, but the allocation overhead and memory-pool pressure can accumulate in high-frequency training loops. Cache the array as self._articulation_ids in __init__ and return the cached instance from the property.
| joint = UsdPhysics.Joint(descendant) | ||
| targets = joint.GetBody1Rel().GetTargets() | ||
| except Exception: | ||
| continue | ||
| if not targets: | ||
| continue | ||
| child_path = targets[0].pathString | ||
| if not child_path: | ||
| continue | ||
| child_counts[child_path] = child_counts.get(child_path, 0) + 1 | ||
| if child_counts[child_path] > 1: | ||
| return True |
There was a problem hiding this comment.
Bare
except Exception silently swallows all errors
The broad except Exception: continue catches genuinely unexpected failures (e.g., attribute errors in USD Python bindings, type mismatches) and discards them without any log output. A malformed joint prim that raises an unexpected error would be silently skipped, causing _prim_has_closed_kinematic_loops to return False and the ClosedLoopView branch to be missed. At minimum, log the exception at DEBUG level so it's visible in diagnostic traces.
| self._create_buffers() | ||
| self._process_cfg() | ||
| self._process_actuators_cfg() | ||
| self._process_tendons() | ||
| # Let the articulation data know that it is fully instantiated and ready to use. | ||
| self.data.is_primed = True | ||
| return |
There was a problem hiding this comment.
ClosedLoopView initialization path skips
_validate_cfg() and _log_articulation_info()
The early return for ClosedLoopView omits both _validate_cfg() (which checks that actuator joint names match what the robot exposes) and _log_articulation_info(). Because _sim_bind_* arrays don't exist until the PHYSICS_READY callback fires, skipping update(0.0) is unavoidable, but _validate_cfg() typically doesn't need live simulation data. Misconfigured actuator mappings on closed-loop robots will therefore go undetected at startup and only surface as silent bad behaviour at runtime.
Description
This is part of the Kamino integration into IsaacLab work.
Depends on #6026
Adds closed kinematic chain support and related solver/contact fixes to the Newton/Kamino backend.
ClosedLoopViewto simulate closed kinematic chains in maximal coordinates with the Kamino solver.max_contacts_per_worldtoKaminoSolverCfgto bound per-world contact allocation.route_torque_tothrough the Newton articulation and fix closed-loop reset/cloning paths.