Skip to content

Newton kamino closed loop#6027

Open
aserifi wants to merge 2 commits into
isaac-sim:developfrom
aserifi:aserifi/newton-kamino-closed-loop
Open

Newton kamino closed loop#6027
aserifi wants to merge 2 commits into
isaac-sim:developfrom
aserifi:aserifi/newton-kamino-closed-loop

Conversation

@aserifi

@aserifi aserifi commented Jun 8, 2026

Copy link
Copy Markdown

Description

This is part of the Kamino integration into IsaacLab work.
Depends on #6026

Adds closed kinematic chain support and related solver/contact fixes to the Newton/Kamino backend.

  • Add ClosedLoopView to simulate closed kinematic chains in maximal coordinates with the Kamino solver.
  • Add max_contacts_per_world to KaminoSolverCfg to bound per-world contact allocation.
  • Route contact-sensor force aggregation through a single solver-agnostic path shared by Kamino and MuJoCo-Warp, fixing Kamino contact forces.
  • Wire route_torque_to through the Newton articulation and fix closed-loop reset/cloning paths.

@github-actions github-actions Bot added the isaac-lab Related to Isaac Lab team label Jun 8, 2026

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review: Kamino Closed-Loop Support

Thanks for this substantial addition bringing closed kinematic chain support to the Newton/Kamino backend. The architecture is well-motivated — closed-loop robots (cyclic joint graphs) genuinely cannot use tree-structured ArticulationView, so a separate ClosedLoopView with strided global-array access is the right approach. Below are findings ordered by severity.


🔴 High Severity

1. manager_based_rl_env.py: Unconditional write_data_to_sim + sim.forward() + scene.update() on every reset is a performance regression for all backends

File: source/isaaclab/isaaclab/envs/manager_based_rl_env.py (lines added after _reset_idx)

The comment says "no-op for other backends," but write_data_to_sim() and scene.update(dt=...) are not free calls — they flush buffers and iterate over all assets/sensors. For PhysX users (the majority), this adds 3 function calls per step that previously didn't exist, even when reset_env_ids is empty (the if guard only checks len(reset_env_ids) > 0).

Suggested fix: Gate the refresh behind the solver backend:

if reset_env_ids is not None and len(reset_env_ids) > 0:
    self._reset_idx(reset_env_ids)
    # Only needed for maximal-coordinate solvers (Kamino)
    if getattr(self.sim, '_needs_post_reset_refresh', False):
        self.scene.write_data_to_sim()
        self.sim.forward()
        self.scene.update(dt=self.physics_dt)

Or at minimum, move it inside the len(reset_env_ids) > 0 check if it isn't already (verify the indentation scope).


2. _forward_kamino: Mixed Warp/Torch device without explicit device pinning

File: source/isaaclab_newton/isaaclab_newton/physics/kamino_manager.py

The reset path does wp.to_torch(cls._state_0.joint_q).reshape(...).contiguous()wp.from_torch(...). If cls._state_0.joint_q resides on a non-default CUDA device (e.g. cuda:1), wp.from_torch may not preserve the device context correctly depending on Warp version. Additionally, .contiguous() allocates a new tensor on every reset call — for a hot path in multi-env training this creates GC pressure.

Suggested fix:

  • Pre-allocate the base_q/base_u tensors in start_simulation and reuse them with an in-place copy.
  • Explicitly pass device=cls._state_0.joint_q.device to wp.from_torch.

🟡 Medium Severity

3. ClosedLoopView._extract_names assumes label iterable is subscriptable

File: source/isaaclab_newton/isaaclab_newton/assets/articulation/closed_loop_view.py, line ~73

@staticmethod
def _extract_names(labels, count: int) -> list[str]:
    return [lbl.rsplit("/", 1)[-1] for lbl in labels[:count]]

If labels is a generator or a Warp array (not a Python list), slicing with [:count] will fail silently or raise. The type hint is missing, and model.joint_label / model.body_label types should be validated.

Suggested fix: Add a type annotation and defensive conversion:

@staticmethod
def _extract_names(labels: list[str] | Any, count: int) -> list[str]:
    label_list = list(labels) if not isinstance(labels, list) else labels
    return [lbl.rsplit("/", 1)[-1] for lbl in label_list[:count]]

4. _prim_has_closed_kinematic_loops: Bare except Exception swallows schema errors

File: source/isaaclab_newton/isaaclab_newton/cloner/newton_replicate.py, lines 28-31

try:
    joint = UsdPhysics.Joint(descendant)
    targets = joint.GetBody1Rel().GetTargets()
except Exception:
    continue

This catches all exceptions including AttributeError, TypeError, or USD schema version mismatches that would indicate a real bug. If a joint prim exists but GetBody1Rel() returns None (possible with malformed USD), this silently skips the joint instead of flagging it.

Suggested fix: Narrow to the expected failure modes:

except (Tf.ErrorException, AttributeError):
    continue

5. _get_root_view_articulation_ids uses getattr fallback masking potential bugs

File: source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py, new method

art_ids = getattr(self._root_view, "articulation_ids", None)
if art_ids is None or art_ids.ndim < 2 or art_ids.shape[1] == 0:
    return None

If self._root_view is accidentally None (e.g. initialization ordering bug), this silently returns None instead of raising, which would then cause invalidate_fk to take the else (full-reset) path — a correctness bug that manifests as all environments resetting simultaneously with no error message.

Suggested fix: Assert self._root_view is not None or access .articulation_ids directly (let AttributeError propagate if the view is unset).


🟢 Low Severity / Nits

6. _joint_act_target_sim is never zeroed between steps

File: source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py

_joint_effort_target_sim is zeroed at the start of _apply_actuator_model (or by the caller). The new _joint_act_target_sim buffer is written per-actuator but if no actuator uses route_torque_to="joint_act", the buffer retains stale values from a previous step. Verify it is zeroed identically to _joint_effort_target_sim before the actuator loop.


7. Missing __all__ export for ClosedLoopView

The new closed_loop_view.py module doesn't define __all__ and isn't re-exported from the package __init__.py. While it's currently only imported locally, adding it to __all__ and the package init would allow downstream code (e.g. custom envs) to inspect or subclass it.


Summary

Severity Count
🔴 High 2
🟡 Medium 3
🟢 Low 2

The core architectural decision (ClosedLoopView as a duck-typed ArticulationView replacement) is sound. The main concerns are (1) the unconditional performance hit in the base RL env for non-Kamino users, and (2) allocation patterns in the hot reset path. The closed-loop detection heuristic and reset state synchronization logic are well thought out.

No blocking issues — approving with the above suggestions for improvement.

@aserifi aserifi mentioned this pull request Jun 8, 2026
@aserifi aserifi changed the title Aserifi/newton kamino closed loop Newton kamino closed loop Jun 8, 2026
@greptile-apps

greptile-apps Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR adds closed kinematic chain support to the Newton/Kamino backend, introducing ClosedLoopView as a duck-typed replacement for ArticulationView for cyclic joint graphs, fixing contact-sensor force aggregation, and wiring route_torque_to through the articulation path.

  • ClosedLoopView: New strided-view adapter that accesses global Newton Model/State/Control arrays directly when a robot's joint graph cannot form a tree; detected via _prim_has_closed_kinematic_loops at initialization time.
  • Kamino reset rework: _forward_kamino gains dual FK/no-FK paths, joint_q_prev snapping, and state_1 mirroring; _kamino_needs_reset flag prevents unnecessary solver resets on non-reset steps.
  • manager_based_rl_env.py post-reset refresh: A write_data_to_sim + sim.forward + scene.update block is added after in-step resets to flush Kamino's deferred FK; this runs unconditionally for all backends.

Confidence Score: 3/5

The core Newton/Kamino changes are internally consistent, but the post-reset scene refresh in the base RL env loop runs for every backend and the ClosedLoopView initialization path bypasses actuator config validation.

Two paths introduce risk beyond the Kamino-specific code. The unconditional scene.update(dt=self.physics_dt) after in-step resets touches all environments regardless of solver, and sensors with frame-accumulating state could behave differently when updated twice in one step. Separately, the ClosedLoopView early-return skips _validate_cfg(), so misconfigured actuator joint names on closed-loop robots will not be caught at startup — they will silently produce wrong torques at runtime.

source/isaaclab/isaaclab/envs/manager_based_rl_env.py (backend-agnostic reset refresh) and source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py (skipped validation in ClosedLoopView path) warrant the most attention before merge.

Important Files Changed

Filename Overview
source/isaaclab/isaaclab/envs/manager_based_rl_env.py Adds write_data_to_sim + sim.forward + scene.update after in-step resets; runs unconditionally for all backends, potentially causing extra sensor-update passes for non-Kamino environments.
source/isaaclab_newton/isaaclab_newton/assets/articulation/closed_loop_view.py New ClosedLoopView duck-typing ArticulationView for cyclic joint graphs; articulation_ids property allocates a new GPU array on every call instead of caching.
source/isaaclab_newton/isaaclab_newton/assets/articulation/articulation.py ClosedLoopView branch early-returns before _validate_cfg() and _log_articulation_info(), silently bypassing actuator configuration validation; also replaces all _root_view.articulation_ids accesses with the new helper method.
source/isaaclab_newton/isaaclab_newton/physics/kamino_manager.py Substantial rework of _forward_kamino with FK/no-FK dual paths, joint_q_prev sync, state_1 mirroring, and a new forward() hook; docstring has stale wp.int32 dtype annotation.
source/isaaclab_newton/isaaclab_newton/physics/newton_manager.py world_reset_mask dtype changed from wp.int32 to wp.bool; new closed-loop reset kernels added; _kamino_needs_reset flag introduced to gate Kamino resets only when needed.
source/isaaclab_newton/isaaclab_newton/cloner/newton_replicate.py Adds _prim_has_closed_kinematic_loops USD traversal and SolverKamino.register_custom_attributes; bare except Exception silently discards unexpected USD errors.

Sequence Diagram

sequenceDiagram
    participant RLEnv as ManagerBasedRLEnv
    participant Scene as Scene
    participant Sim as SimulationContext
    participant KM as NewtonKaminoManager
    participant Artic as Articulation
    participant CLV as ClosedLoopView

    RLEnv->>RLEnv: detect reset_env_ids
    RLEnv->>Artic: _reset_idx(reset_env_ids)
    Artic->>KM: "invalidate_fk(env_ids=...)"
    KM->>KM: "_kamino_needs_reset = True"
    KM->>KM: scatter world_reset_mask

    RLEnv->>Scene: write_data_to_sim()
    RLEnv->>Sim: forward()
    Sim->>KM: forward()
    KM->>KM: _forward_kamino(world_mask)
    KM->>KM: solver.reset(state, joint_q, base_q, ...)
    KM->>KM: snap joint_q_prev for reset worlds
    KM->>KM: eval_fk(fk_reset_mask)
    KM->>KM: sync state_1 to state_0

    RLEnv->>Scene: "scene.update(dt=physics_dt)"
    Scene->>CLV: get_link_transforms(state)
    CLV->>CLV: _get_strided_view(body_q)
    Scene-->>RLEnv: updated buffers

    RLEnv->>RLEnv: observation_manager.compute()
Loading

Reviews (1): Last reviewed commit: "Add Kamino closed-loop articulation supp..." | Re-trigger Greptile

Comment on lines +255 to +260
# Refresh kinematics and derived buffers for the just-reset envs so the
# observations / terminations below reflect the reset pose. Maximal-coordinate
# solvers (e.g. Kamino) apply reset FK one step late; for others this is a no-op.
self.scene.write_data_to_sim()
self.sim.forward()
self.scene.update(dt=self.physics_dt)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Unconditional scene.update() after reset runs for every backend

The comment says this is a "no-op for other backends," but scene.update(dt=self.physics_dt) iterates all scene entities regardless of the solver. For PhysX or MuJoCo-Warp environments, this inserts an extra entity-update pass on every reset step. Sensors that accumulate per-call state (e.g., contact-history buffers that shift on each update()) could produce incorrect readings if a second call happens before the frame is complete. A backend guard (checking whether the active physics manager is NewtonKaminoManager) would scope this to where it is actually needed.

Comment on lines 52 to 54
Args:
world_mask: Per-world mask indicating which worlds to reset.
Shape ``(num_worlds,)``, dtype ``wp.int32``. If None, resets all worlds.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Stale dtype annotation in the docstring — world_mask was changed from wp.int32 to wp.bool in newton_manager.py, but the docstring here still says wp.int32.

Suggested change
Args:
world_mask: Per-world mask indicating which worlds to reset.
Shape ``(num_worlds,)``, dtype ``wp.int32``. If None, resets all worlds.
Args:
world_mask: Per-world mask indicating which worlds to reset.
Shape ``(num_worlds,)``, dtype ``wp.bool``. If None, resets all worlds.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +153 to +163
@property
def articulation_ids(self) -> wp.array:
"""Empty ``(world_count, 0)`` int32 array (closed-loop assets have no tree articulations).

Sizing buffers from ``articulation_ids`` (e.g.
``ArticulationData._create_jacobian_buffers``) yields zero rows, while
:meth:`Articulation._get_root_view_articulation_ids` treats the empty
array as "no articulations" so reset scoping falls back to ``env_ids`` /
``env_mask``.
"""
return wp.zeros((self._world_count, 0), dtype=wp.int32, device=self.device)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 articulation_ids allocates a new GPU array on every access

wp.zeros((self._world_count, 0), ...) is called each time the property is read. _get_root_view_articulation_ids() accesses this on every reset call, so repeated resets incur repeated GPU allocations. A zero-column array is small, but the allocation overhead and memory-pool pressure can accumulate in high-frequency training loops. Cache the array as self._articulation_ids in __init__ and return the cached instance from the property.

Comment on lines +27 to +38
joint = UsdPhysics.Joint(descendant)
targets = joint.GetBody1Rel().GetTargets()
except Exception:
continue
if not targets:
continue
child_path = targets[0].pathString
if not child_path:
continue
child_counts[child_path] = child_counts.get(child_path, 0) + 1
if child_counts[child_path] > 1:
return True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Bare except Exception silently swallows all errors

The broad except Exception: continue catches genuinely unexpected failures (e.g., attribute errors in USD Python bindings, type mismatches) and discards them without any log output. A malformed joint prim that raises an unexpected error would be silently skipped, causing _prim_has_closed_kinematic_loops to return False and the ClosedLoopView branch to be missed. At minimum, log the exception at DEBUG level so it's visible in diagnostic traces.

Comment on lines +3543 to +3549
self._create_buffers()
self._process_cfg()
self._process_actuators_cfg()
self._process_tendons()
# Let the articulation data know that it is fully instantiated and ready to use.
self.data.is_primed = True
return

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 ClosedLoopView initialization path skips _validate_cfg() and _log_articulation_info()

The early return for ClosedLoopView omits both _validate_cfg() (which checks that actuator joint names match what the robot exposes) and _log_articulation_info(). Because _sim_bind_* arrays don't exist until the PHYSICS_READY callback fires, skipping update(0.0) is unavoidable, but _validate_cfg() typically doesn't need live simulation data. Misconfigured actuator mappings on closed-loop robots will therefore go undetected at startup and only surface as silent bad behaviour at runtime.

@aserifi aserifi mentioned this pull request Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant