[DRAFT][Newton][Factory] Prereqs for NutThread support#5835
[DRAFT][Newton][Factory] Prereqs for NutThread support#5835mzamoramora-nvidia wants to merge 22 commits into
Conversation
Add NewtonCollisionPipelineCfg and HydroelasticSDFCfg to expose Newton's collision pipeline parameters (broad phase, contact limits, hydroelastic mode) through NewtonCfg.collision_cfg. Add MJWarpSolverCfg.tolerance for solver convergence control. Fix validation order so collision_cfg is stored before the use_mujoco_contacts consistency check runs. Reset _collision_cfg in clear() to avoid stale state across reset cycles. Fall back to a default CollisionPipeline when collision_cfg is None.
Move config resolution out of NewtonManager into NewtonCollisionPipelineCfg.to_pipeline_args(), following the Kamino to_solver_config() pattern. Fix truthiness check on hydroelastic config dict to use explicit `is not None`. Add missing changelog entry for MJWarpSolverCfg.tolerance. Use specific return type hint dict[str, Any].
Add SDFCfg @configclass for SDF mesh collision configuration with fields for resolution, narrow band, margin, body/shape pattern matching, per-pattern resolution overrides, visual mesh fallback, and hydroelastic stiffness assignment. Add five missing fields to HydroelasticSDFCfg: moment_matching, buffer_mult_broad, buffer_mult_iso, buffer_mult_contact, and grid_size.
Add sdf_cfg field (SDFCfg | None) to NewtonCfg and re-export SDFCfg from the physics package __init__.pyi so callers can configure SDF mesh collision at the top-level manager config.
Add _build_sdf_on_mesh, _create_sdf_collision_from_visual, and _apply_sdf_config classmethods to NewtonManager. These methods apply SDFCfg settings to matching mesh shapes in the model builder, including per-pattern resolution overrides and optional hydroelastic flags.
When SDF patterns are configured, exclude matching mesh shapes from convex-hull approximation so their triangle geometry is preserved for mesh.build_sdf(). Also call NewtonManager._apply_sdf_config() on each prototype before add_builder copies it N times, so SDF is built once and all environments inherit it.
Unit tests for NewtonManager._build_sdf_on_mesh and _apply_sdf_config using unittest.mock to avoid needing a running Newton simulation. Covers resolution overrides, hydroelastic flags, body/shape pattern matching, and non-mesh shape exclusion.
Fix max_resolution/target_voxel_size docstrings to accurately describe that both are forwarded to Newton. Remove unreachable collision_cfg force-override block in initialize_solver.
- Pass hydro_patterns to _create_sdf_collision_from_visual so it respects hydroelastic_shape_patterns filter (was unconditionally setting HYDROELASTIC) - Replace getattr(cfg, "sdf_cfg", None) with direct attribute access in all 3 locations - Add regex validation with actionable ValueError on invalid patterns - Log warning and return False from _build_sdf_on_mesh when mesh is None; only count successful SDF builds in num_patched - Fix :attr: -> :class: for SDFCfg reference in docstring
## Summary - Adds ``NewtonCfg.collision_decimation`` (default ``0`` = legacy one-collide-per-tick behaviour). - When positive, ``_run_solver_substeps`` re-invokes the collision pipeline every ``collision_decimation`` substeps so contact normals reflect the bodies' just-integrated poses instead of the poses from the top of the tick. - The last substep is intentionally skipped — its contact set would only affect the next tick, which the top-of-tick ``collide()`` in ``_simulate_physics_only`` / ``_simulate_full`` owns. - ``NewtonCfg.__post_init__`` warns when ``collision_decimation >= num_substeps`` (silently equivalent to ``0``). ## Test plan - [x] ``./isaaclab.sh -p -m pytest source/isaaclab_newton/test/physics/test_newton_manager_abstraction.py`` — 39/39 pass - [x] ``./isaaclab.sh -f`` — pre-commit clean
## Summary - Adds ``test_newton_cfg_collision_decimation_warning`` (parametrized over ``(num_substeps, collision_decimation)``) for the ``__post_init__`` warning gate. - Adds ``test_collision_decimation_invokes_mid_loop_collide`` that wraps ``NewtonManager._collision_pipeline.collide`` with a counter, runs one physics tick with a sphere falling onto a ground plane, and asserts the call count equals ``1 + floor((num_substeps - 1) / collision_decimation)``. - Adds ``source/isaaclab_newton/changelog.d/mzamoramora-collision-decimation.minor.rst``. ## Test plan - [x] ``./isaaclab.sh -p -m pytest source/isaaclab_newton/test/physics/test_newton_manager_abstraction.py`` — 39/39 pass (28 existing + 11 new) - [x] ``./isaaclab.sh -f`` — pre-commit clean
…nfiguration Merged Antoine's antoiner/newton-sdf-config branch with conflict resolution: - extension.toml / CHANGELOG.rst: kept HEAD (develop has advanced past PR's manual version bumps; changelog.d conventions supersede manual edits). - __init__.pyi: union of exports — kept develop's manager classes + added SDFCfg. - newton_collision_cfg.py: union — kept develop's import/doc updates, added back PR's 5 hydroelastic fields and the SDFCfg class. - newton_replicate.py: union — kept develop's deformable-prim handling hooks, layered PR's SDF-pattern skip logic and _apply_sdf_config call. - newton_manager.py: kept develop's structure (modernized API), surgically grafted PR's three SDF helper methods (_build_sdf_on_mesh, _create_sdf_collision_from_visual, _apply_sdf_config) and 're' import. - newton_manager_cfg.py: kept develop's structure, added sdf_cfg field on NewtonCfg and SDFCfg import. Not carried over: PR's initialize_solver SDF-detection branch and _initialize_contacts hydroelastic warning (deferred — develop's initialize_solver has been refactored to delegate to _build_solver). This vendoring is for local use on the bundle branch. When PR isaac-sim#5228 lands upstream, this merge commit is reverted and develop is pulled in.
…tick re-collide
``RigidObjectData._create_simulation_bindings`` crashed with ``IndexError: tuple index out of range`` whenever the rigid object was fixed-base AND its root view only contained a single body — e.g. a kinematic-enabled rigid object built from a USD that authors a ``PhysicsFixedJoint``. Newton's ``get_root_transforms`` returns a 2D ``(count, links)`` array in that scenario, but the legacy ``is_fixed_base`` branch unconditionally indexed ``[:, 0, 0]`` assuming a 3D ``(count, links, 1)`` layout. Dispatch on actual ``ndim`` so both fixed-base multi-link articulations and single-body kinematic rigid objects are handled. ``_create_buffers`` similarly allocated the no-velocity fallback for ``_sim_bind_body_com_vel_w`` as 1D, which the ``derive_body_acceleration_from_body_com_velocities`` kernel rejected at launch time (expects 2D). Allocate it as ``(num_instances, 1)`` to match the kernel signature.
Add classmethod `NewtonManager.sanitize_world_state(env_ids)` that
zeroes per-world MJWarp solver scratch (qacc_warmstart, qfrc_*, cacc,
cfrc_*) and Newton State velocity/force buffers (joint_qd, body_qd,
body_f, body_qdd, body_parent_f) for the given worlds, then runs
eval_fk to re-derive body_q.
Generalizes the in-tree Factory NaN scrubber so any task on Newton can
recover NaN-divergent worlds in place via:
if has_nan_envs:
NewtonManager.sanitize_world_state(nan_env_ids)
Compared to mjwarp.reset_data: closes the gap on derived qfrc_* / cacc
/ cfrc_int / cfrc_ext that reset_data doesn't touch. cfrc_int in
particular is read-modify-written via wp.atomic_add in
mujoco_warp/_src/smooth.py:_cfrc_backward, so a stale NaN there
survives the next rne() call and re-contaminates qfrc_bias.
No-op for solvers without mjw_data (XPBD, Featherstone, Kamino) since
those don't exhibit the MuJoCo warm-start contamination pattern.
There was a problem hiding this comment.
Code Review Summary
This PR bundles four prerequisite changes for Factory NutThread support on the Newton backend. The implementation is well-structured overall with proper configuration separation and comprehensive test coverage for the SDF configuration logic.
✅ Strengths
- Clean separation between configuration classes and implementation
- Proper handling of mid-tick re-collide skip on last substep
- Comprehensive unit tests for SDF configuration (14 tests)
- Good docstrings explaining the rationale for defensive buffer zeroing in
sanitize_world_state
🔍 Findings
1. [Medium] Potential device mismatch in flag_nan_envs (newton_manager.py, ~line 1575)
When _nan_env_mask_pending_reset is first created via torch.zeros_like(mask), it inherits the device from the input mask. If a subsequent call passes a mask on a different device, the |= operation will fail.
Suggested fix:
if cls._nan_env_mask_pending_reset is None:
cls._nan_env_mask_pending_reset = torch.zeros_like(mask)
elif cls._nan_env_mask_pending_reset.device != mask.device:
mask = mask.to(cls._nan_env_mask_pending_reset.device)
cls._nan_env_mask_pending_reset |= maskAlternatively, document that all masks must be on the same device as the first call.
2. [Medium] Defensive assertion for unexpected ndim (rigid_object_data.py, ~line 815)
The ndim >= 3 check correctly handles the 2D vs 3D dispatch, but an ndim of 1 or 4+ would silently take the "else" branch. Making the assumption explicit would fail fast on unexpected Newton API changes:
root_xforms = self._root_view.get_root_transforms(SimulationManager.get_state_0())
if root_xforms.ndim not in (2, 3):
raise ValueError(f"Unexpected root_xforms ndim={root_xforms.ndim}, expected 2 or 3")
self._sim_bind_root_link_pose_w = root_xforms[:, 0, 0] if root_xforms.ndim == 3 else root_xforms[:, 0]3. [Low/Suggestion] Add trace logging when sanitize_world_state is no-op (newton_manager.py, ~line 1511)
For debugging purposes, it can be helpful to know when sanitize_world_state is a no-op due to a non-MJWarp backend:
if mjw_data is None:
logger.debug("sanitize_world_state: skipped (no mjw_data, likely non-MJWarp solver)")
return4. [Low/Suggestion] Duplicate regex compilation helper (newton_replicate.py, line 18)
The _compile_sdf_patterns function duplicates the same logic in newton_manager.py:_apply_sdf_config. Consider extracting to a shared utility in newton_collision_cfg.py to avoid drift between the two implementations.
📋 Test Coverage Notes
The PR acknowledges that standalone regression tests for the kinematic single-body fix and sanitize_world_state NaN recovery are follow-up items. Consider adding basic tests for the deferred NaN queue (flag_nan_envs/sanitize_pending_nan_envs) to prevent regressions in that coordination logic.
Prereqs for Factory NutThread support on the Newton backend
Description
Aggregates the four IsaacLab changes required to support the Factory NutThread task on the Newton backend onto a single base branch. Each item is independently reviewable and would normally live in its own upstream PR; they are bundled here so the follow-up Factory branch (#5836 — Factory NutThread support on the Newton backend) can rebase against one base instead of juggling four moving targets while the upstream PRs land.
This branch is intentionally draft — once the constituent upstream PRs are merged, this branch will be dropped and the Factory PR will rebase onto
developdirectly. The bundle exists only to unblock end-to-end Factory training while review proceeds in parallel.Relationship to the Factory NutThread PR
This PR is the base for #5836 — [Factory] NutThread support on the Newton backend. The commits here are also temporarily carried on the Factory branch so GitHub can render its diff; they belong to this PR's review surface.
Compare view (this PR's diff vs
develop):mzamoramora-nvidia/IsaacLab@develop...nut-thread-newton-prereqs
What this branch contains
NewtonCfg.collision_decimation(mid-tick re-collide)RigidObjectDatashape fix for kinematic single-body assetsNewtonManager.sanitize_world_statehook + deferred NaN-reset queueTopology:
The two new commits sit on top of the vendored PRs because both depend on
NewtonCollisionPipelineCfgbeing available inNewtonCfg.Detail per piece
PR #5228 — SDF / hydroelastic configuration (vendored as-is)
Adds
SDFCfgand surfaces theHydroelasticSDFCfgfields needed to drive Newton's pressure-based contact pipeline (moment_matching,buffer_mult_*,grid_size). No local modifications; merged verbatim so the Factory branch can configuresdf_hydroelastic_config=HydroelasticSDFCfg(...)against the same struct shape that will land upstream.PR #5729 —
NewtonCfg.collision_decimation(vendored as-is)Adds the opt-in
collision_decimationknob that re-invokes the Newton collision pipeline everyNsolver substeps inside a single physics tick. Default0preserves legacy behaviour. The Factory NutThreadnewton_sdfpreset usescollision_decimation=1(re-collide every substep); thenewton(hydroelastic) preset usescollision_decimation=2. These values were chosen empirically — tighter decimation kept the SDF preset stable on this task; hydroelastic tolerated a looser setting.RigidObjectData shape fix for kinematic single-body assets (new)
Fixes :class:
~isaaclab_newton.assets.RigidObjectDatacrashing withIndexError: tuple index out of rangewhen a fixed-base rigid object holds a single body. Theis_fixed_basebranch in_create_simulation_bindingsindexed[:, 0, 0]assuming a 3D(count, links, 1)layout, but Newton returns a 2D(count, links)array when the view contains a single body. Dispatch on actualndimso both multi-link fixed-base articulations and single-body kinematic rigid objects work. Also widens_sim_bind_body_com_vel_wto(num_instances, 1)so the no-velocity fallback matches thederive_body_acceleration_from_body_com_velocitieskernel's 2D signature.This is hit immediately when spawning the Factory bolt as a kinematic single-body
RigidObjectCfg(the natural Newton-side representation since the bolt is fixed in the world); other tasks that use the same kinematic single-body pattern would hit it too.NewtonManager.sanitize_world_statehook + deferred NaN-reset queue (new)Adds two classmethods to
NewtonManager:sanitize_world_state(env_ids)— zeroes per-world MJWarp solver internals (qacc_warmstart,qfrc_applied,xfrc_applied,qacc,cacc,cfrc_int,cfrc_ext) and NewtonStatevelocity buffers for the given worlds. Discussed in newton-physics/newton#1266. The current list is intentionally defensive — we clear every MJWarp-internal buffer that could plausibly carry NaN state forward, and stopped tightening once recovery was reliable. Not all of these probably need to be reset; a follow-up should bisect which subset is actually load-bearing.flag_nan_envs(mask)/sanitize_pending_nan_envs()— a tiny classvar-backed pending-reset queue. Tasks callflag_nan_envswhen they detect NaN (e.g. in_get_observations), and the env's reset path drains the queue viasanitize_pending_nan_envsonce per reset. This defers the scrub to the natural episode boundary instead of clobbering live warm-starts mid-rollout — earlier eager-sanitize attempts regressed hydroelastic training, and deferring the scrub fixed the regression.The buffers scrubbed are MJWarp-specific; the hook has only been exercised against
SolverMuJoCoso far. Other Newton solvers may need a different set of buffers, or none at all — that's an open question for when a non-MJWarp task first needs NaN recovery.Why bundle this with the prereqs rather than upstream-first: the API surface (two classmethods + one classvar) is small but the placement (which buffers to scrub, where to call the hook) is task-driven; the Factory NutThread env is the first task that actually exercises it. Once the contract is validated by landing the Factory PR, splitting this out into a standalone upstream PR is straightforward — it doesn't depend on anything else here.
Type of change
Test plan
./isaaclab.sh -p -m pytest source/isaaclab_newton/test/physics/test_newton_manager_abstraction.py./isaaclab.sh -p -m pytest source/isaaclab_newton/test/physics/test_sdf_config.py./isaaclab.sh -fIsaac-Factory-NutThread-Direct-v0 presets=newtonStandalone regression tests for the two new commits — single-body kinematic
RigidObjectCfground-trip andsanitize_world_stateNaN-recovery — are open follow-ups to keep this PR scoped to vendoring + the minimal API additions the Factory branch depends on.Checklist
pre-commitchecks with./isaaclab.sh --formatsource/<pkg>/changelog.d/for every touched packageCONTRIBUTORS.mdor my name already exists there