Skip to content

Chore: refactor common utils#5504

Open
anyangml wants to merge 8 commits into
deepmodeling:masterfrom
anyangml:chore/refactor-compile
Open

Chore: refactor common utils#5504
anyangml wants to merge 8 commits into
deepmodeling:masterfrom
anyangml:chore/refactor-compile

Conversation

@anyangml
Copy link
Copy Markdown
Collaborator

@anyangml anyangml commented Jun 8, 2026

Summary by CodeRabbit

  • Refactor

    • Moved compile/tracing helpers into a shared utilities module to reduce duplication.
    • Training trace path now uses shared helpers and selects a “safe prime” frame size, padding only necessary dimensions.
  • Bug Fixes

    • Improved FX detach cleanup and graph rebuilding for more stable tracing and compilation (fewer symbolic-shape and re-trace failures).

Copilot AI review requested due to automatic review settings June 8, 2026 03:16
@github-actions github-actions Bot added the Python label Jun 8, 2026
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 8, 2026

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a new shared compile_utils module (prime helpers, trace-time padding, FX detach-strip, FX rebuild) and updates SeZMModel and training trace/compile to import and use these helpers; training now selects a forbidden-set-aware safe-prime trace frame size and pads frame-only inputs before make_fx and compile.

Changes

Compile Utilities and Shape Specialization

Layer / File(s) Summary
Compile utilities library
deepmd/pt/utils/compile_utils.py
New module provides prime helpers (_is_prime, _next_safe_prime with forbidden-set filtering), tensor padding/trimming (_trace_pad_dim), FX graph topology-aware detach-chain removal (strip_saved_tensor_detach), and stale-pointer-safe graph rebuilding (rebuild_graph_module).
SeZMModel compile-utils integration
deepmd/pt/model/model/sezm_model.py
Replaces in-file compile helper implementations with imports from compile_utils; removes duplicate prime selection, tensor padding, detach-chain stripping, and graph-rebuild code.
Training pipeline prime-based shape specialization
deepmd/pt_expt/train/training.py
Imports compile utilities and replaces fixed nframes=7 trace padding with dynamic safe-prime-based frame-dimension selection using a forbidden-set derived from model params/buffers and runtime dims; pads/clamps trace inputs via _trace_pad_dim, then uses strip_saved_tensor_detach and rebuild_graph_module before torch.compile.

Sequence Diagram (trace & compile flow)

sequenceDiagram
  participant Trainer
  participant make_fx
  participant DetachFix as strip_saved_tensor_detach
  participant Rebuilder as rebuild_graph_module
  participant Compiler as torch.compile

  Trainer->>make_fx: call make_fx(forward_lower) with padded inputs
  make_fx->>DetachFix: produce traced_lower GraphModule
  DetachFix->>Rebuilder: remove saved-tensor detaches and rewrite uses
  Rebuilder->>Compiler: rebuild graph and return clean GraphModule
  Compiler->>Trainer: compile(GraphModule, dynamic=True, backend="inductor")
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • deepmodeling/deepmd-kit#5483: Also touches SeZM/tracing machinery and safe-prime trace-shape selection/padding helpers used during make_fx/compile.
  • deepmodeling/deepmd-kit#5457: Modifies the training trace_and_compile/_CompiledModel paths; overlaps in the compile/tracing area and compiled forward_lower handling.

Suggested reviewers

  • wanghan-iapcm
  • njzjz-bot
  • OutisLi
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 77.78% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Chore: refactor common utils' is vague and generic, using non-descriptive terms that don't convey meaningful information about the specific changes made in the pull request. Consider a more specific title that describes the actual refactoring, such as 'Extract compile utilities to shared module' or 'Consolidate FX graph and tensor padding utilities'.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Comment thread deepmd/pt_expt/train/training.py Dismissed
Comment thread deepmd/pt_expt/train/training.py Fixed
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
deepmd/pt/utils/compile_utils.py (1)

87-88: 💤 Low value

Consider adding a type guard for extra robustness.

While aten.detach.default inputs are always Nodes in make_fx-generated graphs, adding an isinstance check would prevent potential AttributeError if this helper is ever called on malformed graphs.

🛡️ Optional defensive check
     def _is_detach(n: torch.fx.Node) -> bool:
-        return n.op == "call_function" and n.target == _DETACH
+        return isinstance(n, torch.fx.Node) and n.op == "call_function" and n.target == _DETACH
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/pt/utils/compile_utils.py` around lines 87 - 88, Update _is_detach to
defensively check that the input is a torch.fx.Node before accessing attributes:
change the parameter type to a more permissive Any (or keep current) and add an
isinstance(n, torch.fx.Node) guard so you only evaluate n.op and n.target when n
is a Node; optionally change the return annotation to
typing.TypeGuard[torch.fx.Node] if you want an actual type guard. Ensure the
function still checks n.op == "call_function" and n.target == _DETACH after the
isinstance check and reference the _is_detach name, torch.fx.Node, and _DETACH
in your change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@deepmd/pt/utils/compile_utils.py`:
- Around line 87-88: Update _is_detach to defensively check that the input is a
torch.fx.Node before accessing attributes: change the parameter type to a more
permissive Any (or keep current) and add an isinstance(n, torch.fx.Node) guard
so you only evaluate n.op and n.target when n is a Node; optionally change the
return annotation to typing.TypeGuard[torch.fx.Node] if you want an actual type
guard. Ensure the function still checks n.op == "call_function" and n.target ==
_DETACH after the isinstance check and reference the _is_detach name,
torch.fx.Node, and _DETACH in your change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 4c2615d5-fadf-4031-a417-656104bcad39

📥 Commits

Reviewing files that changed from the base of the PR and between 99c1ece and 4b314b0.

📒 Files selected for processing (3)
  • deepmd/pt/model/model/sezm_model.py
  • deepmd/pt/utils/compile_utils.py
  • deepmd/pt_expt/train/training.py

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors PyTorch tracing/compilation helper logic into a shared utility module so both the SeZM model compile path and the pt_expt training compile path reuse the same implementations.

Changes:

  • Added deepmd/pt/utils/compile_utils.py with shared helpers for trace-time prime shape selection, input padding/trimming, detach-chain stripping, and FX graph rebuilding.
  • Updated deepmd/pt/model/model/sezm_model.py to import and use the shared helpers, removing the previous in-file implementations.
  • Updated deepmd/pt_expt/train/training.py to import and use the shared helpers and to coerce multiple trace-time dimensions (nf/nloc/nall) to collision-resistant primes.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

File Description
deepmd/pt/utils/compile_utils.py New shared tracing/compile utility module (prime sizing, padding, detach stripping, graph rebuilding).
deepmd/pt/model/model/sezm_model.py Replaced local helper implementations with imports from the new shared module.
deepmd/pt_expt/train/training.py Replaced local FX graph post-processing helpers with shared imports; updated trace-input coercion to prime sizes for multiple dims.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 8, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.07%. Comparing base (99c1ece) to head (d8f9cd0).

Files with missing lines Patch % Lines
deepmd/pt_expt/train/training.py 85.96% 8 Missing ⚠️
deepmd/pt/utils/compile_utils.py 96.66% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5504      +/-   ##
==========================================
- Coverage   81.41%   81.07%   -0.35%     
==========================================
  Files         871      872       +1     
  Lines       96952    96952              
  Branches     4242     4241       -1     
==========================================
- Hits        78938    78605     -333     
- Misses      16711    17044     +333     
  Partials     1303     1303              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread deepmd/pt_expt/train/training.py Dismissed
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/pt_expt/train/training.py`:
- Line 395: The comment in training.py contains EN DASH characters in the phrase
"50–500 and 200–5000+" which triggers Ruff RUF003; update that comment (around
the block where the line mentions real data counts) to use HYPHEN-MINUS instead,
i.e. change "50–500 and 200–5000+" to "50-500 and 200-5000+", ensuring no other
EN DASH characters remain in the same comment or nearby comments.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d06cb1c3-6798-4545-9084-600cf0f7aaab

📥 Commits

Reviewing files that changed from the base of the PR and between c0b66b7 and c7f9f7f.

📒 Files selected for processing (1)
  • deepmd/pt_expt/train/training.py

Comment thread deepmd/pt_expt/train/training.py Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants