Skip to content

fix(pt_expt): report average step time in the normal training log#5500

Merged
njzjz merged 2 commits into
deepmodeling:masterfrom
wanghan-iapcm:fix-pt-expt-step-time-log
Jun 8, 2026
Merged

fix(pt_expt): report average step time in the normal training log#5500
njzjz merged 2 commits into
deepmodeling:masterfrom
wanghan-iapcm:fix-pt-expt-step-time-log

Conversation

@wanghan-iapcm
Copy link
Copy Markdown
Collaborator

@wanghan-iapcm wanghan-iapcm commented Jun 7, 2026

Summary

The pt_expt trainer emitted a separate, always-on debug line on top of the normal progress message:

[...] DEEPMD INFO   Batch     100: total wall time = 18.41 s, eta = ...
[...] DEEPMD INFO   step=100  step_time=0.1801s        <-- extra debug line

This step=N step_time=...s line (leftover from #5397) is pt_expt-only (absent from pt/pd/jax) and step_time there was a single instantaneous sample of the last step, not an average.

This PR removes the standalone line and folds an average per-step time into the normal message:

Batch     100: total wall time = 18.41 s, step time = 0.1841 s, eta = ...

Changes

  • deepmd/loggers/training.py: format_training_message gains an optional step_time argument, rendered as step time = X.XXXX s between the wall time and eta. Default-off, so pt/pd/jax output is unchanged.
  • deepmd/pt_expt/train/training.py: drop the standalone step=... step_time=... log line and the per-step time.time() bracketing (t_start/t_end); report step_time = interval_wall_time / steps_since_last_log (steps counted exactly once across intervals via last_log_step).
  • source/tests/pt_expt/test_training.py: add TestFormatTrainingMessageStepTime covering the with/without step_time and eta branches.

Known limitations

  • Tests cover the formatter (format_training_message) across all branches; they do not drive the trainer loop, so the average computation itself is verified by reasoning + existing training smoke tests rather than an end-to-end assertion.
  • The first interval (step 1) averages over a single step that includes one-time startup/compile cost, so its reported step time is inflated vs steady state (expected).

Summary by CodeRabbit

  • New Features
    • Training logs optionally show an average step time (s/step) for the logging interval; when present it is displayed between wall time and ETA in the log line.
  • Tests
    • Added tests validating step-time formatting, omission when absent, zero-value handling, and correct ordering with ETA and current-time rendering.

@github-actions github-actions Bot added the Python label Jun 7, 2026
@wanghan-iapcm wanghan-iapcm requested a review from njzjz June 7, 2026 01:57
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 36ab4987-21ed-4be2-b152-d187703c1dbb

📥 Commits

Reviewing files that changed from the base of the PR and between 189aff8 and e2639d0.

📒 Files selected for processing (2)
  • deepmd/loggers/training.py
  • source/tests/pt_expt/test_training.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • deepmd/loggers/training.py
  • source/tests/pt_expt/test_training.py

📝 Walkthrough

Walkthrough

This PR extends training message logging with step time reporting. format_training_message gains an optional step_time parameter and conditionally appends an "avg = … s/step" segment. The training loop switches from per-step timing to interval-averaged step_time computed over log intervals. Tests validate formatting and ordering.

Changes

Step time reporting in training messages

Layer / File(s) Summary
Format training message contract and implementation
deepmd/loggers/training.py
format_training_message adds optional step_time: float | None. Docstring updated and message formatting now conditionally appends avg = {step_time:.4f} s/step when step_time is provided.
Training loop step time calculation and logging
deepmd/pt_expt/train/training.py
Adds last_log_step initialized from self.start_step; removes per-step t_start/t_end captures. Display logging computes interval_steps = max(1, display_step_id - last_log_step), step_time = interval_wall_time / interval_steps, updates last_log_time and last_log_step, and passes step_time into format_training_message when timing is enabled.
Test coverage for step time formatting
source/tests/pt_expt/test_training.py
Adds datetime and format_training_message imports and TestFormatTrainingMessageStepTime to assert omission/inclusion of step_time, zero-value handling, 4-decimal formatting, segment ordering (wall time → step time → eta), and eta rendering with current_time.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding average step time reporting to the training log output in pt_expt trainer.
Docstring Coverage ✅ Passed Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

The pt_expt trainer emitted a separate, always-on `step=N step_time=...s`
debug line (leftover from deepmodeling#5397) in addition to the normal progress message,
where step_time was a single instantaneous sample of the last step.

Replace it by folding an average-per-step time into the standard progress
message: `format_training_message` gains an optional `step_time` argument
(rendered as `step time = X.XXXX s`, default-off so pt/pd/jax are unaffected),
and the trainer now reports interval_wall_time / steps_since_last_log.  The
per-step time.time() bracketing (t_start/t_end) is dropped from the hot loop.

Tests covering the with/without step_time and eta branches are added to
source/tests/pt_expt/test_training.py.
@wanghan-iapcm wanghan-iapcm force-pushed the fix-pt-expt-step-time-log branch from 3e14909 to 189aff8 Compare June 7, 2026 02:03
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.42%. Comparing base (2e3117e) to head (e2639d0).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5500      +/-   ##
==========================================
- Coverage   81.42%   81.42%   -0.01%     
==========================================
  Files         871      871              
  Lines       96951    96951              
  Branches     4240     4245       +5     
==========================================
- Hits        78941    78939       -2     
- Misses      16707    16710       +3     
+ Partials     1303     1302       -1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Use an explicit per-step rate unit (avg = 0.1841 s/step) instead of the
redundant 'step time = 0.1841 s', keeping the interval 'total wall time'
unchanged.
@njzjz njzjz enabled auto-merge June 8, 2026 03:01
@njzjz njzjz added this pull request to the merge queue Jun 8, 2026
Merged via the queue into deepmodeling:master with commit e05af21 Jun 8, 2026
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants