Fetch from nvidia Megatron-LM#5

Open
RaymondLi0 wants to merge 7227 commits into ElementAI:load-iter from NVIDIA:main

Conversation

@RaymondLi0

No description provided.

shjwudp and others added 29 commits April 2, 2026 15:11
Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
…4114)

Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Co-authored-by: Cory Ye <44509866+cspades@users.noreply.github.com>
Signed-off-by: Akshat Kumar <akshat230405@gmail.com>
…#4084)

Signed-off-by: Deyu Fu <deyuf@nvidia.com>
Co-authored-by: Tom Long <tolong@oci-hsg-cs-001-vscode-02.cm.cluster>
Co-authored-by: yaoyu-33 <yaoyu.094@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Deyu Fu <deyuf@nvidia.com>
…dance (#4035)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…4140)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dlock (#4139)

Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
…ubscriptable`) by not saving a checkpoint after a transient NaN / Inf (#3981)

Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Signed-off-by: Hao Wu <skyw@nvidia.com>
Signed-off-by: Cory Ye <cye@nvidia.com>
Co-authored-by: Cory Ye <cye@nvidia.com>
Co-authored-by: conver334 <conver334@gmail.com>
tdene and others added 30 commits May 2, 2026 19:17
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: root <root@nvl72114-T04.cm.cluster>
Co-authored-by: root <root@nvl72116-T13.cm.cluster>
Co-authored-by: root <root@nvl72102-T01.cm.cluster>
Co-authored-by: root <root@nvl72150-T11.cm.cluster>
Co-authored-by: root <root@nvl72045-T09.cm.cluster>
Co-authored-by: root <root@nvl72102-T04.cm.cluster>
Signed-off-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Siddharth Singh <sidsingh@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Asha Anoosheh <aanoosheh@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: guihong-nv <guihongl@nvidia.com>
Co-authored-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Deepak Narayanan <deepakn94@gmail.com>
Co-authored-by: Deepak Narayanan <dnarayanan@nvidia.com>
Co-authored-by: Philip Petrakian <ppetrakian@nvidia.com>
Co-authored-by: Xin Yao <xiny@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Keshav Santhanam <ksanthanam@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: pengdurice <pengduhit@gmail.com>
… and re-simplify ep_sync accidentally reverted by #4306 (#4587)

Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
Co-authored-by: Antoni-Joan Solergibert <asolergibert@nvidia.com>
Co-authored-by: Maanu Grover <maanug@nvidia.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: dimapihtar <dpykhtar@nvidia.com>
Co-authored-by: Siddharth Singh <sidsingh@nvidia.com>
Co-authored-by: root <root@nvl72098-T11.cm.cluster>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: root <root@nvl72078-T18.cm.cluster>
Co-authored-by: William Dykas <wdykas@oci-hsg-cs-001-vscode-03.cm.cluster>
Co-authored-by: root <root@nvl72102-T05.cm.cluster>
Signed-off-by: meg miranda <mmiranda@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…n_dev (#4639)

Signed-off-by: oliver könig <okoenig@nvidia.com>