Fix DeepCompile ZeRO-1 grad target lifetime by tohtana · Pull Request #8036 · deepspeedai/DeepSpeed

tohtana · 2026-05-29T21:16:20Z

DeepCompile ZeRO-1 kept compile-time reduce target buffers alive into the optimizer step, causing backward gradient storage to overlap with optimizer temporaries. This PR fixes the issue by making DeepCompile ZeRO-1 reduce targets follow the normal step-local ZeRO partition gradient-buffer lifetime, instead of preserving cloned target storage from compile setup.

The actual code changes are:

During compile initialization, register empty DeepCompile ZeRO-1 gradient targets, then bind them to the step-local flat ZeRO partition gradient buffer and per-parameter views when gradients are ready to synchronize.
After ZeRO-1 builds the optimizer-facing fp32 gradient partition, release the DeepCompile registry references and clear reduce bucket storage after backward synchronization.

	Step 10-30 avg sec	Peak alloc GiB
Without this PR	0.858	43.594
Without this PR	0.859	39.366

Fine-tuning style training (8xH100, Qwen3-8B random weights, bs/GPU=1, seq=4096, GAS=1) showed only finite value losses for 1000 steps.

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1947a25a80

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

…1-grad-buffer-protocol Use generic flat grad partition protocol

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f98f7290c0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-02T19:27:29Z

+    backward_graph_ids = [g_id for g_id, needs_backward in graph_order if needs_backward]
+    return not backward_graph_ids or graph_id == backward_graph_ids[0]


Release buckets on the last reducing backward graph

When the first graph in forward order needs backward but has no parameter reductions (for example a graph-break segment that only computes activation/input gradients), this helper marks that no-reduce graph as the one that should clear the shared buckets; however add_end_backward() immediately returns for graphs with no reduce_grad nodes, so no dc.end_backward(..., release_reduce_buckets=True) is inserted anywhere. Any earlier backward graph that did reduce gradients will flush/synchronize but keep reduce_buckets allocated into the optimizer step, defeating the lifetime fix in these multi-graph cases. Please base the release decision on the last backward graph that actually has reduce nodes, or otherwise insert a final cleanup even for no-reduce backward graphs.

Useful? React with 👍 / 👎.

eternalNight · 2026-06-04T09:47:07Z

+                if idx in optimizer._deepcompile_z1_current_grad_buffers:
+                    optimizer._deepcompile_z1_current_grad_buffers[idx] = None
+
+        optimizer._deepcompile_z1_release_grad_buffers = release_grad_buffer


I may have missed something, but I don't find any reference to optimizer._deepcompile_z1_release_grad_buffers in the latest changes. Is it a dangling change after the refinement since the first draft?

Similarly, optimizer._deepcompile_z1_grad_buffer_metadata seems out-dated as well. Meanwhile, optimizer._deepcompile_z1_current_grad_buffers is still used by the unit tests.

Fix DeepCompile ZeRO-1 grad target lifetime

1947a25

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

tohtana marked this pull request as ready for review May 29, 2026 22:08

tohtana requested review from loadams and tjruwase as code owners May 29, 2026 22:08

tohtana requested a review from eternalNight May 29, 2026 22:08

chatgpt-codex-connector Bot reviewed May 29, 2026

View reviewed changes

Comment thread csrc/includes/deepcompile.h Outdated

tohtana marked this pull request as draft May 29, 2026 23:40

tohtana and others added 6 commits May 29, 2026 23:28

Release DeepCompile reduce buckets after final graph

644caa2

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

Remove added DeepCompile test

eb018f3

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

Merge branch 'master' into tohtana/fix/deepcompile-zero1-flat-release

bbaae93

Merge branch 'master' into tohtana/fix/deepcompile-zero1-flat-release

c0bd643

Use generic flat grad partition protocol

3524e52

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

Merge pull request #26 from tohtana/tohtana/refactor-deepcompile-zero…

f98f729

…1-grad-buffer-protocol Use generic flat grad partition protocol

tohtana marked this pull request as ready for review June 2, 2026 19:23

chatgpt-codex-connector Bot reviewed Jun 2, 2026

View reviewed changes

eternalNight reviewed Jun 4, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix DeepCompile ZeRO-1 grad target lifetime#8036

Fix DeepCompile ZeRO-1 grad target lifetime#8036
tohtana wants to merge 7 commits into
deepspeedai:masterfrom
tohtana:tohtana/fix/deepcompile-zero1-flat-release

tohtana commented May 29, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Uh oh!

eternalNight Jun 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		backward_graph_ids = [g_id for g_id, needs_backward in graph_order if needs_backward]
		return not backward_graph_ids or graph_id == backward_graph_ids[0]

Conversation

tohtana commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

eternalNight Jun 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tohtana commented May 29, 2026 •

edited

Loading