
HDDS-15105. Revise snapshot defrag design doc and add implementation details #10119

Open
smengcl wants to merge 11 commits into apache:master from smengcl:HDDS-15105-revise-defrag-doc

Conversation

@smengcl
Contributor

@smengcl smengcl commented Apr 24, 2026

Generated-by: GPT-5.5

What changes were proposed in this pull request?

Revise the snapshot defrag design doc and add implementation details. GPT-5.5 generated most of it and I tweaked some parts.

The current design doc was merged in #8514.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15105

How was this patch tested?

  • n/a

@smengcl smengcl added AI-gen documentation Improvements or additions to documentation labels Apr 24, 2026
@smengcl
Contributor Author

smengcl commented Apr 24, 2026

I eyeballed it and it looks good. Inaccuracies were corrected in the process. It needs at least one more pass before this can be merged.

Contributor

@jojochuang jojochuang left a comment


This looks like a complete rewrite and hard to know what exactly was added.

@smengcl
Contributor Author

smengcl commented Apr 24, 2026

This looks like a complete rewrite and hard to know what exactly was added.

That's fair. I should convert it into a user doc.

@smengcl
Contributor Author

smengcl commented Apr 24, 2026

Uh oh, I just realized that the design doc should be under `hadoop-hdds/docs/content/design/`, but the current design doc is already merged under `hadoop-hdds/docs/content/feature/`. When I tried to move the original design doc to the correct place while also adding the user feature doc, the diff became more confusing. Let me fix that by splitting the change into two PRs.

@smengcl smengcl marked this pull request as ready for review April 24, 2026 23:49
Copilot AI review requested due to automatic review settings April 24, 2026 23:49
Contributor

Copilot AI left a comment


Pull request overview

This PR revises the Snapshot Defragmentation documentation to align the original design proposal with the current implementation details in Ozone (local YAML metadata, locking, on-disk layout, and service workflows).

Changes:

  • Document implemented snapshot-local YAML metadata (OmSnapshotLocalData) and how defrag state/versioning is tracked locally (not via Ratis SnapshotInfo).
  • Update the described locking, directory layout, diff computation paths, and purge/cleanup behavior to match current code.
  • Refresh the workflow and mermaid diagrams to reflect the implemented defrag flow.


structure and calls out where the implemented behavior differs from the initial
proposal. For the current user-facing behavior, configuration, and on-disk
layout, see
[Snapshot Defragmentation]({{< ref "feature/Snapshot-Defragmentation.md" >}}).

Copilot AI Apr 24, 2026


The Hugo ref points to feature/Snapshot-Defragmentation.md, but there is no such doc under hadoop-hdds/docs/content/feature/ (only SnapshotDefragmentation.md). This will render as a broken link; update the reference to an existing page (or add the missing target doc).

Suggested change
[Snapshot Defragmentation]({{< ref "feature/Snapshot-Defragmentation.md" >}}).
[Snapshot Defragmentation]({{< ref "feature/SnapshotDefragmentation.md" >}}).

Comment on lines +123 to +147
1. **Acquire the bootstrap read lock** and load `SnapshotInfo` plus local YAML.
2. **Create a temporary checkpoint** in `tmp_defrag`.
* If this is the first snapshot in the bucket path chain, checkpoint the
current snapshot.
* Otherwise, checkpoint the current version of the path previous snapshot.
3. **Drop non-incremental column families** from the temporary checkpoint. They
are reloaded from the current snapshot later.
4. **Defrag tracked tables** (`keyTable`, `fileTable`, `directoryTable`).
* For the first snapshot in the path chain, perform full defrag by deleting
ranges outside the bucket prefix and compacting the tracked tables.
* For later snapshots, compute incremental delta SSTs, write changed records
or tombstones into temporary SST files, and ingest those files.
5. **Acquire `SNAPSHOT_DB_CONTENT_LOCK`** for the current snapshot. Snapshot
reads and deep-clean writes follow the same DAG-based lock hierarchy.
6. **Reload non-incremental tables** from the current snapshot into the
checkpoint.
7. **Move the checkpoint directory** to
`db.snapshots/checkpointState/om.db-<snapshot_id>-<next_version>`.
8. **Update local YAML**, incrementing `version`, adding the new
`versionSstFileInfos` entry, and clearing `needsDefrag`.
9. **Delete older checkpoint directories** for the same snapshot under
`SNAPSHOT_DB_CONTENT_LOCK` and the snapshot DB cache write lock. Older YAML
version metadata is removed later by `OmSnapshotLocalDataManagerService`
only when no local version depends on it.
10. **Release `SNAPSHOT_DB_CONTENT_LOCK`**.
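As a side note, the ten steps quoted above can be sketched as an orchestration skeleton. This is purely illustrative; every function and lock helper below is a hypothetical stand-in, not the actual Ozone API:

```python
from contextlib import contextmanager

# Hypothetical sketch of the defrag workflow quoted above.
# None of these names are real Ozone APIs; only the step ordering
# and lock scoping follow the documented workflow.

@contextmanager
def lock(name, steps):
    steps.append(f"acquire {name}")
    try:
        yield
    finally:
        steps.append(f"release {name}")

def defrag_snapshot(snapshot_id, is_first_in_chain):
    steps = []
    with lock("bootstrap read lock", steps):                      # step 1
        steps.append("load SnapshotInfo + local YAML")
        steps.append("create temporary checkpoint in tmp_defrag")  # step 2
        steps.append("drop non-incremental column families")       # step 3
        if is_first_in_chain:                                      # step 4
            steps.append("full defrag: delete ranges outside bucket prefix, compact")
        else:
            steps.append("incremental defrag: compute delta SSTs, ingest")
        with lock("SNAPSHOT_DB_CONTENT_LOCK", steps):              # steps 5-10
            steps.append("reload non-incremental tables")          # step 6
            steps.append(f"move checkpoint to om.db-{snapshot_id}-<next_version>")  # step 7
            steps.append("update YAML: version++, add versionSstFileInfos, clear needsDefrag")  # step 8
            steps.append("delete older checkpoint directories")    # step 9
    return steps
```

The point of the sketch is the lock scoping: the content lock is taken only after the expensive table defrag has finished, and the old checkpoint cleanup happens while that lock is still held.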

Copilot AI Apr 24, 2026


Markdown list nesting looks inconsistent in the updated workflow section: the 10-step numbered list (starting with "1. Acquire the bootstrap read lock") is not indented under item "5. ### Snapshot Defragmentation Workflow". This will typically render as a separate top-level list (and can restart numbering unexpectedly). Consider indenting the sub-steps to be children of item 5, or convert item 5 into its own heading (not part of the outer numbered list) to avoid ambiguous rendering.

Suggested change
1. **Acquire the bootstrap read lock** and load `SnapshotInfo` plus local YAML.
2. **Create a temporary checkpoint** in `tmp_defrag`.
* If this is the first snapshot in the bucket path chain, checkpoint the
current snapshot.
* Otherwise, checkpoint the current version of the path previous snapshot.
3. **Drop non-incremental column families** from the temporary checkpoint. They
are reloaded from the current snapshot later.
4. **Defrag tracked tables** (`keyTable`, `fileTable`, `directoryTable`).
* For the first snapshot in the path chain, perform full defrag by deleting
ranges outside the bucket prefix and compacting the tracked tables.
* For later snapshots, compute incremental delta SSTs, write changed records
or tombstones into temporary SST files, and ingest those files.
5. **Acquire `SNAPSHOT_DB_CONTENT_LOCK`** for the current snapshot. Snapshot
reads and deep-clean writes follow the same DAG-based lock hierarchy.
6. **Reload non-incremental tables** from the current snapshot into the
checkpoint.
7. **Move the checkpoint directory** to
`db.snapshots/checkpointState/om.db-<snapshot_id>-<next_version>`.
8. **Update local YAML**, incrementing `version`, adding the new
`versionSstFileInfos` entry, and clearing `needsDefrag`.
9. **Delete older checkpoint directories** for the same snapshot under
`SNAPSHOT_DB_CONTENT_LOCK` and the snapshot DB cache write lock. Older YAML
version metadata is removed later by `OmSnapshotLocalDataManagerService`
only when no local version depends on it.
10. **Release `SNAPSHOT_DB_CONTENT_LOCK`**.
   1. **Acquire the bootstrap read lock** and load `SnapshotInfo` plus local YAML.
   2. **Create a temporary checkpoint** in `tmp_defrag`.
      * If this is the first snapshot in the bucket path chain, checkpoint the
        current snapshot.
      * Otherwise, checkpoint the current version of the path previous snapshot.
   3. **Drop non-incremental column families** from the temporary checkpoint. They
      are reloaded from the current snapshot later.
   4. **Defrag tracked tables** (`keyTable`, `fileTable`, `directoryTable`).
      * For the first snapshot in the path chain, perform full defrag by deleting
        ranges outside the bucket prefix and compacting the tracked tables.
      * For later snapshots, compute incremental delta SSTs, write changed records
        or tombstones into temporary SST files, and ingest those files.
   5. **Acquire `SNAPSHOT_DB_CONTENT_LOCK`** for the current snapshot. Snapshot
      reads and deep-clean writes follow the same DAG-based lock hierarchy.
   6. **Reload non-incremental tables** from the current snapshot into the
      checkpoint.
   7. **Move the checkpoint directory** to
      `db.snapshots/checkpointState/om.db-<snapshot_id>-<next_version>`.
   8. **Update local YAML**, incrementing `version`, adding the new
      `versionSstFileInfos` entry, and clearing `needsDefrag`.
   9. **Delete older checkpoint directories** for the same snapshot under
      `SNAPSHOT_DB_CONTENT_LOCK` and the snapshot DB cache write lock. Older YAML
      version metadata is removed later by `OmSnapshotLocalDataManagerService`
      only when no local version depends on it.
   10. **Release `SNAPSHOT_DB_CONTENT_LOCK`**.

Comment thread hadoop-hdds/docs/content/feature/SnapshotDefragmentation.md Outdated
@jojochuang jojochuang added the snapshot https://issues.apache.org/jira/browse/HDDS-6517 label Apr 27, 2026
Contributor

@jojochuang jojochuang left a comment


A few future side notes, but the doc itself is good. Let's merge it.

Snapshots currently reside under `db.snapshots/checkpointState/` directory. The proposal introduces a `db.snapshots/checkpointStateDefragged/` directory for defragged snapshots. The directory format should be as follows:
Snapshots reside under the OM metadata directory in
`db.snapshots/checkpointState/`. The implementation does not use a separate
`checkpointStateDefragged` directory. Defragged versions are sibling
Contributor


BTW, `checkpointStateDefragged` / `OM_SNAPSHOT_CHECKPOINT_DEFRAGGED_DIR` is no longer used; consider removing it in the future.
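For context, the versioned checkpoint naming the implementation does use (`om.db-<snapshot_id>-<next_version>`, per the workflow quoted earlier) could be parsed along these lines. The helper is hypothetical, not Ozone code; it only assumes that snapshot IDs are UUID-like and that the version is the trailing numeric component:

```python
import re

# Hypothetical helper: parse a versioned snapshot checkpoint directory
# name of the form om.db-<snapshot_id>-<version>. Snapshot IDs contain
# hyphens themselves, so the version is anchored as the trailing digits.
_CHECKPOINT_RE = re.compile(r"^om\.db-(?P<snapshot_id>.+)-(?P<version>\d+)$")

def parse_checkpoint_dir(name):
    m = _CHECKPOINT_RE.match(name)
    if m is None:
        raise ValueError(f"not a versioned checkpoint dir: {name}")
    return m.group("snapshot_id"), int(m.group("version"))
```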

The important YAML fields are `snapshotId`, `previousSnapshotId`, `version`,
`needsDefrag`, `versionSstFileInfos`, `dbTxSequenceNumber`,
`transactionInfo`, `lastDefragTime`, `checksum`, and `isSSTFiltered`.
`lastDefragTime` is serialized, but current defrag decisions are based on
Contributor


Maybe somewhere in the code we want to mark `lastDefragTime` as deprecated.
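For orientation, a purely illustrative sketch of what a snapshot-local YAML file with the fields listed in the excerpt might look like. All values and the nesting of `versionSstFileInfos` are invented; the real serialization may differ:

```yaml
# Illustrative only: field names from the doc excerpt, values invented.
snapshotId: 123e4567-e89b-42d3-a456-426614174000
previousSnapshotId: 00000000-0000-0000-0000-000000000000
version: 2
needsDefrag: false
versionSstFileInfos:          # per-version SST listing (shape assumed)
  2:
    - keyTable-000042.sst
    - fileTable-000043.sst
dbTxSequenceNumber: 104729
transactionInfo: "1#104729"   # format assumed
lastDefragTime: 1745539200000 # serialized, but noted above as unused for defrag decisions
checksum: "3b1f..."           # placeholder
isSSTFiltered: true
```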
