
HDDS-15076. Fix Incorrect pending deletion size for EC blocks in DN#10105

Open
priyeshkaratha wants to merge 3 commits intoapache:masterfrom
priyeshkaratha:HDDS-15076

Conversation

@priyeshkaratha
Contributor

@priyeshkaratha priyeshkaratha commented Apr 22, 2026

What changes were proposed in this pull request?

Problem
For EC blocks, each DataNode holds only one shard, but each DN was crediting itself with the full logical block size as its pending-deletion contribution. Since each DN should be credited only about 1/k of the logical size (k data shards), this inflated cluster-wide pending-deletion metrics by roughly a factor of k (e.g. 3× for EC(3,2), 6× for EC(6,3)).

Why this approximation is acceptable
Dividing the total replicated size evenly across all nodes is not byte-perfect: because of the last partial stripe, some data shards are smaller than others.
However, the cluster-wide sum across all DNs still equals totalReplicatedSize, so aggregate metrics are correct.
The per-DN error is at most ±1 ecChunkSize per block, which is negligible at container scale.
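The arithmetic above can be sketched in a few lines of Java (a minimal illustration; the class and method names here are invented for the example and are not the actual QuotaUtil API from this PR):

```java
// Illustrative sketch of the per-DN pending-deletion share for EC blocks.
// Names are hypothetical; the real helper in this PR is QuotaUtil.getSizePerReplica.
class SizePerReplicaSketch {

  /** Total physical bytes across the cluster for one logical EC block. */
  static long replicatedSize(long dataSize, int dataShards, int parityShards) {
    // Each data shard holds dataSize / dataShards bytes (ignoring the
    // partial last stripe); parity shards are the same size.
    return dataSize / dataShards * (dataShards + parityShards);
  }

  /** Approximate bytes a single DN holds for this block. */
  static long sizePerReplica(long dataSize, int dataShards, int parityShards) {
    // Spread the replicated size evenly over all data + parity nodes.
    // The per-DN error is bounded by one EC chunk, and the sum over
    // all DNs still equals the replicated size.
    return replicatedSize(dataSize, dataShards, parityShards)
        / (dataShards + parityShards);
  }

  public static void main(String[] args) {
    long logical = 3L * 1024 * 1024;                   // 3 MiB block, EC(3,2)
    System.out.println(replicatedSize(logical, 3, 2)); // physical bytes cluster-wide
    System.out.println(sizePerReplica(logical, 3, 2)); // approximate per-DN share
  }
}
```

With EC(3,2), a 3 MiB block occupies 5 MiB physically across five DNs, so each DN is credited 1 MiB instead of the full 3 MiB it was charged before the fix.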

What is the link to the Apache JIRA

HDDS-15076

How was this patch tested?

Tested using existing test cases and an additional unit test case.

Contributor

@sreejasahithi sreejasahithi left a comment


Thanks @priyeshkaratha for the patch

@ChenSammi
Contributor

@priyeshkaratha , it should be fixed in datanode. The current changes in SCM will make the SCM report incorrect.

@priyeshkaratha priyeshkaratha force-pushed the HDDS-15076 branch 3 times, most recently from faec2d0 to 88f769a Compare April 22, 2026 23:05
@priyeshkaratha
Contributor Author

> @priyeshkaratha , it should be fixed in datanode. The current changes in SCM will make the SCM report incorrect.

I have updated the PR to perform the whole calculation inside the DN. To compute the closest approximate value, the DN needs the number of required nodes, which has to be passed from the SCM side. I also added an integration test to validate this.

@priyeshkaratha priyeshkaratha marked this pull request as ready for review April 23, 2026 03:57
@ChenSammi ChenSammi requested review from Copilot and removed request for sreejasahithi April 28, 2026 06:57
Contributor

Copilot AI left a comment


Pull request overview

This PR fixes inflated “pending deletion size” accounting for erasure-coded (EC) blocks by introducing a per-replica/per-shard size signal into the delete-blocks flow so DataNodes attribute only their shard’s share rather than the full logical block size.

Changes:

  • Extend deleted-block metadata/protos to carry per-replica size (and aggregate it into SCM delete transactions).
  • Update OM to populate per-replica size for blocks pending deletion via QuotaUtil.getSizePerReplica(...).
  • Add/adjust unit + integration tests for quota sizing and Recon storage distribution / pending deletion endpoints (RATIS + EC).

Reviewed changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

File Description
hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/service/TestKeyDeletingService.java Update DeletedBlock constructor usage to include per-replica size.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OmMetadataManagerImpl.java Populate DeletedBlock with per-replica size when building delete block groups.
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/KeyManagerImpl.java Populate DeletedBlock with per-replica size for pending key deletions.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/shell/TestDeletedBlocksTxnShell.java Update test data generation for new DeletedBlock signature.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/upgrade/TestScmDataDistributionFinalization.java Update test deleted-block generation for per-replica size.
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/hdds/scm/TestStorageContainerManager.java Update SCM integration tests for new DeletedBlock signature.
hadoop-ozone/integration-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestStorageDistributionEndpointRatis.java New Recon integration test for RATIS replication behavior and DN failure reporting.
hadoop-ozone/integration-test-recon/src/test/java/org/apache/hadoop/ozone/recon/TestStorageDistributionEndpointEC.java New Recon integration test for EC replication behavior.
hadoop-ozone/integration-test-recon/src/test/java/org/apache/hadoop/ozone/recon/AbstractTestStorageDistributionEndpoint.java Refactor shared Recon integration test infrastructure into an abstract base class.
hadoop-ozone/common/src/test/java/org/apache/hadoop/ozone/om/helpers/TestQuotaUtil.java Add unit tests for new per-replica sizing helper across RATIS + EC.
hadoop-ozone/common/src/main/java/org/apache/hadoop/ozone/om/helpers/QuotaUtil.java Add getSizePerReplica(...) helper for per-DN physical sizing.
hadoop-hdds/server-scm/src/test/java/org/apache/hadoop/hdds/scm/block/TestDeletedBlockLog.java Update SCM deleted-block-log tests for per-replica totals in transactions.
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java Populate totalSizePerReplica in SCM delete transactions when available.
hadoop-hdds/interface-server/src/main/proto/ScmServerProtocol.proto Extend KeyBlocks proto with sizePerReplica.
hadoop-hdds/interface-server/src/main/proto/ScmServerDatanodeHeartbeatProtocol.proto Extend DeletedBlocksTransaction proto with totalSizePerReplica.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/common/DeletedBlock.java Extend DeletedBlock model to carry per-replica size.
hadoop-hdds/framework/src/main/java/org/apache/hadoop/ozone/common/BlockGroup.java Serialize/deserialize per-replica size in KeyBlocks proto.
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java Use per-replica transaction totals when updating DN pending deletion bytes.
Comments suppressed due to low confidence (1)

hadoop-ozone/integration-test-recon/src/test/java/org/apache/hadoop/ozone/recon/AbstractTestStorageDistributionEndpoint.java:200

  • The inline comment on setDataSize(100) is now incorrect (it still says the replicated size is 10 * 3 = 30). Please update or remove it so the comment matches the new 100-byte test payload and the updated assertions (e.g., 100 * 3 = 300 for RATIS THREE).


Comment on lines 651 to 655
  if (VersionedDatanodeFeatures.isFinalized(
      HDDSLayoutFeature.STORAGE_SPACE_DISTRIBUTION) && delTX.hasTotalBlockSize()) {
    long pendingBytes = containerData.getBlockPendingDeletionBytes();
-   pendingBytes += delTX.getTotalBlockSize();
+   pendingBytes += delTX.getTotalSizePerReplica();
    metadataTable
Contributor Author


Updated

Comment on lines 660 to 662
  containerData.incrPendingDeletionBlocks(newDeletionBlocks,
-     delTX.hasTotalBlockSize() ? delTX.getTotalBlockSize() : 0);
+     delTX.hasTotalBlockSize() ? delTX.getTotalSizePerReplica() : 0);
  containerData.updateDeleteTransactionId(delTX.getTxID());
Contributor Author


Updated

* @return bytes stored on a single DN for this block
*/
public static long getSizePerReplica(long dataSize, ReplicationConfig repConfig) {
int requiredNodes = repConfig.getRequiredNodes();
Contributor


Let's check REPLICATION_TYPE first. For Ratis, there is no need to go through the calculation.
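That suggestion could look roughly like the following (a hedged sketch; the enum, class, and parameter names are simplified stand-ins for illustration, not the exact Ozone ReplicationConfig types):

```java
// Sketch of short-circuiting the per-replica size for Ratis, as suggested
// above. ReplicationType and the method signature are hypothetical stand-ins.
enum ReplicationType { RATIS, EC }

class PerReplicaSize {
  static long getSizePerReplica(long dataSize, ReplicationType type,
      int dataShards, int parityShards) {
    if (type == ReplicationType.RATIS) {
      // Every Ratis replica stores the full block, so the per-DN share
      // is just the logical size; no shard arithmetic is needed.
      return dataSize;
    }
    // EC: divide the replicated size evenly across all data + parity nodes.
    long replicated = dataSize / dataShards * (dataShards + parityShards);
    return replicated / (dataShards + parityShards);
  }
}
```

Checking the replication type first keeps the common Ratis path free of the EC division and avoids depending on shard counts that are meaningless for Ratis configs.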

