
HDDS-15051. Incorrect DN replica reporting for unhealthy and QUASI CLOSED stuck containers in Recon. #10101

Closed
devmadhuu wants to merge 5 commits into apache:master from devmadhuu:HDDS-15051


@devmadhuu (Contributor) commented Apr 21, 2026

What changes were proposed in this pull request?

This PR fixes incorrect datanode (DN) replica details that Recon returns for unhealthy containers. It does so by switching the source of the unhealthy-container replicas[] response from Recon's truncated replica history to SCM's current replica set. The response still enriches each SCM replica with Recon's history metadata, such as first-seen and last-seen timestamps, when available, but SCM is now the source of truth for replica membership.
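The membership rule described above (SCM decides which replicas appear; Recon history only adds timestamps) can be sketched as a small merge step. This is a minimal illustration, not the PR's code: the record types `ScmReplica`, `ReplicaHistory`, `ReplicaView` and the helper `ReplicaMerger.merge` are hypothetical, and datanodes are assumed to be keyed by UUID string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; not Recon's actual classes.
record ScmReplica(String datanodeUuid, String state) {}
record ReplicaHistory(long firstSeenMillis, long lastSeenMillis) {}
record ReplicaView(String datanodeUuid, String state, Long firstSeen, Long lastSeen) {}

final class ReplicaMerger {
  private ReplicaMerger() {}

  /**
   * Builds a replicas[] view with exactly one entry per replica SCM currently
   * reports. Recon history only contributes timestamps; it never adds entries
   * for datanodes SCM does not list, and missing history leaves them null.
   */
  static List<ReplicaView> merge(List<ScmReplica> scmReplicas,
                                 Map<String, ReplicaHistory> reconHistory) {
    List<ReplicaView> out = new ArrayList<>(scmReplicas.size());
    for (ScmReplica r : scmReplicas) {
      ReplicaHistory h = reconHistory.get(r.datanodeUuid());
      Long first = (h == null) ? null : Long.valueOf(h.firstSeenMillis());
      Long last = (h == null) ? null : Long.valueOf(h.lastSeenMillis());
      out.add(new ReplicaView(r.datanodeUuid(), r.state(), first, last));
    }
    return out;
  }
}
```

Iterating over the SCM list rather than the history map is what makes SCM authoritative: a stale history entry for a datanode that no longer holds the replica is simply never visited.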

The change adds a new StorageContainerServiceProvider.getContainerReplicas(...) API, updates ContainerEndpoint to use it for unhealthy containers, and rewrites the affected Recon endpoint tests to validate SCM-backed behavior.
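A rough sketch of how an endpoint might use an SCM-backed lookup for unhealthy containers follows. The real `StorageContainerServiceProvider.getContainerReplicas(...)` parameters and return type are not shown in this description, so `ScmReplicaLookup`, `currentReplicaDatanodes`, `UnhealthyContainerEntry` and `UnhealthyContainerView` are all hypothetical stand-ins, and the health-state strings are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for the SCM-facing lookup; it only models
 * "ask SCM for the current replicas of a container".
 */
interface ScmReplicaLookup {
  /** Datanode identifiers SCM currently tracks a replica on; empty if none. */
  List<String> currentReplicaDatanodes(long containerId);
}

record UnhealthyContainerEntry(long containerId, String healthState, List<String> replicas) {}

final class UnhealthyContainerView {
  private final ScmReplicaLookup scm;

  UnhealthyContainerView(ScmReplicaLookup scm) {
    this.scm = scm;
  }

  /** One entry per unhealthy container; replica membership comes only from SCM. */
  List<UnhealthyContainerEntry> build(Map<Long, String> unhealthyStates) {
    List<UnhealthyContainerEntry> out = new ArrayList<>();
    for (Map.Entry<Long, String> e : unhealthyStates.entrySet()) {
      // When SCM reports no current replicas (e.g. a missing container),
      // replicas[] is empty instead of being filled from Recon history.
      List<String> replicas = List.copyOf(scm.currentReplicaDatanodes(e.getKey()));
      out.add(new UnhealthyContainerEntry(e.getKey(), e.getValue(), replicas));
    }
    return out;
  }
}
```

Hiding SCM behind a narrow interface also makes the endpoint easy to exercise with a fake in tests, which is the style of check listed under testing.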

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15051

How was this patch tested?

  • Updated TestContainerEndpoint to validate the new unhealthy-container behavior against SCM-backed replica responses instead of Recon-local replica history.
  • Added/adjusted assertions to cover:
    • over-replicated containers returning all current SCM replicas
    • under-replicated and mis-replicated containers returning SCM replica state/details
    • replica-mismatch containers preserving checksum validation with SCM-backed replicas
    • missing containers in /containers/unhealthy returning empty replicas[] when SCM has no current replicas
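For the replica-mismatch case, a minimal sketch of a checksum check over SCM-reported replicas is shown below. It assumes that "mismatch" means the replicas report more than one distinct data checksum; `ReplicaChecksum` and `ChecksumCheck.hasMismatch` are hypothetical names, not Recon's API.

```java
import java.util.List;

// Hypothetical per-replica checksum record, as reported by SCM.
record ReplicaChecksum(String datanodeUuid, long dataChecksum) {}

final class ChecksumCheck {
  private ChecksumCheck() {}

  /** True when the SCM-reported replicas disagree on their data checksum. */
  static boolean hasMismatch(List<ReplicaChecksum> scmReplicas) {
    return scmReplicas.stream()
        .map(ReplicaChecksum::dataChecksum)
        .distinct()
        .count() > 1;
  }
}
```

Because the input is SCM's current replica list, a replica that has since been removed can no longer trigger or hide a mismatch.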
bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: d09949f4-d6f1-43c6-8fed-c0f028c3f689
Write PipelineId: 8b888300-6957-4616-ac9b-22d00da202e6
Write Pipeline State: CLOSED
Container State: QUASI_CLOSED
SequenceId: 236
Datanodes: [bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default]
Replicas: [State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: 734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default]
bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: fb776384-9508-4a2c-90c4-a2b16b85ec5e
Write PipelineId: 8b888300-6957-4616-ac9b-22d00da202e6
Write Pipeline State: CLOSED
Container State: QUASI_CLOSED
SequenceId: 236
Datanodes: [bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default,
915cc957-eb1e-498d-982e-298cba67a332/ozone-datanode-1.ozone_default,
d0230185-6167-4e6a-97a6-092603a335d9/ozone-datanode-2.ozone_default]
Replicas: [State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: 734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: 915cc957-eb1e-498d-982e-298cba67a332/ozone-datanode-1.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: d0230185-6167-4e6a-97a6-092603a335d9/ozone-datanode-2.ozone_default]
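To cross-check the replicas[] that Recon returns against what SCM reports through the CLI, the `Replicas: [...]` entries in output like the above can be extracted with a regular expression. This is a brittle, illustrative sketch: it assumes the exact `State: …; ReplicaIndex: …; SequenceId: …; Origin: …; Location: …` layout shown here, and `CliReplica` and `ContainerInfoParser` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one replica entry from `ozone admin container info`.
record CliReplica(String state, int replicaIndex, long sequenceId, String origin, String location) {}

final class ContainerInfoParser {
  // Matches one replica entry in the layout printed above; Location ends at
  // the next comma, closing bracket, or whitespace.
  private static final Pattern REPLICA = Pattern.compile(
      "State: ([A-Z_]+); ReplicaIndex: (\\d+); SequenceId: (\\d+); "
          + "Origin: ([0-9a-f-]+); Location: ([^,\\]\\s]+)");

  private ContainerInfoParser() {}

  static List<CliReplica> parseReplicas(String output) {
    List<CliReplica> replicas = new ArrayList<>();
    Matcher m = REPLICA.matcher(output);
    while (m.find()) {
      replicas.add(new CliReplica(m.group(1), Integer.parseInt(m.group(2)),
          Long.parseLong(m.group(3)), m.group(4), m.group(5)));
    }
    return replicas;
  }
}
```

Applied to the second output above, it would yield four entries whose Origin values cover two distinct datanodes.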

Devesh Kumar Singh added 2 commits April 18, 2026 17:48
@devmadhuu devmadhuu requested a review from sumitagrawl April 28, 2026 14:15
@devmadhuu (Contributor Author)

Closing this because the ICR (incremental container report) logic gaps that were creating incorrect DN replicas have been handled in #10074.

@devmadhuu devmadhuu closed this May 4, 2026
