Fix EC striped reads: correct slice offsets in stripe buffer and drain futures#76
Draft
lgbo-ustc wants to merge 1 commit intoClickHouse:masterfrom
Draft
Fix EC striped reads: correct slice offsets in stripe buffer and drain futures#76lgbo-ustc wants to merge 1 commit intoClickHouse:masterfrom
lgbo-ustc wants to merge 1 commit intoClickHouse:masterfrom
Conversation
…n futures divideOneStripe: add bufBaseOffset and include it when computing addSlice() positions into the destination ByteBuffer. Reason: readOneStripe() may read a sub-range of one logical stripe; valid bytes lie in curStripeBuf at [stripeBufOffset, stripeLimit) where stripeBufOffset is offsetInBlockGroup % stripeLen. Previously slices were laid out from buffer offset 0, while copyToTarget() consumed from stripeBufOffset, so applications read wrong bytes (zeros or stale data). That broke format parsers that depend on exact byte streams (e.g. Parquet footer magic). StripedInputStreamImpl: pass stripeBufOffset into divideOneStripe() for sequential reads; pass 0 in fetchBlockByteRange() where the buffer holds exactly the requested byte range. resetCurStripeBuffer: after clear(), memset the full capacity of curStripeBuf and parityBuf when allocated so EC logical holes (ALLZERO / tail padding) do not leak bytes from a prior stripe. StripingCell: fix constructor and init() parameter shadowing (ecPolicy and idxInBlkGroup were self-assigned and never stored in members). StripeReader::readStripe: remove the early break that cleared futures once fetchedChunksNum reached dataBlkNum. Parity chunk reads and decode retry paths enqueue additional futures and must all complete before leaving the loop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
problem
modification
divideOneStripe: add bufBaseOffset and include it when computing addSlice() positions into the destination ByteBuffer.
Reason: readOneStripe() may read a sub-range of one logical stripe; valid bytes lie in curStripeBuf at [stripeBufOffset, stripeLimit) where stripeBufOffset is offsetInBlockGroup % stripeLen. Previously slices were laid out from buffer offset 0, while copyToTarget() consumed from stripeBufOffset, so applications read wrong bytes (zeros or stale data). That broke format parsers that depend on exact byte streams (e.g. Parquet footer magic).
StripedInputStreamImpl: pass stripeBufOffset into divideOneStripe() for sequential reads; pass 0 in fetchBlockByteRange() where the buffer holds exactly the requested byte range.
resetCurStripeBuffer: after clear(), memset the full capacity of curStripeBuf and parityBuf when allocated so EC logical holes (ALLZERO / tail padding) do not leak bytes from a prior stripe.
StripingCell: fix constructor and init() parameter shadowing (ecPolicy and idxInBlkGroup were self-assigned and never stored in members).
StripeReader::readStripe: remove the early break that cleared futures once fetchedChunksNum reached dataBlkNum. Parity chunk reads and decode retry paths enqueue additional futures and must all complete before leaving the loop.