Fix GPU predictor kernel stride for multi-sample TIFFs #1222
Merged
brendancol merged 1 commit into master on Apr 19, 2026
Conversation
Both call sites of `_predictor_decode_kernel` passed `width=tile_width*samples` and `bytes_per_sample=itemsize*samples`. The kernel computes `row_bytes = width * bytes_per_sample`, so `row_bytes` ended up `tile_width * samples**2 * itemsize` instead of `tile_width * samples * itemsize`. The inner loop walked past the end of each tile row and, on the last tile, past the end of the `d_decomp` buffer (an out-of-bounds GPU write).

Fix: pass `width=tile_width` at both call sites; `bytes_per_sample` stays `itemsize*samples`. This matches the CPU call convention in `_reader._apply_predictor`.

Regression test builds tiled multi-sample TIFFs (RGB/RGBA, uint8/uint16, even and uneven tile grids) with predictor=2, decodes via the GPU path, and asserts byte-for-byte equality with the CPU decode.
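The double-counting can be sketched directly. This is illustrative arithmetic only; the variable names are stand-ins mirroring the call convention described above, not the project's actual kernel code:

```python
# Illustrative stride arithmetic for the bug; names are stand-ins,
# not the actual xrspatial kernel variables.
tile_width, samples, itemsize = 256, 3, 1  # one RGB uint8 tile

# Correct row size: width in pixels times bytes per pixel.
fixed_row_bytes = tile_width * (itemsize * samples)

# Buggy call convention: width already multiplied by samples,
# so samples is counted twice in row_bytes.
buggy_row_bytes = (tile_width * samples) * (itemsize * samples)

assert fixed_row_bytes == tile_width * samples * itemsize      # 768
assert buggy_row_bytes == tile_width * samples**2 * itemsize   # 2304

# With the buggy stride, the walk over the last row runs well past
# the end of the actual decompressed buffer.
n_rows = 4
buf_len = n_rows * fixed_row_bytes
assert (n_rows - 1) * buggy_row_bytes + buggy_row_bytes > buf_len
```

With `samples=1` (grayscale) the two conventions coincide, which is why single-band TIFFs never exposed the bug.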
Fixes #1220.
Both call sites of `_predictor_decode_kernel` passed `width = tile_width * samples` and `bytes_per_sample = itemsize * samples`. The kernel body does `row_bytes = width * bytes_per_sample`, so `row_bytes` came out to `tile_width * samples**2 * itemsize` instead of `tile_width * samples * itemsize`.

For tiled multi-sample TIFFs with predictor=2, that meant the cumulative-sum loop walked `samples` times further per row than the row actually contained. On the last tile in the buffer it wrote past the end of `d_decomp` itself, an out-of-bounds GPU write with no error surface.

To reproduce: any tiled TIFF with `SamplesPerPixel > 1` (RGB, RGBA) and predictor=2, read via `open_geotiff(path, gpu=True)` or `read_geotiff_gpu`, produces pixels that don't match the CPU decode.

The fix is to pass `width = tile_width` at both call sites. `bytes_per_sample = itemsize * samples` stays, and there is no kernel change. This matches what the CPU path already does at `_reader._apply_predictor(..., bytes_per_sample * samples)`.

The new test file `xrspatial/geotiff/tests/test_predictor_multisample.py` builds tiled multi-sample TIFFs (RGB/RGBA, uint8/uint16, even and uneven tile grids) with predictor=2, decodes on the GPU, and checks byte-for-byte equality with the CPU decode. I confirmed the tests fail on the unpatched code and pass after the fix by stashing the change and re-running.

Test plan:

- `pytest xrspatial/geotiff/tests/test_predictor_multisample.py` on a box with CuPy + CUDA.
- `pytest xrspatial/geotiff/tests/` is clean apart from the pre-existing matplotlib palette recursion failures.

One thing I noticed but didn't fix here: the predictor=3 path has its own argument pattern (`bytes_per_sample=itemsize`, no `* samples`) that also looks wrong for multi-sample data. Out of scope for this PR.
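For context on what the cumulative-sum loop is undoing: TIFF predictor=2 (horizontal differencing) stores each pixel as a per-sample delta from the previous pixel, so decoding is a running sum per sample channel along the row. A minimal NumPy reference for one row (an illustrative sketch, not the project's kernel; `predictor2_decode_row` is a hypothetical name):

```python
import numpy as np

def predictor2_decode_row(row: np.ndarray, samples: int) -> np.ndarray:
    # Undo horizontal differencing: cumulative sum per sample channel,
    # accumulating in the row's own dtype so overflow wraps the same
    # way the deltas were produced.
    px = row.reshape(-1, samples)
    return np.cumsum(px, axis=0, dtype=px.dtype).reshape(-1)

# Round-trip check on 8 RGB uint8 pixels.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(8, 3), dtype=np.uint8)
encoded = pixels.copy()
encoded[1:] -= pixels[:-1]  # predictor=2 encode (wraps modulo 256)
decoded = predictor2_decode_row(encoded.reshape(-1), samples=3)
assert np.array_equal(decoded.reshape(8, 3), pixels)
```

Because decoding only needs to walk exactly one row's worth of bytes, the kernel's `width` must be the pixel width; multiplying in `samples` at the call site is what made the walk overshoot.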