Skip to content

Give workgroup barriers their memory-fence flags#587

Draft
michel2323 wants to merge 1 commit into
mainfrom
syncthreads-memory-fence
Draft

Give workgroup barriers their memory-fence flags#587
michel2323 wants to merge 1 commit into
mainfrom
syncthreads-memory-fence

Conversation

@michel2323

Copy link
Copy Markdown
Member

Summary

barrier(0) lowers to an OpControlBarrier with SequentiallyConsistent semantics but no storage-class bit, which the SPIR-V spec treats as ordering no memory. As a result, shared-local (and global) writes are not guaranteed visible to other work-items after the barrier, which can silently drop updates — e.g. a workgroup local-atomic accumulation losing counts.

This passes the appropriate fence flags so the barrier actually orders memory:

  • KA.__synchronize()barrier(LOCAL_MEM_FENCE | GLOBAL_MEM_FENCE), matching CUDA __syncthreads semantics (src/oneAPIKernels.jl).
  • the reduce_group shared-memory reduction tree → barrier(LOCAL_MEM_FENCE) (src/mapreduce.jl).

Notes

Latent correctness issue independent of the GPU stack (barrier(0) orders no memory on any conforming runtime). Verified the LOCAL_MEM_FENCE/GLOBAL_MEM_FENCE constants exist in SPIRVIntrinsics 1.0.0 (the version main pins).

🤖 Generated with Claude Code

`barrier(0)` lowers to an `OpControlBarrier` with `SequentiallyConsistent`
semantics but no storage-class bit, which the SPIR-V spec treats as
ordering no memory. So shared-local (and global) writes are not
guaranteed visible to other work-items after the barrier, which can
silently drop updates (e.g. a workgroup local-atomic accumulation losing
counts).

Pass the appropriate fence flags so the barrier actually orders memory:
`LOCAL_MEM_FENCE | GLOBAL_MEM_FENCE` for KA `@synchronize` (matching CUDA
`__syncthreads`), and `LOCAL_MEM_FENCE` for the mapreduce reduce_group
shared-memory tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.89%. Comparing base (e995a63) to head (b24db59).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
- Coverage   80.92%   80.89%   -0.04%     
==========================================
  Files          48       48              
  Lines        3234     3234              
==========================================
- Hits         2617     2616       -1     
- Misses        617      618       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant