Separate query and storage alignment requirements [MOD-13837]#946
Conversation
Split the single alignment hint carried by PreprocessorsContainerAbstract
into query_alignment and storage_alignment so different SIMD kernels
(e.g. SQ8 storage vs FP32 query) can declare different alignment needs.
- spaces/spaces.h: add combineAlignments() helper and document the
GetDistFunc asymmetric-types contract (alignment hint refers to the
first / storage operand only).
- spaces/IP_space.cpp, spaces/L2_space.cpp: SQ8<->FP32 and SQ8<->SQ8
dispatchers now publish per-kernel alignment hints.
- spaces/computer/preprocessors.h: PreprocessorInterface preprocess()
and preprocessForStorage() carry storage_alignment / query_alignment.
CosinePreprocessor uses combineAlignments() for the shared-buffer
path. Removed the same-size preprocess() overload from the interface.
- spaces/computer/preprocessor_container.{h,cpp}: split alignment field
into query_alignment and storage_alignment; legacy single-arg
constructors retained as wrappers. Added getQueryAlignment() /
getStorageAlignment(); getAlignment() kept as a deprecated alias
returning storage_alignment.
- index_factories/components/preprocessors_factory.h: extend
PreprocessorsContainerParams and CreatePreprocessorsContainer overloads
to accept both alignments; single-alignment overloads kept for callers
that don't need the distinction.
- vec_sim_index.h: DataBlocksContainer now built with
getStorageAlignment(); added getQueryAlignment() / getStorageAlignment()
accessors.
- tests/unit/test_components.cpp, tests/unit/test_hnsw_tiered.cpp:
updated mock preprocessors and fixtures to the new interface.
- tests/unit/test_spaces.cpp: SelfDistanceCosine / SelfDistanceL2 for
SQ8<->FP32 now disable AVX2_FMA / AVX2 / SSE4 before the
no-optimization assertion, mirroring the rest of the spaces tests.
…ests QuantPreprocessor::preprocessForStorage() now allocates the quantized storage blob with allocate_aligned() when storage_alignment != 0, so SQ8 storage actually honors the per-kernel alignment hint plumbed through PreprocessorsContainerAbstract. New tests: - PreprocessorsTest.QuantizationAsymmetricAlignment: drives the quant preprocessor with query_alignment=32 and storage_alignment=16 and asserts that preprocess(), preprocessForStorage(), and preprocessQuery() each return a pointer aligned to its respective hint. - SpacesTest.SQ8_FP32_DispatcherAlignmentHints / SQ8_SQ8_DispatcherAlignmentHints: assert the exact alignment-hint values published by the x86 SQ8 dispatchers (AVX512 -> 16/32, AVX2_FMA -> 8, AVX2 -> 8, SSE4 -> 4), plus the no-optimization 0-case, for IP / L2 / Cosine.
VecSimAllocator::allocate_aligned(size, 0) already falls through to allocate(size), so the helper's alignment ternary was redundant. Inline the four call sites to a direct allocate_aligned() call.
Remove the temporary getAlignment() forwarders from PreprocessorsContainerAbstract and VecSimIndexAbstract. All callers in test_bruteforce, test_hnsw_tiered, and test_allocator now use the explicit getStorageAlignment() accessor, which is what every site already meant (DataBlock allocation overhead, brute-force storage, homogeneous-alignment fixture).
🛡️ Jit Security Scan Results✅ No security findings were detected in this PR
Security scan by Jit
|
There was a problem hiding this comment.
Pull request overview
This PR threads separate query/storage alignment hints through the preprocessor pipeline so asymmetric kernels can request different SIMD alignments for stored vectors and queries. It updates the core preprocessor/container interfaces, publishes new SQ8 dispatcher hints, and adjusts unit tests to cover the new alignment behavior.
Changes:
- Split the old single alignment hint into
query_alignmentandstorage_alignmentacross preprocessors, containers, factories, and index accessors. - Updated cosine/quantization preprocessing paths and x86 SQ8 dispatchers to use the new alignment semantics.
- Refreshed unit tests to validate asymmetric quantization alignment and dispatcher hint values.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
tests/unit/test_spaces.cpp |
Adds SQ8 self-distance coverage for more x86 feature branches and new dispatcher-alignment hint tests. |
tests/unit/test_hnsw_tiered.cpp |
Updates test preprocessor overrides to the new alignment-aware interface and accessor. |
tests/unit/test_components.cpp |
Refactors dummy preprocessors/tests for separate query vs. storage alignment and adds asymmetric quantization coverage. |
tests/unit/test_bruteforce.cpp |
Switches alignment sanity test to the new storage-specific accessor. |
tests/unit/test_allocator.cpp |
Updates memory-accounting expectations to use storage alignment instead of the removed generic accessor. |
src/VecSim/vec_sim_index.h |
Replaces the old alignment accessor with query/storage variants and uses storage alignment for vector block allocation. |
src/VecSim/spaces/spaces.h |
Documents asymmetric GetDistFunc alignment semantics and adds combineAlignments. |
src/VecSim/spaces/computer/preprocessors.h |
Changes the preprocessor interface to accept separate query/storage alignments and updates cosine/quantization implementations. |
src/VecSim/spaces/computer/preprocessor_container.h |
Stores separate query/storage alignment fields and propagates them through multi-preprocessor execution. |
src/VecSim/spaces/computer/preprocessor_container.cpp |
Makes the default aligned-copy helper use query alignment specifically. |
src/VecSim/spaces/L2_space.cpp |
Publishes alignment hints for SQ8 L2 dispatchers on x86. |
src/VecSim/spaces/IP_space.cpp |
Publishes alignment hints for SQ8 IP/Cosine dispatchers on x86. |
src/VecSim/index_factories/components/preprocessors_factory.h |
Extends preprocessor factory params/overloads to carry separate query and storage alignment hints. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract. | ||
| #ifdef OPT_AVX512_F_BW_VL_VNNI | ||
| if (features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 16 == 0) // SQ8 chunk = 16 bytes | ||
| *alignment = 16 * sizeof(uint8_t); | ||
| return Choose_SQ8_FP32_IP_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
This matches the long-standing convention used by every existing dispatcher in IP_space.cpp / L2_space.cpp (FP32, FP64, BF16, FP16, INT8, UINT8). E.g. IP_FP32_GetDistFunc writes *alignment only inside the optimized branches and leaves it untouched on the scalar fallback. The contract — honored by the single production caller CreateIndexComponents — is that the caller initializes *alignment = 0 before invoking GetDistFunc. Flipping that to "fallback unconditionally writes 0" would have to be done across every *_GetDistFunc for consistency, which is out of scope for MOD-13837.
| // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract. | ||
| #ifdef OPT_AVX512_F_BW_VL_VNNI | ||
| if (features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 16 == 0) // SQ8 chunk = 16 bytes | ||
| *alignment = 16 * sizeof(uint8_t); | ||
| return Choose_SQ8_FP32_Cosine_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
| // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks. | ||
| if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual | ||
| *alignment = 32 * sizeof(uint8_t); | ||
| return Choose_SQ8_SQ8_Cosine_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
| // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract. | ||
| #ifdef OPT_AVX512_F_BW_VL_VNNI | ||
| if (features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 16 == 0) // SQ8 chunk = 16 bytes; no point in aligning if there's a residual | ||
| *alignment = 16 * sizeof(uint8_t); | ||
| return Choose_SQ8_FP32_L2_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
| // No-optimization path must leave the hint at 0. | ||
| unsigned char alignment = 0; | ||
| (void)get(dim, &alignment, &opt); | ||
| ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0"; |
There was a problem hiding this comment.
The test asserts the documented contract: caller initializes *alignment = 0, fallback leaves it untouched. Seeding a non-zero sentinel would test a stricter contract that no *_GetDistFunc in this codebase implements — FP32, FP64, BF16, FP16, INT8, and UINT8 all leave the output untouched on the scalar fallback. Aligning the SQ8 tests to that same convention is intentional.
| static inline unsigned char combineAlignments(unsigned char a, unsigned char b) { | ||
| assert((a == 0 || (a & (a - 1)) == 0) && "alignment must be a power of two or zero"); | ||
| assert((b == 0 || (b & (b - 1)) == 0) && "alignment must be a power of two or zero"); | ||
| return a > b ? a : b; |
| // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks. | ||
| if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual | ||
| *alignment = 32 * sizeof(uint8_t); | ||
| return Choose_SQ8_SQ8_IP_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
| // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks. | ||
| if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) { | ||
| if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual | ||
| *alignment = 32 * sizeof(uint8_t); | ||
| return Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(dim); |
There was a problem hiding this comment.
Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.
| // No-optimization path must leave the hint at 0. | ||
| unsigned char alignment = 0; | ||
| (void)get(dim, &alignment, &opt); | ||
| ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0"; |
There was a problem hiding this comment.
Same response as the SQ8_FP32 test above: the assertion matches the documented contract (caller initializes *alignment = 0, fallback leaves it untouched), which is what every other *_GetDistFunc in the codebase implements. A non-zero sentinel would test a stricter contract that doesn't exist anywhere in the dispatcher layer.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #946 +/- ##
==========================================
+ Coverage 96.95% 96.97% +0.01%
==========================================
Files 130 130
Lines 7749 7793 +44
==========================================
+ Hits 7513 7557 +44
Misses 236 236 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
lerman25
left a comment
There was a problem hiding this comment.
Nice job, mainly nitpicks.
Also - does this address the comment here?
#942 (comment)
|
@lerman25 No it doesn't fix #942 (comment), it has a diferent ticket. |
- test_spaces.cpp: drop MOD-13837 prefix from test header comment - preprocessors.h: inline the rationale for unsupported dynamic resizing instead of pointing at commit history - preprocessor_container.h: rename 'Legacy ctor' to 'Homogeneous ctor' on both PreprocessorsContainerAbstract and MultiPreprocessorsContainer (the homogeneous overload is not deprecated; every in-tree caller uses it) - vec_sim_index.h: replace the vague 'per-element stride padding follow-up' note with a concrete explanation of why only the first vector per block is guaranteed aligned today
Address Cursor Bugbot review: the storage_alignment ? allocate_aligned : allocate ternary was the last surviving instance of this pattern in the file - QuantPreprocessor::preprocessQuery and CosinePreprocessor both already call allocate_aligned unconditionally. allocate_aligned(size, 0) short-circuits to allocate(size) at the top of VecSimAllocator::allocate_aligned (vecsim_malloc.cpp:47-49), so the behavior is unchanged.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 509adcb. Configure here.
PR #944 added QuantPreprocessorFP16MetricTest using the old single-alignment signatures. After merging origin/main, update the two new call sites to pass storage and query alignments explicitly: - preprocess(...): pass alignment as both storage_alignment and query_alignment (alignment is 0 in this test, so behavior is unchanged) - preprocessForStorage(...): pass storage_alignment=0
| assert(VecSimType_sizeof(vecType)); | ||
| assert(storedDataSize); | ||
| assert(inputBlobSize); | ||
| // DataBlocksContainer holds the persistent storage vectors, so it must honor the storage |

Describe the changes in the pull request
Split the single
alignmenthint carried by the preprocessor stack into two independent values,query_alignmentandstorage_alignment, so that storage and query buffers can satisfy different SIMD alignment requirements. This is the prerequisite plumbing for asymmetric kernels such as SQ8↔FP32, where the storage operand (e.g. SQ8 → 16 B) and the query operand (e.g. FP32 → 32 B) have different preferences.Key changes:
PreprocessorInterface::preprocessnow takes bothstorage_alignmentandquery_alignment. The legacy single-alignment overload at the lowest layer is removed; the same-blob path is decided by the concrete preprocessor (CosinePreprocessorshares,QuantPreprocessordoes not).PreprocessorsContainerAbstract/MultiPreprocessorsContainerstore both fields and exposegetQueryAlignment()/getStorageAlignment(). The deprecatedgetAlignment()alias has been removed; all callers now explicitly pick one.CreatePreprocessorsContainerandCreatePreprocessorsContainerParamsget an asymmetric overload that takes both alignments; the existing single-alignment overload mirrors the value into both fields, so all current production callers are unchanged behaviorally.CosinePreprocessorallocates its shared buffer withspaces::combineAlignments(storage_alignment, query_alignment)(the strictest power-of-two that satisfies both consumers).QuantPreprocessor::preprocessForStoragehonorsstorage_alignmentwhen allocating the quantized blob.GetDistFuncis documented inspaces.h: the returned hint refers to the first operand only; callers that need both must query both dispatchers and combine.IP_space.cpp,L2_space.cpp) now publish per-feature alignment hints (AVX512 → 16 / 32, AVX2_FMA → 8, AVX2 → 8, SSE4 → 4, no-opt → 0) for both the SQ8↔SQ8 and SQ8↔FP32 paths.VecSimAllocator::allocate_aligned(size, 0)is documented/exercised as equivalent toallocate(size), allowing the preprocessor allocation sites to drop thealignment ? aligned : plainternary helper and call the aligned variant unconditionally.No in-tree production caller currently produces asymmetric alignments:
CreateIndexComponentsis templated on a singleDataType, queries one dispatcher, and forwards the same hint to both fields. The asymmetric path is reachable today only by callers that build the components manually (or by the disk-backed quantized index in the follow-upvecsim_diskrepo, wherequery_alignment = fullAlignment; storage_alignment = combineAlignments(quantizedAlignment, mixedAlignment)).Test changes:
PreprocessorsTest.QuantizationAsymmetricAlignment(new) exercisesquery_alignment=32,storage_alignment=16throughpreprocess,preprocessForStorage, andpreprocessQuery, asserting independent pointer alignment for both blobs.SpacesTest.SQ8_FP32_DispatcherAlignmentHintsandSpacesTest.SQ8_SQ8_DispatcherAlignmentHints(new) directly assert the alignment hint published by the IP / L2 / Cosine dispatchers per CPU feature, including the no-optimization 0-case.alignment = 5were bumped toalignment = 8so they satisfy the newcombineAlignmentspower-of-two invariant.test_allocator,test_bruteforce, andtest_hnsw_tieredwere migrated fromgetAlignment()togetStorageAlignment()(every site was already storage-semantic).ctest -j 4sweep: 2150 / 2150 passing.Which issues this PR fixes
Main objects this PR modified
PreprocessorInterface/CosinePreprocessor/QuantPreprocessor(src/VecSim/spaces/computer/preprocessors.h)PreprocessorsContainerAbstract/MultiPreprocessorsContainer(src/VecSim/spaces/computer/preprocessor_container.{h,cpp})CreatePreprocessorsContainer/CreatePreprocessorsContainerParams(src/VecSim/index_factories/components/preprocessors_factory.h)VecSimIndexAbstractaccessors (src/VecSim/vec_sim_index.h)src/VecSim/spaces/IP_space.cppandsrc/VecSim/spaces/L2_space.cppspaces::combineAlignmentsand the asymmetric-types contract documentation (src/VecSim/spaces/spaces.h)Mark if applicable
Note
Medium Risk
Medium risk because it changes core preprocessing and alignment plumbing (allocation and SIMD dispatch hints) across multiple metrics and index storage paths, which could affect correctness/performance if any caller assumes a single alignment contract.
Overview
Separates alignment handling for query vs storage buffers by replacing the single preprocessor
alignmenthint with independentquery_alignmentandstorage_alignmentvalues throughout the preprocessor interfaces, containers, and factory helpers (with backward-compatible single-alignment overloads).Updates preprocessing implementations (notably
CosinePreprocessorandQuantPreprocessor) to allocate buffers using the appropriate alignment (orcombineAlignmentswhen sharing a buffer), and updatesVecSimIndexAbstract/storage block allocation to explicitly use storage alignment.Adds/extends SIMD dispatcher alignment hints for SQ8 kernels (IP/L2/Cosine) and strengthens tests to validate the new alignment contract, including new unit tests for asymmetric alignment and dispatcher hint values.
Reviewed by Cursor Bugbot for commit c6443af. Bugbot is set up for automated code reviews on this repo. Configure here.