Separate query and storage alignment requirements [MOD-13837] by dor-forer · Pull Request #946 · RedisAI/VectorSimilarity

dor-forer · 2026-05-03T13:28:15Z

Describe the changes in the pull request

Split the single alignment hint carried by the preprocessor stack into two independent values, query_alignment and storage_alignment, so that storage and query buffers can satisfy different SIMD alignment requirements. This is the prerequisite plumbing for asymmetric kernels such as SQ8↔FP32, where the storage operand (e.g. SQ8 → 16 B) and the query operand (e.g. FP32 → 32 B) have different preferences.

Key changes:

PreprocessorInterface::preprocess now takes both storage_alignment and query_alignment. The legacy single-alignment overload at the lowest layer is removed; the same-blob path is decided by the concrete preprocessor (CosinePreprocessor shares, QuantPreprocessor does not).
PreprocessorsContainerAbstract / MultiPreprocessorsContainer store both fields and expose getQueryAlignment() / getStorageAlignment(). The deprecated getAlignment() alias has been removed; all callers now explicitly pick one.
CreatePreprocessorsContainer and CreatePreprocessorsContainerParams get an asymmetric overload that takes both alignments; the existing single-alignment overload mirrors the value into both fields, so all current production callers are unchanged behaviorally.
CosinePreprocessor allocates its shared buffer with spaces::combineAlignments(storage_alignment, query_alignment) (the strictest power-of-two that satisfies both consumers).
QuantPreprocessor::preprocessForStorage honors storage_alignment when allocating the quantized blob.
The asymmetric-types contract for GetDistFunc is documented in spaces.h: the returned hint refers to the first operand only; callers that need both must query both dispatchers and combine.
The x86 SQ8 dispatchers (IP_space.cpp, L2_space.cpp) now publish per-feature alignment hints (AVX512 → 16 / 32, AVX2_FMA → 8, AVX2 → 8, SSE4 → 4, no-opt → 0) for both the SQ8↔SQ8 and SQ8↔FP32 paths.
VecSimAllocator::allocate_aligned(size, 0) is documented/exercised as equivalent to allocate(size), allowing the preprocessor allocation sites to drop the alignment ? aligned : plain ternary helper and call the aligned variant unconditionally.

No in-tree production caller currently produces asymmetric alignments: CreateIndexComponents is templated on a single DataType, queries one dispatcher, and forwards the same hint to both fields. The asymmetric path is reachable today only by callers that build the components manually (or by the disk-backed quantized index in the follow-up vecsim_disk repo, where query_alignment = fullAlignment; storage_alignment = combineAlignments(quantizedAlignment, mixedAlignment)).

Test changes:

PreprocessorsTest.QuantizationAsymmetricAlignment (new) exercises query_alignment=32, storage_alignment=16 through preprocess, preprocessForStorage, and preprocessQuery, asserting independent pointer alignment for both blobs.
SpacesTest.SQ8_FP32_DispatcherAlignmentHints and SpacesTest.SQ8_SQ8_DispatcherAlignmentHints (new) directly assert the alignment hint published by the IP / L2 / Cosine dispatchers per CPU feature, including the no-optimization 0-case.
Existing tests that previously used alignment = 5 were bumped to alignment = 8 so they satisfy the new combineAlignments power-of-two invariant. test_allocator, test_bruteforce, and test_hnsw_tiered were migrated from getAlignment() to getStorageAlignment() (every site was already storage-semantic).
Full ctest -j 4 sweep: 2150 / 2150 passing.

Which issues this PR fixes

MOD-13837

Main objects this PR modified

PreprocessorInterface / CosinePreprocessor / QuantPreprocessor (src/VecSim/spaces/computer/preprocessors.h)
PreprocessorsContainerAbstract / MultiPreprocessorsContainer (src/VecSim/spaces/computer/preprocessor_container.{h,cpp})
CreatePreprocessorsContainer / CreatePreprocessorsContainerParams (src/VecSim/index_factories/components/preprocessors_factory.h)
VecSimIndexAbstract accessors (src/VecSim/vec_sim_index.h)
SQ8 dispatchers in src/VecSim/spaces/IP_space.cpp and src/VecSim/spaces/L2_space.cpp
spaces::combineAlignments and the asymmetric-types contract documentation (src/VecSim/spaces/spaces.h)

Mark if applicable

This PR introduces API changes
This PR introduces serialization changes

Note

Medium Risk
Medium risk because it changes core preprocessing and alignment plumbing (allocation and SIMD dispatch hints) across multiple metrics and index storage paths, which could affect correctness/performance if any caller assumes a single alignment contract.

Overview
Separates alignment handling for query vs storage buffers by replacing the single preprocessor alignment hint with independent query_alignment and storage_alignment values throughout the preprocessor interfaces, containers, and factory helpers (with backward-compatible single-alignment overloads).

Updates preprocessing implementations (notably CosinePreprocessor and QuantPreprocessor) to allocate buffers using the appropriate alignment (or combineAlignments when sharing a buffer), and updates VecSimIndexAbstract/storage block allocation to explicitly use storage alignment.

Adds/extends SIMD dispatcher alignment hints for SQ8 kernels (IP/L2/Cosine) and strengthens tests to validate the new alignment contract, including new unit tests for asymmetric alignment and dispatcher hint values.

^{Reviewed by Cursor Bugbot for commit c6443af. Bugbot is set up for automated code reviews on this repo. Configure here.}

Split the single alignment hint carried by PreprocessorsContainerAbstract into query_alignment and storage_alignment so different SIMD kernels (e.g. SQ8 storage vs FP32 query) can declare different alignment needs. - spaces/spaces.h: add combineAlignments() helper and document the GetDistFunc asymmetric-types contract (alignment hint refers to the first / storage operand only). - spaces/IP_space.cpp, spaces/L2_space.cpp: SQ8<->FP32 and SQ8<->SQ8 dispatchers now publish per-kernel alignment hints. - spaces/computer/preprocessors.h: PreprocessorInterface preprocess() and preprocessForStorage() carry storage_alignment / query_alignment. CosinePreprocessor uses combineAlignments() for the shared-buffer path. Removed the same-size preprocess() overload from the interface. - spaces/computer/preprocessor_container.{h,cpp}: split alignment field into query_alignment and storage_alignment; legacy single-arg constructors retained as wrappers. Added getQueryAlignment() / getStorageAlignment(); getAlignment() kept as a deprecated alias returning storage_alignment. - index_factories/components/preprocessors_factory.h: extend PreprocessorsContainerParams and CreatePreprocessorsContainer overloads to accept both alignments; single-alignment overloads kept for callers that don't need the distinction. - vec_sim_index.h: DataBlocksContainer now built with getStorageAlignment(); added getQueryAlignment() / getStorageAlignment() accessors. - tests/unit/test_components.cpp, tests/unit/test_hnsw_tiered.cpp: updated mock preprocessors and fixtures to the new interface. - tests/unit/test_spaces.cpp: SelfDistanceCosine / SelfDistanceL2 for SQ8<->FP32 now disable AVX2_FMA / AVX2 / SSE4 before the no-optimization assertion, mirroring the rest of the spaces tests.

…ests QuantPreprocessor::preprocessForStorage() now allocates the quantized storage blob with allocate_aligned() when storage_alignment != 0, so SQ8 storage actually honors the per-kernel alignment hint plumbed through PreprocessorsContainerAbstract. New tests: - PreprocessorsTest.QuantizationAsymmetricAlignment: drives the quant preprocessor with query_alignment=32 and storage_alignment=16 and asserts that preprocess(), preprocessForStorage(), and preprocessQuery() each return a pointer aligned to its respective hint. - SpacesTest.SQ8_FP32_DispatcherAlignmentHints / SQ8_SQ8_DispatcherAlignmentHints: assert the exact alignment-hint values published by the x86 SQ8 dispatchers (AVX512 -> 16/32, AVX2_FMA -> 8, AVX2 -> 8, SSE4 -> 4), plus the no-optimization 0-case, for IP / L2 / Cosine.

VecSimAllocator::allocate_aligned(size, 0) already falls through to allocate(size), so the helper's alignment ternary was redundant. Inline the four call sites to a direct allocate_aligned() call.

Remove the temporary getAlignment() forwarders from PreprocessorsContainerAbstract and VecSimIndexAbstract. All callers in test_bruteforce, test_hnsw_tiered, and test_allocator now use the explicit getStorageAlignment() accessor, which is what every site already meant (DataBlock allocation overhead, brute-force storage, homogeneous-alignment fixture).

jit-ci · 2026-05-03T13:29:51Z

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR

^{Security scan by Jit}

Copilot

Pull request overview

This PR threads separate query/storage alignment hints through the preprocessor pipeline so asymmetric kernels can request different SIMD alignments for stored vectors and queries. It updates the core preprocessor/container interfaces, publishes new SQ8 dispatcher hints, and adjusts unit tests to cover the new alignment behavior.

Changes:

Split the old single alignment hint into query_alignment and storage_alignment across preprocessors, containers, factories, and index accessors.
Updated cosine/quantization preprocessing paths and x86 SQ8 dispatchers to use the new alignment semantics.
Refreshed unit tests to validate asymmetric quantization alignment and dispatcher hint values.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
`tests/unit/test_spaces.cpp`	Adds SQ8 self-distance coverage for more x86 feature branches and new dispatcher-alignment hint tests.
`tests/unit/test_hnsw_tiered.cpp`	Updates test preprocessor overrides to the new alignment-aware interface and accessor.
`tests/unit/test_components.cpp`	Refactors dummy preprocessors/tests for separate query vs. storage alignment and adds asymmetric quantization coverage.
`tests/unit/test_bruteforce.cpp`	Switches alignment sanity test to the new storage-specific accessor.
`tests/unit/test_allocator.cpp`	Updates memory-accounting expectations to use storage alignment instead of the removed generic accessor.
`src/VecSim/vec_sim_index.h`	Replaces the old alignment accessor with query/storage variants and uses storage alignment for vector block allocation.
`src/VecSim/spaces/spaces.h`	Documents asymmetric `GetDistFunc` alignment semantics and adds `combineAlignments`.
`src/VecSim/spaces/computer/preprocessors.h`	Changes the preprocessor interface to accept separate query/storage alignments and updates cosine/quantization implementations.
`src/VecSim/spaces/computer/preprocessor_container.h`	Stores separate query/storage alignment fields and propagates them through multi-preprocessor execution.
`src/VecSim/spaces/computer/preprocessor_container.cpp`	Makes the default aligned-copy helper use query alignment specifically.
`src/VecSim/spaces/L2_space.cpp`	Publishes alignment hints for SQ8 L2 dispatchers on x86.
`src/VecSim/spaces/IP_space.cpp`	Publishes alignment hints for SQ8 IP/Cosine dispatchers on x86.
`src/VecSim/index_factories/components/preprocessors_factory.h`	Extends preprocessor factory params/overloads to carry separate query and storage alignment hints.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

dor-forer · 2026-05-03T13:42:41Z

+    // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
 #ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 16 == 0) // SQ8 chunk = 16 bytes
+            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_IP_implementation_AVX512F_BW_VL_VNNI(dim);


This matches the long-standing convention used by every existing dispatcher in IP_space.cpp / L2_space.cpp (FP32, FP64, BF16, FP16, INT8, UINT8). E.g. IP_FP32_GetDistFunc writes *alignment only inside the optimized branches and leaves it untouched on the scalar fallback. The contract — honored by the single production caller CreateIndexComponents — is that the caller initializes *alignment = 0 before invoking GetDistFunc. Flipping that to "fallback unconditionally writes 0" would have to be done across every *_GetDistFunc for consistency, which is out of scope for MOD-13837.

dor-forer · 2026-05-03T13:42:43Z

+    // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
 #ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 16 == 0) // SQ8 chunk = 16 bytes
+            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_Cosine_implementation_AVX512F_BW_VL_VNNI(dim);


Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.

dor-forer · 2026-05-03T13:42:46Z

+    // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
    if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
+            *alignment = 32 * sizeof(uint8_t);
        return Choose_SQ8_SQ8_Cosine_implementation_AVX512F_BW_VL_VNNI(dim);


Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.

dor-forer · 2026-05-03T13:42:47Z

+    // Alignment hints below refer to the SQ8 (first) operand per the GetDistFunc contract.
 #ifdef OPT_AVX512_F_BW_VL_VNNI
    if (features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 16 == 0) // SQ8 chunk = 16 bytes; no point in aligning if there's a residual
+            *alignment = 16 * sizeof(uint8_t);
        return Choose_SQ8_FP32_L2_implementation_AVX512F_BW_VL_VNNI(dim);


Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.

dor-forer · 2026-05-03T13:42:54Z

+        // No-optimization path must leave the hint at 0.
+        unsigned char alignment = 0;
+        (void)get(dim, &alignment, &opt);
+        ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0";


The test asserts the documented contract: caller initializes *alignment = 0, fallback leaves it untouched. Seeding a non-zero sentinel would test a stricter contract that no *_GetDistFunc in this codebase implements — FP32, FP64, BF16, FP16, INT8, and UINT8 all leave the output untouched on the scalar fallback. Aligning the SQ8 tests to that same convention is intentional.

+static inline unsigned char combineAlignments(unsigned char a, unsigned char b) {
+    assert((a == 0 || (a & (a - 1)) == 0) && "alignment must be a power of two or zero");
+    assert((b == 0 || (b & (b - 1)) == 0) && "alignment must be a power of two or zero");
+    return a > b ? a : b;


dor-forer · 2026-05-03T13:42:49Z

+    // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
    if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
+            *alignment = 32 * sizeof(uint8_t);
        return Choose_SQ8_SQ8_IP_implementation_AVX512F_BW_VL_VNNI(dim);


Same convention as the rest of the dispatcher layer (see e.g. IP_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.

dor-forer · 2026-05-03T13:42:52Z

+    // AVX512 VNNI SQ8_SQ8 uses 64-element chunks; residual handling is in 32-byte sub-chunks.
    if (dim >= 64 && features.avx512f && features.avx512bw && features.avx512vnni) {
+        if (dim % 32 == 0) // align to 256 bits when there is no offsetting residual
+            *alignment = 32 * sizeof(uint8_t);
        return Choose_SQ8_SQ8_L2_implementation_AVX512F_BW_VL_VNNI(dim);


Same convention as the rest of the dispatcher layer (see e.g. L2_FP32_GetDistFunc in this file): *alignment is written only on optimized branches; the scalar fallback leaves it untouched. The caller (CreateIndexComponents) zero-initializes the output before calling. Diverging here would be inconsistent with FP32 / FP64 / BF16 / FP16 / INT8 / UINT8; flipping the contract globally is a separate refactor.

dor-forer · 2026-05-03T13:42:56Z

+        // No-optimization path must leave the hint at 0.
+        unsigned char alignment = 0;
+        (void)get(dim, &alignment, &opt);
+        ASSERT_EQ(alignment, 0u) << kind << ": no-optimization hint should be 0";


Same response as the SQ8_FP32 test above: the assertion matches the documented contract (caller initializes *alignment = 0, fallback leaves it untouched), which is what every other *_GetDistFunc in the codebase implements. A non-zero sentinel would test a stricter contract that doesn't exist anywhere in the dispatcher layer.

codecov · 2026-05-04T09:33:47Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.97%. Comparing base (d2563bf) to head (c6443af).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #946      +/-   ##
==========================================
+ Coverage   96.95%   96.97%   +0.01%     
==========================================
  Files         130      130              
  Lines        7749     7793      +44     
==========================================
+ Hits         7513     7557      +44     
  Misses        236      236

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

lerman25

Nice job, mainly nitpicks.
Also - does this address the comment here?
#942 (comment)

dor-forer · 2026-05-10T12:58:57Z

@lerman25 No it doesn't fix #942 (comment), it has a diferent ticket.
The fix there is to use std::memcpy when you build the metadata in the quantized vector.

- test_spaces.cpp: drop MOD-13837 prefix from test header comment - preprocessors.h: inline the rationale for unsupported dynamic resizing instead of pointing at commit history - preprocessor_container.h: rename 'Legacy ctor' to 'Homogeneous ctor' on both PreprocessorsContainerAbstract and MultiPreprocessorsContainer (the homogeneous overload is not deprecated; every in-tree caller uses it) - vec_sim_index.h: replace the vague 'per-element stride padding follow-up' note with a concrete explanation of why only the first vector per block is guaranteed aligned today

Address Cursor Bugbot review: the storage_alignment ? allocate_aligned : allocate ternary was the last surviving instance of this pattern in the file - QuantPreprocessor::preprocessQuery and CosinePreprocessor both already call allocate_aligned unconditionally. allocate_aligned(size, 0) short-circuits to allocate(size) at the top of VecSimAllocator::allocate_aligned (vecsim_malloc.cpp:47-49), so the behavior is unchanged.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 509adcb. Configure here.}

PR #944 added QuantPreprocessorFP16MetricTest using the old single-alignment signatures. After merging origin/main, update the two new call sites to pass storage and query alignments explicitly: - preprocess(...): pass alignment as both storage_alignment and query_alignment (alignment is 0 in this test, so behavior is unchanged) - preprocessForStorage(...): pass storage_alignment=0

lerman25 · 2026-05-11T14:58:00Z

        assert(VecSimType_sizeof(vecType));
        assert(storedDataSize);
        assert(inputBlobSize);
+        // DataBlocksContainer holds the persistent storage vectors, so it must honor the storage


Consider ticket

dor-forer added 5 commits May 3, 2026 09:53

MOD-13837: drop CosinePreprocessor::allocateBlob helper

7d45514

VecSimAllocator::allocate_aligned(size, 0) already falls through to allocate(size), so the helper's alignment ternary was redundant. Inline the four call sites to a direct allocate_aligned() call.

Refactor alignment parameters in preprocessors for consistency

b9f8af9

dor-forer changed the title ~~Dorer split alignment~~ Separate query and storage alignment requirements [MOD-13837] May 3, 2026

dor-forer requested review from Copilot and meiravgri May 3, 2026 13:30

Copilot started reviewing on behalf of dor-forer May 3, 2026 13:31 View session

Copilot AI reviewed May 3, 2026

View reviewed changes

dor-forer requested review from lerman25 and removed request for meiravgri May 4, 2026 09:09

dor-forer marked this pull request as ready for review May 4, 2026 09:10

cursor Bot reviewed May 4, 2026

View reviewed changes

Comment thread src/VecSim/spaces/computer/preprocessors.h Outdated

lerman25 reviewed May 7, 2026

View reviewed changes

dor-forer enabled auto-merge May 10, 2026 13:06

cursor Bot reviewed May 10, 2026

View reviewed changes

Comment thread src/VecSim/spaces/computer/preprocessors.h

dor-forer added 2 commits May 10, 2026 16:24

Merge remote-tracking branch 'origin/main' into dorer-split-alignment

ca816e3

dor-forer requested a review from lerman25 May 10, 2026 13:50

lerman25 reviewed May 11, 2026

View reviewed changes

lerman25 approved these changes May 11, 2026

View reviewed changes

dor-forer added this pull request to the merge queue May 11, 2026

Merged via the queue into main with commit bbe9dfd May 11, 2026
17 checks passed

dor-forer deleted the dorer-split-alignment branch May 11, 2026 15:41

Conversation

dor-forer commented May 3, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jit-ci Bot commented May 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🛡️ Jit Security Scan Results

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

lerman25 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dor-forer commented May 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

dor-forer commented May 3, 2026 •

edited by cursor Bot

Loading

jit-ci Bot commented May 3, 2026 •

edited

Loading

codecov Bot commented May 4, 2026 •

edited

Loading

dor-forer commented May 10, 2026 •

edited

Loading