[diskann-vector] Support truly unaligned distances. #981

Open
hildebrandmw wants to merge 4 commits into main from mhildebr/super-unaligned

@hildebrandmw hildebrandmw commented Apr 28, 2026

An internal user has a case where full-precision vectors (e.g. f32) are stored in completely unaligned buffers (e.g. align of 1), which currently requires a data copy to align the data before the slices can be safely constructed. However, our distance function implementations use SIMDVector::load_unaligned under the hood, which is compatible with under-aligned pointers.

This PR exposes a proper API to the DistanceProvider trait (via the Distance type) for invoking the SIMD implementations with unaligned pointers.
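To make the motivation concrete, here is a minimal scalar sketch of reading f32 values from a byte buffer with alignment 1, without first copying into an aligned slice. It uses only `std::ptr::read_unaligned`; the helper names `squared_l2_unaligned` and `offset_bytes` are hypothetical and are not part of diskann-vector, whose kernels use SIMD loads rather than this scalar loop.

```rust
use std::ptr;

/// Squared L2 distance between two `f32` vectors stored as raw native-endian
/// bytes with no alignment guarantee.
/// Hypothetical helper for illustration only; not the crate's API.
pub fn squared_l2_unaligned(a: &[u8], b: &[u8]) -> f32 {
    const SIZE: usize = std::mem::size_of::<f32>();
    assert_eq!(a.len(), b.len());
    assert_eq!(a.len() % SIZE, 0);

    let n = a.len() / SIZE;
    let pa = a.as_ptr().cast::<f32>();
    let pb = b.as_ptr().cast::<f32>();
    let mut sum = 0.0f32;
    for i in 0..n {
        // SAFETY: `i < n`, so each read covers `SIZE` bytes inside its slice.
        // `read_unaligned` places no alignment requirement on the pointer.
        let (x, y) = unsafe { (ptr::read_unaligned(pa.add(i)), ptr::read_unaligned(pb.add(i))) };
        sum += (x - y) * (x - y);
    }
    sum
}

/// Serialize `values` after `offset` padding bytes, so that the payload
/// (starting at index `offset`) can begin at an arbitrary address.
pub fn offset_bytes(values: &[f32], offset: usize) -> Vec<u8> {
    let mut out = vec![0u8; offset];
    for v in values {
        out.extend_from_slice(&v.to_ne_bytes());
    }
    out
}
```

Calling `squared_l2_unaligned(&a[1..], &b[3..])` on two buffers built with `offset_bytes(.., 1)` and `offset_bytes(.., 3)` exercises payloads that start at different offsets within their allocations.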

Suggested Reviewing Order

  • diskann-wide: The implementations of SIMDVector::load* and SIMDVector::store* already support underaligned pointers. This PR updates the documentation and restructures the load/store tests to verify this property (we were already using this property in some of the quantized distance kernels). The new load/store tests successfully pass Miri.

  • unaligned.rs - a new UnalignedSlice is added for unaligned slices. This is just a pointer + length pair with some validity requirements but no alignment requirement. Conversions from &[T] and &[T; N] are added and the trait AsUnaligned replaces the use of AsRef<[T]> and the internal ToSlice traits.

    A test-only Buffer is used to purposely offset simple types to exercise the unaligned cases.

  • distance/simd.rs: The simd_op kernel is tweaked to accept AsUnaligned instead of AsRef. Checks have been added to the existing tests to ensure that the under-aligned versions are both Miri compatible and yield the exact same results as their properly aligned counterparts.

  • distance/implementations.rs: The architecture hooks and specialization are changed to use AsUnaligned. I've investigated the code generation and the checks for impl FTarget<...> for Specialize<N, F> are sufficient to trigger constant propagation and the full unrolling of small fixed-sized kernels.

  • distance/distance_provider.rs: The Distance type is changed to pass UnalignedSlices across the function pointer boundary rather than raw slices. We can keep the existing API for slices trivially via AsUnaligned.
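The "pointer + length pair with validity but no alignment requirement" idea from the list above can be sketched roughly as follows. This is an illustrative stand-in written from the description only: the field names, the `from_raw_parts` constructor, and the `get` accessor are assumptions, and the real UnalignedSlice in unaligned.rs may differ in every detail.

```rust
use std::marker::PhantomData;

/// Pointer + length pair that must be valid for reads but may be under-aligned.
/// Hypothetical stand-in for the PR's `UnalignedSlice`.
#[derive(Clone, Copy)]
pub struct UnalignedSlice<'a, T> {
    ptr: *const T,
    len: usize,
    _lifetime: PhantomData<&'a [T]>,
}

impl<'a, T: Copy> UnalignedSlice<'a, T> {
    /// # Safety
    /// `ptr` must be valid for reads of `len * size_of::<T>()` bytes for the
    /// whole of `'a`, and those bytes must hold initialized `T` values.
    /// No alignment is required. (Assumed constructor, not the crate's API.)
    pub unsafe fn from_raw_parts(ptr: *const T, len: usize) -> Self {
        Self { ptr, len, _lifetime: PhantomData }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    /// Bounds-checked element read that tolerates any alignment.
    pub fn get(&self, i: usize) -> Option<T> {
        if i < self.len {
            // SAFETY: `i` is in bounds and the constructor guarantees validity.
            Some(unsafe { self.ptr.add(i).read_unaligned() })
        } else {
            None
        }
    }
}

/// An ordinary slice is always a valid (and over-qualified) unaligned slice.
impl<'a, T> From<&'a [T]> for UnalignedSlice<'a, T> {
    fn from(s: &'a [T]) -> Self {
        Self { ptr: s.as_ptr(), len: s.len(), _lifetime: PhantomData }
    }
}

/// The same conversion for fixed-size arrays.
impl<'a, T, const N: usize> From<&'a [T; N]> for UnalignedSlice<'a, T> {
    fn from(s: &'a [T; N]) -> Self {
        Self { ptr: s.as_ptr(), len: N, _lifetime: PhantomData }
    }
}
```

Because the safe conversions only go from aligned to unaligned, existing slice-based callers need no unsafe code; only the raw-pointer constructor carries a safety contract.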

Code Generation

Unfortunately, the order in which functions are code-generated seems to have changed with this PR. That said, the fixed-sized specializations I have spot-checked result in identical assembly with this PR as with main, which is to be expected.

Copilot AI left a comment

Pull request overview

This PR adds first-class support in diskann-vector for computing SIMD-accelerated distances over truly under-aligned vector buffers (e.g., alignment 1), avoiding the need to copy data just to form &[T].

Changes:

  • Introduces UnalignedSlice + AsUnaligned and re-exports them from the crate root.
  • Updates SIMD distance kernels and specialization/dispatch plumbing to accept AsUnaligned inputs.
  • Extends Distance with call_unaligned and adds tests that exercise intentionally misaligned buffers.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| diskann-vector/src/unaligned.rs | Adds UnalignedSlice, AsUnaligned, and a test-only Buffer to create intentionally misaligned data. |
| diskann-vector/src/lib.rs | Exposes the new unaligned APIs from the crate root. |
| diskann-vector/src/test_util.rs | Refactors test harness to accept a &mut dyn DistanceChecker (trait object). |
| diskann-vector/src/distance/simd.rs | Changes simd_op to accept AsUnaligned and adds tests validating unaligned correctness/Miri safety. |
| diskann-vector/src/distance/implementations.rs | Updates architecture hooks and fixed-size specialization to operate on AsUnaligned / UnalignedSlice. |
| diskann-vector/src/distance/distance_provider.rs | Switches dispatched function signature to UnalignedSlice and adds Distance::call_unaligned. |
| diskann-vector/Cargo.toml | Adds bytemuck (dev) and enables half/bytemuck for tests. |
| diskann-providers/src/model/pq/distance/multi.rs | Adjusts reference distance calls to pass slices via explicit deref (&*...). |
| Cargo.lock | Records the new bytemuck dependency resolution. |

codecov-commenter commented Apr 28, 2026

Codecov Report

❌ Patch coverage is 95.16908% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.63%. Comparing base (f458cf6) to head (c3c2f66).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| diskann-vector/src/unaligned.rs | 90.90% | 6 Missing ⚠️ |
| diskann-vector/src/distance/distance_provider.rs | 66.66% | 4 Missing ⚠️ |

@@            Coverage Diff             @@
##             main     #981      +/-   ##
==========================================
+ Coverage   89.48%   90.63%   +1.14%     
==========================================
  Files         448      449       +1     
  Lines       84081    84206     +125     
==========================================
+ Hits        75239    76318    +1079     
+ Misses       8842     7888     -954     
| Flag | Coverage Δ |
| --- | --- |
| miri | 90.63% <95.16%> (+1.14%) ⬆️ |
| unittests | 90.59% <95.16%> (+1.26%) ⬆️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| diskann-providers/src/model/pq/distance/multi.rs | 96.11% <100.00%> (ø) |
| diskann-vector/src/distance/implementations.rs | 96.81% <100.00%> (+0.87%) ⬆️ |
| diskann-vector/src/distance/simd.rs | 90.14% <100.00%> (+12.92%) ⬆️ |
| diskann-vector/src/lib.rs | 44.44% <ø> (ø) |
| diskann-vector/src/test_util.rs | 100.00% <100.00%> (ø) |
| diskann-vector/src/distance/distance_provider.rs | 98.58% <66.66%> (-1.42%) ⬇️ |
| diskann-vector/src/unaligned.rs | 90.90% <90.90%> (ø) |

... and 37 files with indirect coverage changes

@arkrishn94 arkrishn94 left a comment

Looks good, Mark; I mostly had small comments. The only callout is the question about the indirection through AsUnaligned. I would love to understand why it is needed.

Comment thread diskann-vector/src/unaligned.rs
T: 'static,
{
type Of<'a> = UnalignedSlice<'a, T>;
}

This is probably my lack of knowledge of the AddLifetime trait, but for my learning: why is this implementation needed for Unaligned<T>?

@hildebrandmw hildebrandmw Apr 29, 2026

(Note that this is a private type to make Dispatched2 work).

The main problem this is solving is outlined here. The gist is that Dispatched2 is a function-pointer wrapper, but function types and function-pointer types in Rust are blessed with an automatic HRTB-like syntax that is unavailable to normal types. For example, this works: f: fn(&f32) -> f32, and the type fn(&f32) -> f32 is 'static, whereas if you try to instantiate a generic Vec<&f32>, you need to name a lifetime and the resulting type is not 'static.

So the AddLifetime is a workaround for this.
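A rough sketch of the pattern being described, under stated assumptions: only the shape `type Of<'a>` with a `T: 'static` bound comes from the snippet quoted in this thread. The marker type `Slice<T>`, the exact definition of `AddLifetime`, and the layout and methods of `Dispatched2` below are hypothetical reconstructions, not the crate's code.

```rust
use std::marker::PhantomData;

/// A `'static` marker that names a family of types, one per lifetime.
/// (Assumed definition; the thread only shows an impl with `type Of<'a>`.)
pub trait AddLifetime: 'static {
    type Of<'a>;
}

/// Hypothetical marker for "a borrowed slice of `T`". It carries no lifetime
/// itself, so it can be used as a plain `'static` generic argument.
pub struct Slice<T>(PhantomData<T>);

impl<T: 'static> AddLifetime for Slice<T> {
    type Of<'a> = &'a [T];
}

/// Hypothetical function-pointer wrapper. The lifetimes are introduced by the
/// `for<..>` binder on the fn pointer, not by the wrapper's own parameters.
pub struct Dispatched2<L: AddLifetime, R: AddLifetime> {
    f: for<'a, 'b> fn(L::Of<'a>, R::Of<'b>) -> f32,
}

impl<L: AddLifetime, R: AddLifetime> Dispatched2<L, R> {
    pub fn new(f: for<'a, 'b> fn(L::Of<'a>, R::Of<'b>) -> f32) -> Self {
        Self { f }
    }

    /// Callable with arguments of any lifetime, like a bare `fn(&[f32], ..)`.
    pub fn call<'a, 'b>(&self, l: L::Of<'a>, r: R::Of<'b>) -> f32 {
        (self.f)(l, r)
    }
}

/// Compiles only for types that are `'static`.
pub fn assert_static<T: 'static>() {}

/// Plain dot product used as the wrapped function in the example.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

With these definitions, `Dispatched2<Slice<f32>, Slice<f32>>` names no lifetime and satisfies `assert_static`, yet `call` still accepts short-lived borrows, which is the property a bare `fn(&f32) -> f32` gets for free.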

Self: PostOp<<$impl as simd::SIMDSchema<L::Target, R::Target, A>>::Return, T>,
L: AsUnaligned,
R: AsUnaligned,
$impl: simd::SIMDSchema<L::Element, R::Element, A>,

I noticed that all implementations of AsUnaligned route through impl From<B> for UnalignedSlice<'_, T>. Out of curiosity: why not get rid of this trait and bound using Into<UnalignedSlice<'_, T>>?

I'm using this as an opportunity to learn : )

The issue is one of potential trait ambiguity. AsUnaligned has the Element as a GAT, which can then be fed unambiguously to SIMDSchema. If we used a more generic trait like From, we'd need to specify the element type somehow, which won't work because we'd end up with uncovered generics.

You can try this yourself if you take the code and start trying to use From, it gets bad fast 😄
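One way to see the difference, as a simplified sketch: when the element type is an associated type of the input, a kernel bound can be written purely in terms of `L` and `R`. All names below (`View`, `AsView`, `Kernel`, `Dot`, `run`) are hypothetical analogues chosen for illustration and are far simpler than the real AsUnaligned and SIMDSchema.

```rust
/// Hypothetical borrowed view; stands in for the PR's `UnalignedSlice`.
pub struct View<'a, T>(pub &'a [T]);

/// Hypothetical analogue of `AsUnaligned`: each implementor names exactly one
/// element type, so `L::Element` is determined by `L` alone.
pub trait AsView {
    type Element;
    fn as_view(&self) -> View<'_, Self::Element>;
}

impl<T> AsView for Vec<T> {
    type Element = T;
    fn as_view(&self) -> View<'_, T> {
        View(self.as_slice())
    }
}

impl<T, const N: usize> AsView for [T; N] {
    type Element = T;
    fn as_view(&self) -> View<'_, T> {
        View(self.as_slice())
    }
}

/// Hypothetical analogue of `SIMDSchema`: a kernel keyed by element types.
pub trait Kernel<A, B> {
    fn eval(a: &[A], b: &[B]) -> f32;
}

pub struct Dot;

impl Kernel<f32, f32> for Dot {
    fn eval(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

impl Kernel<u8, u8> for Dot {
    fn eval(a: &[u8], b: &[u8]) -> f32 {
        a.iter().zip(b).map(|(x, y)| f32::from(*x) * f32::from(*y)).sum()
    }
}

/// The kernel impl is selected from `L::Element` and `R::Element`; the caller
/// never names an element type. With a bound like `L: Into<View<'a, T>>`,
/// `T` would be an extra parameter that `L` does not determine, and inside an
/// `impl` block it would appear only in a where-clause, which rustc rejects
/// as an unconstrained type parameter (E0207).
pub fn run<K, L, R>(l: &L, r: &R) -> f32
where
    L: AsView,
    R: AsView,
    K: Kernel<L::Element, R::Element>,
{
    K::eval(l.as_view().0, r.as_view().0)
}
```

For example, `run::<Dot, _, _>(&vec![1.0f32, 2.0], &[3.0f32, 4.0])` and the same call with `u8` data each resolve to a different `Kernel` impl without any element-type annotation at the call site.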
