Cleaning up test work and changing the get_degree_stats signature. #998
JordanMaples wants to merge 5 commits into main
Codecov Report❌ Patch coverage is
Additional details and impacted files

| Coverage Diff | main | #998 | +/- |
|---|---|---|---|
| Coverage | 90.63% | 89.45% | -1.18% |
| Files | 448 | 448 | |
| Lines | 84095 | 83990 | -105 |
| Hits | 76216 | 75130 | -1086 |
| Misses | 7879 | 8860 | +981 |
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Cleans up and reorganizes test utilities while changing get_degree_stats to accept an explicit iterator of IDs (instead of requiring providers to implement IntoIterator).
Changes:
- Updated get_degree_stats (and sync wrapper) to take an IntoIterator of IDs to compute degree stats for.
- Removed/relocated lower-signal tests and improved test assertions for DegreeStats.
- Simplified 2D-square test setup helpers and refactored neighbor access in the test provider.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| diskann/src/graph/test/provider.rs | Adds ID iterators + refactors neighbor retrieval; removes IntoIterator impl. |
| diskann/src/graph/test/cases/inplace_delete.rs | Adapts tests to new setup_2d_square helper signature. |
| diskann/src/graph/test/cases/index.rs | Removes paged-search tests; updates degree-stats tests for new API. |
| diskann/src/graph/test/cases/helpers.rs | Simplifies helper by internalizing vector generation (removes Matrix param). |
| diskann/src/graph/test/cases/consolidate.rs | Adapts consolidation tests to new helper signature. |
| diskann/src/graph/index.rs | Changes get_degree_stats signature; moves QueryLabelProvider test here; adds verbose-eq for DegreeStats. |
| diskann-providers/src/index/wrapped_async.rs | Updates sync wrapper to pass ID iterator through to async API. |
| diskann-providers/src/index/diskann_async.rs | Updates provider-side tests to pass iterators into get_degree_stats. |

      let res_unsat = index_sat
    -     .get_degree_stats(&mut accessor_unsat)
    +     .get_degree_stats(&mut accessor_unsat, index_unsat.provider().iter())
          .await
          .unwrap();

res_unsat is computed by calling get_degree_stats on index_sat while using accessor_unsat + index_unsat.provider().iter(). This mixes the saturated and unsaturated indexes and can invalidate the comparison (and may even be logically inconsistent if index-level config like max degree differs). Call get_degree_stats on index_unsat for the unsaturated case.

    + pub fn get_degree_stats<NA, Itr>(
          &self,
          accessor: &mut NA,
    +     itr: Itr,
      ) -> impl SendFuture<ANNResult<DegreeStats>>
      where
    -     for<'a> &'a DP: IntoIterator<Item = DP::InternalId, IntoIter: Send>,
    +     Itr: IntoIterator<Item = DP::InternalId, IntoIter: Send> + Send,
          NA: AsNeighbor<Id = DP::InternalId>,
      {
          async move {

The future produced here captures itr across await points in the loop. This forces itr (and its internal iterator state) to be Send, which can be problematic for iterators that hold non-Send guards/borrows (e.g., iterators over concurrent maps), and can also unintentionally keep internal locks/borrows alive across .await. A robust approach is to eagerly materialize IDs into an owned Vec<DP::InternalId> before constructing the async block (so the returned future only captures a Vec, which is Send), and then iterate that Vec inside the async block; doing so also lets you relax/remove the Itr: Send / IntoIter: Send bounds.
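To illustrate the suggestion above, here is a minimal, std-only sketch of draining the IDs into a Vec before the async block is built. ToyIndex, degree_summary, block_on, and demo are hypothetical names for this example, not the crate's API, and the neighbor lookup is a stand-in for accessor.get_neighbors. The point is only that the returned future stays Send even when the caller's iterator is not.

```rust
use std::future::Future;
use std::pin::pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the index; not the crate's real type.
pub struct ToyIndex {
    pub adjacency: Vec<Vec<u32>>,
}

impl ToyIndex {
    /// Stand-in for `accessor.get_neighbors(id, &mut neighbors).await`.
    async fn get_neighbors(&self, id: u32, out: &mut Vec<u32>) {
        out.clear();
        out.extend_from_slice(&self.adjacency[id as usize]);
    }

    /// Returns `(points visited, max degree)`. Note: no `Send` bound on `I`.
    pub fn degree_summary<I>(&self, ids: I) -> impl Future<Output = (usize, usize)> + Send + '_
    where
        I: IntoIterator<Item = u32>,
    {
        // Drain the caller's iterator before the async block is built, so the
        // future owns only a `Vec<u32>` and holds no iterator state across `.await`.
        let ids: Vec<u32> = ids.into_iter().collect();
        async move {
            let mut neighbors = Vec::new();
            let (mut points, mut max_degree) = (0usize, 0usize);
            for id in ids {
                self.get_neighbors(id, &mut neighbors).await;
                points += 1;
                max_degree = max_degree.max(neighbors.len());
            }
            (points, max_degree)
        }
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny polling executor so the sketch runs without external crates.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}

fn require_send<T: Send>(_: &T) {}

pub fn demo() -> (usize, usize) {
    let index = ToyIndex { adjacency: vec![vec![1, 2], vec![0], vec![]] };
    // The closure captures an `Rc`, so this iterator is not `Send`.
    let shared = Rc::new(vec![0u32, 1, 2]);
    let ids = (0..3usize).map(move |i| shared[i]);
    let fut = index.degree_summary(ids);
    require_send(&fut); // compiles only because the future itself is `Send`
    block_on(fut)
}
```

The trade-off is one allocation proportional to the number of IDs, which may or may not be acceptable for very large indexes.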

      let mut total_live_points = 0;

      let mut neighbors = AdjacencyList::with_capacity(self.max_degree_with_slack());
    - for id in &self.data_provider {
    + for id in itr {
          total_live_points += 1;
          accessor.get_neighbors(id, &mut neighbors).await?;

With the new signature, callers can pass an empty iterator. Please ensure the remainder of the implementation has a defined behavior for “no IDs” (e.g., return Err with a clear message, or return a DegreeStats with zeros) and won’t divide by zero / produce NaNs for avg_degree. Adding a unit test that passes an empty iterator would lock in the intended behavior.
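One possible way to pin down the "no IDs" case is sketched below. DegreeSummary and summarize are hypothetical names, and the fields are only loosely modeled on the avg_degree mentioned above, so treat this as an illustration of the guard rather than the crate's DegreeStats. It returns None for an empty input instead of computing 0.0 / 0.0.

```rust
/// Hypothetical summary type; field names are assumptions for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct DegreeSummary {
    pub max_degree: usize,
    pub min_degree: usize,
    pub avg_degree: f32,
}

/// Summarizes a stream of per-node degrees.
/// Returns `None` when no degrees are supplied, so the average is never 0/0.
pub fn summarize<I>(degrees: I) -> Option<DegreeSummary>
where
    I: IntoIterator<Item = usize>,
{
    let (mut count, mut total) = (0usize, 0usize);
    let (mut max_degree, mut min_degree) = (0usize, usize::MAX);

    for degree in degrees {
        count += 1;
        total += degree;
        max_degree = max_degree.max(degree);
        min_degree = min_degree.min(degree);
    }

    // Guard the empty case explicitly before dividing.
    if count == 0 {
        return None;
    }

    Some(DegreeSummary {
        max_degree,
        min_degree,
        avg_degree: total as f32 / count as f32,
    })
}
```

Returning an error or an all-zero value would work equally well; what matters is that the choice is explicit and covered by a test that passes an empty iterator.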
This is a quick cleanup PR for some of the changes that went in the other day.
test/cases/index.rs had a number of paged_search tests that are, on the whole, lower signal than the baseline-related paged_search tests added in a different PR, so they are removed here.
Tests for the QueryLabelProvider were moved next to the trait definition, where they also document how we intend the trait to be used.
In the initial test authoring, I added an IntoIterator implementation to the test provider in order to test get_degree_stats. After syncing with Mark, we agreed that the better approach was to change get_degree_stats itself: it now accepts an iterator over the IDs you want degree stats for, rather than forcing the provider to implement IntoIterator.
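As a rough caller-side illustration of that design choice, the sketch below uses a hypothetical ToyProvider that does not implement IntoIterator and only exposes an iter() method, plus a hypothetical synchronous max_degree_of helper. Neither is the crate's real API. It shows that the caller decides which IDs to pass: the full set, a subset, or nothing.

```rust
use std::collections::BTreeMap;

/// Hypothetical test provider: no `IntoIterator` impl, just an `iter()` method.
pub struct ToyProvider {
    pub neighbors: BTreeMap<u32, Vec<u32>>,
}

impl ToyProvider {
    /// Yields every stored ID in ascending order.
    pub fn iter(&self) -> impl Iterator<Item = u32> + '_ {
        self.neighbors.keys().copied()
    }

    /// Out-degree of `id`, or `None` if the ID is unknown.
    pub fn degree(&self, id: u32) -> Option<usize> {
        self.neighbors.get(&id).map(Vec::len)
    }
}

/// Largest degree among the requested IDs; unknown IDs are skipped.
/// The caller supplies the IDs, so the provider needs no iteration trait.
pub fn max_degree_of<I>(provider: &ToyProvider, ids: I) -> Option<usize>
where
    I: IntoIterator<Item = u32>,
{
    ids.into_iter().filter_map(|id| provider.degree(id)).max()
}
```

With this shape, a test can write max_degree_of(&provider, provider.iter()) for all points or pass an explicit array of IDs for a subset.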
There are still some areas that I'm cleaning up, helpers.rs in particular: it does help set up some tests, but the naming is not great and there are still layers of boilerplate that could be eliminated by spending a bit more time handcrafting a solution.