
First part of a HashSortedMap #107

Open

aneubeck wants to merge 23 commits into main from aneubeck/prefixmap

Conversation

@aneubeck
Collaborator

In a subsequent step, merging functions and sorted iterators will be added.

Copilot AI review requested due to automatic review settings April 29, 2026 13:25
@aneubeck aneubeck requested a review from a team as a code owner April 29, 2026 13:25
Contributor

Copilot AI left a comment


Pull request overview

Introduces a new hash-sorted-map crate implementing the first part of an insertion-only HashSortedMap with SIMD-accelerated group scanning and overflow chaining, plus accompanying benchmarks and design documentation.

Changes:

  • Adds core HashSortedMap implementation (insert/get/entry API) with growth and overflow chaining.
  • Adds platform-specific group scanning primitives (SSE2/NEON/scalar) and a basic test suite.
  • Adds a Criterion benchmarks crate and initial README/optimization docs.
Summary per file:

| File | Description |
| --- | --- |
| crates/hash-sorted-map/src/lib.rs | Exposes `group_ops` and `hash_sorted_map` modules. |
| crates/hash-sorted-map/src/hash_sorted_map.rs | Implements the `HashSortedMap` data structure, growth logic, and `Entry` API, plus tests. |
| crates/hash-sorted-map/src/group_ops.rs | Adds SIMD/scalar helpers for ctrl-byte matching and mask iteration. |
| crates/hash-sorted-map/benchmarks/performance.rs | Adds Criterion benchmarks comparing multiple map implementations. |
| crates/hash-sorted-map/benchmarks/lib.rs | Benchmark utilities (trigram generation, identity hasher). |
| crates/hash-sorted-map/benchmarks/Cargo.toml | Defines the benchmarks crate and its dependencies. |
| crates/hash-sorted-map/README.md | Documents motivation/design and includes benchmark results and running instructions. |
| crates/hash-sorted-map/OPTIMIZATIONS.md | Design/optimization analysis and rationale. |
| crates/hash-sorted-map/Cargo.toml | Declares the new hash-sorted-map crate metadata. |
| Cargo.toml | Adds the benchmarks crate to the workspace members. |

Copilot's findings


Comments suppressed due to low confidence (3)

crates/hash-sorted-map/src/hash_sorted_map.rs:560

  • `insert_after_grow` treats `Insertion::NeedsOverflow` as unreachable after a resize, but it can still happen for adversarial / extremely colliding hash distributions (the table may still need overflow groups, even immediately after growing). Instead of `unreachable!`, handle `NeedsOverflow` by allocating/linking an overflow group (with the same capacity check as the normal insert path) or by retrying via the normal `insert_hashed` logic.

```rust
        // After grow, the new primary group for `key` cannot be full (see
        // function docs), and the key wasn't in the table before grow.
        FindResult::Vacant(Insertion::NeedsOverflow { .. }) | FindResult::Found(_) => {
            unreachable!("post-grow walk must hit an empty slot")
        }
```

crates/hash-sorted-map/README.md:90

  • The benchmark invocation doesn't match the benchmark that's defined in benchmarks/Cargo.toml (`[[bench]] name = "performance"`). `cargo bench --bench hashmap_insert` will fail; update the README to the correct command (likely `cargo bench -p hash-sorted-map-benchmarks --bench performance`, or run from the benchmarks/ crate).

```sh
cargo bench --bench hashmap_insert
```

crates/hash-sorted-map/src/hash_sorted_map.rs:367

  • `insert_for_grow` allocates overflow groups without checking `self.num_groups` against `self.groups.len()`. Under heavy hash collisions during rehash, `new_gi` can reach `self.groups.len()` and the subsequent indexing will panic. Consider mirroring the capacity check used in `insert_hashed` (e.g., trigger another `grow()` / allocate more groups) so growth remains panic-free even for pathological hash distributions.

```rust
        } else {
            let new_gi = self.num_groups as usize;
            self.groups[gi].overflow = new_gi as u32;
            self.num_groups += 1;
            group = &mut self.groups[new_gi];
            break;
```


- **Files reviewed:** 10/10 changed files
- **Comments generated:** 3


Comment on lines +34 to +40
```
│ Vec<Group<K,V>> where each Group (AoS):                      │
│ { ctrl: [u8; 8], keys: [MaybeUninit<K>; 8],                  │
│   values: [MaybeUninit<V>; 8], overflow: u32 }               │
│                                                              │
│ • Overflow chaining (linked groups)                          │
│ • 8-byte groups with NEON/SSE2/scalar SIMD scan              │
│ • EMPTY / FULL tag states only (insertion-only, no deletion) │
```

Copilot AI Apr 29, 2026


This architecture diagram hard-codes 8-slot groups (ctrl: [u8; 8], keys: ...; 8], etc.) and says “8-byte groups with NEON/SSE2”, but the implementation uses GROUP_SIZE = 16 on x86_64. Please update the documentation to reflect the 8-or-16 group size (or describe it as GROUP_SIZE).
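For reference, the layout the diagram describes can be sketched in Rust with a platform-dependent group size, as this comment suggests. The exact `cfg` split below (16 on x86_64 with SSE2, 8 elsewhere) is an assumption for illustration, not the crate's actual code:

```rust
use std::mem::MaybeUninit;

// Assumed cfg split: 16-slot groups on x86_64 (SSE2), 8-slot otherwise.
#[cfg(target_arch = "x86_64")]
const GROUP_SIZE: usize = 16;
#[cfg(not(target_arch = "x86_64"))]
const GROUP_SIZE: usize = 8;

// AoS group layout from the diagram, parameterized over GROUP_SIZE.
pub struct Group<K, V> {
    pub ctrl: [u8; GROUP_SIZE],               // EMPTY / FULL tag bytes only
    pub keys: [MaybeUninit<K>; GROUP_SIZE],   // uninitialized until tagged FULL
    pub values: [MaybeUninit<V>; GROUP_SIZE], // ditto
    pub overflow: u32,                        // index of the linked overflow group
}
```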

Contributor

@jorendorff jorendorff left a comment


This is cool.

Can I see the merge/iteration operations you have in mind, before stamping this?

scanning the group. Gives a direct hit on most inserts at low load.
- **SIMD group scanning** — uses NEON on aarch64, SSE2 on x86\_64, and a
scalar fallback elsewhere to scan 8–16 control bytes in parallel.
- **AoS group layout** — each group stores its control bytes, keys, and values
Contributor


Suggest linking Wikipedia here: https://en.wikipedia.org/wiki/AoS_and_SoA#Array_of_structures_of_arrays

Wikipedia calls this AoSoA (since each Group is a struct of arrays) or "tiled AoS"
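The scalar fallback mentioned in the quoted README bullet is presumably a SWAR scan. A minimal sketch of that well-known technique (`match_byte` is my name, not necessarily the crate's) comparing one tag byte against 8 ctrl bytes at once:

```rust
// SWAR scan: find bytes of `ctrl` equal to `tag`, 8 bytes at a time.
fn match_byte(ctrl: u64, tag: u8) -> u64 {
    // XOR zeroes out every byte equal to `tag`...
    let cmp = ctrl ^ (u64::from(tag) * 0x0101_0101_0101_0101);
    // ...then classic zero-byte detection sets the high bit of each zero
    // byte. This can report rare false positives (e.g. a 0x01 byte directly
    // above a matching byte), which is harmless when callers verify each
    // candidate by comparing keys, as Swiss-table-style maps do.
    cmp.wrapping_sub(0x0101_0101_0101_0101) & !cmp & 0x8080_8080_8080_8080
}
```

Each set high bit marks a candidate slot; iterating the mask with `trailing_zeros` yields the slot indices.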

}

pub fn with_capacity_and_hasher(capacity: usize, hash_builder: S) -> Self {
let adjusted = capacity.checked_mul(8).unwrap_or(usize::MAX) / 7;
Contributor


I think we need to adjust more than that, because the expected load factor is not as high as hashbrown's.

I did a few simulations and we need to multiply by about 1.30 here, or 1.42 if GROUP_SIZE is 8. That is assuming alloc_groups overallocates by a further 1/8 and that we want with_capacity to be an honest effort to accommodate that many entries without growing.
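As a sketch of what that adjustment might look like (the ~1.30 / ~1.42 factors are taken from this comment's simulations, and `GROUP_SIZE` here is a local stand-in for the crate's constant, so this is not the actual code):

```rust
// Stand-in for the crate's constant; 16 on x86_64 per the discussion above.
const GROUP_SIZE: usize = 16;

// Hypothetical capacity headroom, expressed as integer ratios so the
// computation stays in overflow-checked integer arithmetic:
// ~1.30x for GROUP_SIZE == 16, ~1.42x (71/50) for GROUP_SIZE == 8.
fn adjusted_capacity(capacity: usize) -> usize {
    let (num, den) = if GROUP_SIZE == 16 { (13, 10) } else { (71, 50) };
    capacity.checked_mul(num).unwrap_or(usize::MAX) / den
}
```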

Collaborator Author


👍
(I had Copilot compute those numbers as well, or at least something similar; I just forgot to change the code.)

Co-authored-by: Jason Orendorff <jorendorff@github.com>
@aneubeck
Collaborator Author

Here is the very simple sorting of each group:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/prefix_map.rs#L288
and here the sorted iterator:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/prefix_map.rs#L390

In the ported version, we will need to introduce a new type, since the hash map is no longer usable as a map once its groups are sorted, but one might still want to iterate it multiple times.

Merging is implemented here:
https://github.com/github/blackbird/blob/aneubeck/reindex4/crates/index/src/indexer/memory_index.rs#L191-L204

We could provide a helper function in the rust crate for this.

Feel free to merge/make changes, since I will be OOO tomorrow.
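For readers without access to those links: a generic k-way merge over already-sorted iterators (the textbook `BinaryHeap` technique such a helper could wrap, not the linked blackbird implementation) looks like:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

// K-way merge of sorted iterators via a min-heap over (value, source index).
fn merge_sorted<T: Ord, I: Iterator<Item = T>>(mut iters: Vec<I>) -> Vec<T> {
    let mut heap = BinaryHeap::new();
    // Seed the heap with the first element of each iterator.
    for (idx, it) in iters.iter_mut().enumerate() {
        if let Some(v) = it.next() {
            heap.push(Reverse((v, idx)));
        }
    }
    let mut out = Vec::new();
    // Repeatedly pop the smallest element and refill from its source.
    while let Some(Reverse((v, idx))) = heap.pop() {
        out.push(v);
        if let Some(next) = iters[idx].next() {
            heap.push(Reverse((next, idx)));
        }
    }
    out
}
```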

/// View into a vacant entry. Holds the borrow of the map plus the hash, key,
/// and pre-computed insertion slot.
pub struct VacantEntry<'a, K, V, S> {
map: &'a mut HashSortedMap<K, V, S>,
Contributor


OccupiedEntry is fine, but modifying memory directly owned by this HashSortedMap from another mut pointer (group/tail), while this mut reference exists, is UB.

The annoying workaround is like

Suggested change:

```diff
-    map: &'a mut HashSortedMap<K, V, S>,
+    phantom: PhantomData<&'a mut HashSortedMap<K, V, S>>,
+    map: *mut HashSortedMap<K, V, S>,
```

and only make mut references via one of the two pointers at a time, which means you have to alternate.
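A minimal, self-contained version of that workaround (with a toy stand-in for `HashSortedMap`, since the real one has groups, ctrl bytes, and a precomputed slot) might look like:

```rust
use std::marker::PhantomData;

// Toy stand-in map: `slot` plays the role of the precomputed insertion slot.
struct ToyMap<V> {
    slot: Option<V>,
}

// The suggested shape: a raw pointer for access, plus PhantomData to keep
// the &'a mut borrow alive in the type system.
struct VacantEntry<'a, V> {
    phantom: PhantomData<&'a mut ToyMap<V>>,
    map: *mut ToyMap<V>,
}

impl<V> ToyMap<V> {
    fn vacant_entry(&mut self) -> VacantEntry<'_, V> {
        VacantEntry { phantom: PhantomData, map: self }
    }
}

impl<'a, V> VacantEntry<'a, V> {
    fn insert(self, value: V) -> &'a mut V {
        // SAFETY: `phantom` ties this entry to the unique &mut borrow it was
        // created from; every access re-derives &mut from `self.map`, and no
        // two &mut views are live at once.
        unsafe {
            (*self.map).slot = Some(value);
            (*self.map).slot.as_mut().unwrap()
        }
    }
}
```

The "alternate" discipline described above corresponds to re-deriving `&mut` from `self.map` for each access rather than caching references.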

Collaborator Author


I have no idea whether I did the right thing now... I simply added `(*map)` dereferencing code to the insert implementation.

Collaborator Author


Should insertion also have the same lifetime (via a `PhantomData`)?


hashbrown does **not** have this optimization — it always does a full SIMD
group scan. At ~50% load, the hint hits ~58% of the time, avoiding the scan
Contributor


How is this possible? If the key being inserted has a random hash, then at X% load it should hit exactly X% of the time.
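A quick way to sanity-check the plain uniform-probe model behind this question (a self-contained simulation with a toy LCG, not the crate's benchmark): fill a random fraction of slots, then probe uniformly random slots and measure how often they are empty — it comes out at 1 − load, i.e. ~50% at 50% load.

```rust
// Tiny deterministic LCG so the sketch needs no external crates.
fn next_rand(state: &mut u64) -> u64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    *state >> 33
}

// Fill `load` of the table at random, then measure how often a uniformly
// random probe lands on an empty slot.
fn empty_probe_rate(num_slots: usize, load: f64, trials: usize) -> f64 {
    let mut occupied = vec![false; num_slots];
    let mut state = 0x1234_5678u64;
    let target = (num_slots as f64 * load) as usize;
    let mut filled = 0;
    while filled < target {
        let i = next_rand(&mut state) as usize % num_slots;
        if !occupied[i] {
            occupied[i] = true;
            filled += 1;
        }
    }
    let hits = (0..trials)
        .filter(|_| !occupied[next_rand(&mut state) as usize % num_slots])
        .count();
    hits as f64 / trials as f64
}
```

If the hint really hits ~58% at 50% load, something beyond a uniformly random slot choice must be at work.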

@@ -0,0 +1,176 @@
# HashSortedMap vs. Rust Swiss Table (hashbrown): Optimization Analysis

## Executive Summary
Contributor


AI slop. :P It's honestly better than nothing, but I'd be happier if you did a read through this and just deleted everything that's not something you would say as a human.

Comment on lines +137 to +139
traversing overflow chains. The `m/8` default implicitly enforces ~62.5% max
load, which aligns with the mathematical analysis (Poisson model, 3σ
confidence).
Contributor


...What mathematical analysis? Is that load number correct? It's different from what the code says.

aneubeck and others added 3 commits May 5, 2026 15:23
Co-authored-by: Jason Orendorff <jorendorff@github.com>
Co-authored-by: Jason Orendorff <jorendorff@github.com>