branch-4.1: [Improvement](join) add HashCRC32Return32 and make join_hash_table use 32-bit hash value #62512 #62813

Open
github-actions[bot] wants to merge 1 commit into branch-4.1 from auto-pick-62512-branch-4.1

@github-actions
Contributor

Cherry-picked from #62512

@github-actions (bot) requested a review from yiguolei as a code owner on April 24, 2026 08:54

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring reopened this on Apr 24, 2026

@hello-stephen
Contributor

run buildall

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (60/60) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 54.54% (20070/36798) |
| Line Coverage | 35.50% (181730/511903) |
| Region Coverage | 29.76% (133429/448358) |
| Branch Coverage | 31.28% (59375/189819) |

…e 32-bit hash value (#62512)

```
after:
               - InitProbeSideTime: 85.747ms
               - ProbeIntermediateRows: 85.52771M (85527710)

before:
               - InitProbeSideTime: 102.746ms
               - ProbeIntermediateRows: 85.52771M (85527710)
```
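
Read as a worked calculation, and assuming the two profiles come from comparable single runs of the same query (the source does not say how many runs were taken), the relative reduction in `InitProbeSideTime` is:

$$
\frac{102.746 - 85.747}{102.746} = \frac{16.999}{102.746} \approx 0.165
$$

That is roughly a 16.5% reduction (about a 1.20x speedup) in probe-side initialization time, with `ProbeIntermediateRows` unchanged at 85527710.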

This pull request introduces a new CRC32-based hashing implementation
that consistently returns a 32-bit hash value, and updates the join hash
table infrastructure to use this new implementation for primary, fixed,
and string key types. The changes are focused on improving hash
consistency and efficiency by leveraging type-specific CRC32 intrinsics.

The most important changes are:

### New CRC32 Hash Implementation

* Added a new header file `hash_crc32_return32.h` that implements
type-specialized CRC32 hash functions returning `uint32_t`, using the
narrowest possible intrinsic for each type and supporting a wide range
of key types including arithmetic, 128/256-bit, and compound types.
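
To make the idea of "narrowest possible intrinsic per type" concrete, here is a minimal sketch. It is not the contents of `hash_crc32_return32.h`: the namespace `sketch`, the names `crc32c_soft`, `crc32_step`, `Hash32`, `Key128`, and the seed `kSeed` are all hypothetical, and the software fallback is added only so the example builds without SSE4.2.

```cpp
#include <cstdint>
#include <type_traits>
#if defined(__SSE4_2__) && defined(__x86_64__)
#include <nmmintrin.h>  // _mm_crc32_u8 / _u16 / _u32 / _u64
#define SKETCH_HW_CRC32 1
#else
#define SKETCH_HW_CRC32 0
#endif

namespace sketch {

// Assumed seed; the real header may use a different one.
constexpr uint32_t kSeed = 0xFFFFFFFFu;

// Portable CRC32C update over the low n_bytes bytes of v, low byte first.
// Produces the same value as the SSE4.2 crc32 instruction of that width.
inline uint32_t crc32c_soft(uint32_t crc, uint64_t v, int n_bytes) {
    for (int i = 0; i < n_bytes; ++i) {
        crc ^= static_cast<uint8_t>(v >> (8 * i));
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc;
}

// One CRC32 step whose width follows sizeof(T), so a 1-byte key uses the
// 8-bit intrinsic instead of being widened to 64 bits first.
template <typename T>
inline uint32_t crc32_step(uint32_t crc, T key) {
    static_assert(std::is_integral<T>::value && sizeof(T) <= 8,
                  "sketch covers integral keys up to 64 bits");
    const uint64_t v = static_cast<uint64_t>(key);
#if SKETCH_HW_CRC32
    switch (sizeof(T)) {
    case 1: return _mm_crc32_u8(crc, static_cast<uint8_t>(v));
    case 2: return _mm_crc32_u16(crc, static_cast<uint16_t>(v));
    case 4: return _mm_crc32_u32(crc, static_cast<uint32_t>(v));
    default: return static_cast<uint32_t>(_mm_crc32_u64(crc, v));
    }
#else
    return crc32c_soft(crc, v, static_cast<int>(sizeof(T)));
#endif
}

// Hasher that always returns uint32_t, whatever the key width.
template <typename T>
struct Hash32 {
    uint32_t operator()(T key) const { return crc32_step(kSeed, key); }
};

// Hypothetical 128-bit key: chain two 64-bit steps.
struct Key128 {
    uint64_t lo;
    uint64_t hi;
};

inline uint32_t hash32(const Key128& k) {
    return crc32_step(crc32_step(kSeed, k.lo), k.hi);
}

}  // namespace sketch
```

With this shape, `sketch::Hash32<int8_t>` and `sketch::Hash32<uint64_t>` both yield a `uint32_t`, so a caller never has to truncate a wider hash before masking it into a bucket index.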

### Hash Table Infrastructure Updates

* Updated the hash type used in `JoinHashTable` and related hash table
context typedefs to use the new `HashCRC32Return32` instead of the
previous `HashCRC32`, ensuring all hash values are consistently 32 bits.
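
The typedef swap described in this section can be pictured with the sketch below. It is a simplified illustration and not Doris code: `JoinTableSketch`, `PrimaryTypeContextSketch`, and both hashers are hypothetical, and the hashers use a multiplicative mix as a stand-in for CRC32 to keep the example short.

```cpp
#include <cstdint>
#include <vector>

namespace sketch {

// Stand-in hashers (NOT CRC32): one returns 64 bits, one returns 32 bits.
struct Hash64 {
    uint64_t operator()(uint64_t k) const { return k * 0x9E3779B97F4A7C15ull; }
};
struct Hash32 {
    uint32_t operator()(uint64_t k) const {
        return static_cast<uint32_t>((k * 0x9E3779B97F4A7C15ull) >> 32);
    }
};

// Minimal table front end: the hasher is a template parameter, and bucket
// numbers are 32-bit values masked by (bucket_count - 1).
template <typename Key, typename Hasher>
struct JoinTableSketch {
    // bucket_count is assumed to be a power of two.
    explicit JoinTableSketch(uint32_t bucket_count) : mask(bucket_count - 1) {}

    std::vector<uint32_t> bucket_nums(const std::vector<Key>& keys) const {
        Hasher hasher;
        std::vector<uint32_t> out;
        out.reserve(keys.size());
        for (const Key& k : keys) {
            // With Hash64 this cast drops the upper 32 bits on every key;
            // with Hash32 the value already has the width the table needs.
            out.push_back(static_cast<uint32_t>(hasher(k)) & mask);
        }
        return out;
    }

    uint32_t mask;
};

// Swapping the hasher in one alias changes it for every user of the alias.
template <typename Key>
using PrimaryTypeContextSketch = JoinTableSketch<Key, Hash32>;

}  // namespace sketch
```

Because the bucket index is itself 32 bits wide in this sketch, a hasher that already returns `uint32_t` removes the per-key narrowing step while leaving the table code unchanged.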

### Build and Include Adjustments

* Included the new `hash_crc32_return32.h` header in `join_utils.h` to
make the new hash function available to the relevant code.
@morningman
Contributor

run buildall

@morningman force-pushed the auto-pick-62512-branch-4.1 branch from 168699d to 118c136 on April 24, 2026 20:22

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (60/60) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.30% (20031/37583) |
| Line Coverage | 36.72% (188698/513847) |
| Region Coverage | 33.10% (146820/443508) |
| Branch Coverage | 34.15% (64172/187889) |

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (60/60) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 70.95% (26111/36803) |
| Line Coverage | 53.05% (271646/512030) |
| Region Coverage | 46.62% (209078/448520) |
| Branch Coverage | 49.91% (94763/189873) |
