k12: new implementation with parallel processing support #839

Open: newpavlov wants to merge 1 commit into master from k12/reimpl

newpavlov commented Apr 19, 2026

The new implementation supports parallel processing of data and consumes input as soon as it arrives, which makes it possible to remove the 8 KiB buffer used previously. It also introduces separate customized types with owned and borrowed customization strings, while the base type cannot be customized.
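To illustrate where the parallelism comes from, here is a minimal sketch (not code from this PR) of how KangarooTwelve splits its input. It relies on the 8192-byte chunk size and the `length_encode` rule from the KangarooTwelve specification; the helper names `length_encode` and `leaf_count` are hypothetical and chosen only for this example.

```rust
/// Chunk size defined by the KangarooTwelve specification.
const CHUNK_SIZE: usize = 8192;

/// K12 length encoding: big-endian bytes of `x` without leading zeros,
/// followed by one byte holding the number of bytes emitted.
fn length_encode(x: usize) -> Vec<u8> {
    let bytes = x.to_be_bytes();
    let skip = bytes.iter().take_while(|&&b| b == 0).count();
    let mut out = bytes[skip..].to_vec();
    out.push(out.len() as u8);
    out
}

/// Number of leaf chunks, i.e. chunks that are hashed independently into
/// chaining values and can therefore be processed in parallel.
fn leaf_count(msg_len: usize, custom_len: usize) -> usize {
    // The hashed string is message || customization || length_encode(|customization|).
    let total = msg_len + custom_len + length_encode(custom_len).len();
    // The first chunk is absorbed directly by the final node; every later chunk is a leaf.
    total.div_ceil(CHUNK_SIZE).saturating_sub(1)
}
```

With an empty customization string, `leaf_count(10_000, 0)` is 1 and `leaf_count(100_000, 0)` is 12. A single leaf leaves nothing to process side by side, which may be why the SIMD backends in the benchmarks only pull ahead from the 100k case onward (assuming the benchmark names denote decimal byte counts).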

Potential additional improvements:

  • Optimize absorption of chaining values by handling u64s instead of going through the byte buffer.
  • Use custom buffering based on consumed_len instead of separate cursors.
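The first suggested improvement can be sketched as follows. This is only an illustration of the idea, not the PR's code: it assumes the Keccak state is held as 25 little-endian `u64` lanes, that the chaining value is already available as `u64` lanes (for example, four lanes for a 32-byte value), and that the write position is lane-aligned and stays within the rate. The function names `xor_bytes` and `xor_lanes` are hypothetical.

```rust
/// Byte-oriented absorption: XOR each byte into the little-endian lane it belongs to.
fn xor_bytes(state: &mut [u64; 25], byte_pos: usize, data: &[u8]) {
    for (i, &b) in data.iter().enumerate() {
        let pos = byte_pos + i;
        state[pos / 8] ^= u64::from(b) << (8 * (pos % 8));
    }
}

/// Lane-oriented absorption: XOR whole u64 lanes at a lane-aligned position,
/// skipping the round trip through a byte buffer.
fn xor_lanes(state: &mut [u64; 25], lane_pos: usize, lanes: &[u64]) {
    for (s, &l) in state[lane_pos..].iter_mut().zip(lanes) {
        *s ^= l;
    }
}
```

Both functions leave the state in the same condition when `byte_pos == 8 * lane_pos` and the bytes are the little-endian encoding of the lanes; the lane version simply does the work in fewer, wider operations.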

Closes #797

newpavlov requested a review from tarcieri on April 19, 2026, 18:21

newpavlov commented Apr 19, 2026

Benchmark results on my x86 laptop:

$ cargo bench

test kt128_1_10   ... bench:          51.75 ns/iter (+/- 4.34) = 196 MB/s
test kt128_2_100  ... bench:         146.51 ns/iter (+/- 10.80) = 684 MB/s
test kt128_3_1k   ... bench:       1,068.34 ns/iter (+/- 12.90) = 936 MB/s
test kt128_4_10k  ... bench:       9,993.64 ns/iter (+/- 66.05) = 1000 MB/s
test kt128_5_100k ... bench:      99,155.64 ns/iter (+/- 462.52) = 1008 MB/s
test kt128_6_1m   ... bench:     994,643.40 ns/iter (+/- 338,104.49) = 1005 MB/s

test kt256_1_10   ... bench:          55.38 ns/iter (+/- 7.04) = 181 MB/s
test kt256_2_100  ... bench:         171.64 ns/iter (+/- 6.93) = 584 MB/s
test kt256_3_1k   ... bench:       1,310.92 ns/iter (+/- 20.87) = 763 MB/s
test kt256_4_10k  ... bench:      12,423.70 ns/iter (+/- 2,534.85) = 804 MB/s
test kt256_5_100k ... bench:     123,474.30 ns/iter (+/- 880.00) = 809 MB/s
test kt256_6_1m   ... bench:   1,236,551.90 ns/iter (+/- 15,772.23) = 808 MB/s

$ RUSTFLAGS='--cfg keccak_backend="simd256" -C target-cpu=native' cargo bench

test kt128_1_10   ... bench:          53.88 ns/iter (+/- 9.00) = 188 MB/s
test kt128_2_100  ... bench:         137.11 ns/iter (+/- 20.29) = 729 MB/s
test kt128_3_1k   ... bench:         934.46 ns/iter (+/- 8.60) = 1070 MB/s
test kt128_4_10k  ... bench:       8,807.24 ns/iter (+/- 45.38) = 1135 MB/s
test kt128_5_100k ... bench:      61,022.62 ns/iter (+/- 535.21) = 1638 MB/s
test kt128_6_1m   ... bench:     515,211.50 ns/iter (+/- 4,925.42) = 1940 MB/s

test kt256_1_10   ... bench:          51.96 ns/iter (+/- 1.60) = 196 MB/s
test kt256_2_100  ... bench:         158.18 ns/iter (+/- 16.60) = 632 MB/s
test kt256_3_1k   ... bench:       1,155.59 ns/iter (+/- 16.73) = 865 MB/s
test kt256_4_10k  ... bench:      10,914.95 ns/iter (+/- 72.48) = 916 MB/s
test kt256_5_100k ... bench:      76,398.50 ns/iter (+/- 15,408.53) = 1308 MB/s
test kt256_6_1m   ... bench:     631,593.60 ns/iter (+/- 10,271.26) = 1583 MB/s

On M4:

$ RUSTFLAGS='--cfg keccak_backend="soft"' cargo bench

test kt128_1_10   ... bench:          12.64 ns/iter (+/- 0.10) = 833 MB/s
test kt128_2_100  ... bench:          53.46 ns/iter (+/- 0.73) = 1886 MB/s
test kt128_3_1k   ... bench:         458.52 ns/iter (+/- 2.21) = 2183 MB/s
test kt128_4_10k  ... bench:       4,476.62 ns/iter (+/- 36.82) = 2234 MB/s
test kt128_5_100k ... bench:      44,528.55 ns/iter (+/- 200.51) = 2245 MB/s
test kt128_6_1m   ... bench:     446,610.45 ns/iter (+/- 2,878.20) = 2239 MB/s

test kt256_1_10   ... bench:          13.79 ns/iter (+/- 0.08) = 769 MB/s
test kt256_2_100  ... bench:          64.38 ns/iter (+/- 0.31) = 1562 MB/s
test kt256_3_1k   ... bench:         569.71 ns/iter (+/- 6.26) = 1757 MB/s
test kt256_4_10k  ... bench:       5,594.98 ns/iter (+/- 197.50) = 1787 MB/s
test kt256_5_100k ... bench:      55,818.75 ns/iter (+/- 387.30) = 1791 MB/s
test kt256_6_1m   ... bench:     560,366.60 ns/iter (+/- 7,578.58) = 1784 MB/s

$ RUSTFLAGS='--cfg keccak_backend="simd128"' cargo bench

test kt128_1_10   ... bench:          13.82 ns/iter (+/- 0.08) = 769 MB/s
test kt128_2_100  ... bench:          58.81 ns/iter (+/- 0.26) = 1724 MB/s
test kt128_3_1k   ... bench:         506.57 ns/iter (+/- 3.22) = 1976 MB/s
test kt128_4_10k  ... bench:       4,962.32 ns/iter (+/- 26.99) = 2015 MB/s
test kt128_5_100k ... bench:      31,671.00 ns/iter (+/- 677.57) = 3157 MB/s
test kt128_6_1m   ... bench:     293,570.83 ns/iter (+/- 2,292.37) = 3406 MB/s

test kt256_1_10   ... bench:          15.38 ns/iter (+/- 0.14) = 666 MB/s
test kt256_2_100  ... bench:          71.71 ns/iter (+/- 0.75) = 1408 MB/s
test kt256_3_1k   ... bench:         631.36 ns/iter (+/- 6.75) = 1584 MB/s
test kt256_4_10k  ... bench:       6,211.28 ns/iter (+/- 133.55) = 1610 MB/s
test kt256_5_100k ... bench:      39,537.72 ns/iter (+/- 159.29) = 2529 MB/s
test kt256_6_1m   ... bench:     367,704.15 ns/iter (+/- 5,421.26) = 2719 MB/s

$ cargo bench

test kt128_1_10   ... bench:          11.28 ns/iter (+/- 0.10) = 909 MB/s
test kt128_2_100  ... bench:          50.47 ns/iter (+/- 0.74) = 2000 MB/s
test kt128_3_1k   ... bench:         441.50 ns/iter (+/- 5.53) = 2267 MB/s
test kt128_4_10k  ... bench:       4,344.07 ns/iter (+/- 26.11) = 2302 MB/s
test kt128_5_100k ... bench:      24,430.39 ns/iter (+/- 630.76) = 4093 MB/s
test kt128_6_1m   ... bench:     219,602.60 ns/iter (+/- 1,010.68) = 4553 MB/s

test kt256_1_10   ... bench:          12.60 ns/iter (+/- 0.13) = 833 MB/s
test kt256_2_100  ... bench:          62.66 ns/iter (+/- 2.25) = 1612 MB/s
test kt256_3_1k   ... bench:         563.14 ns/iter (+/- 2.60) = 1776 MB/s
test kt256_4_10k  ... bench:       5,588.23 ns/iter (+/- 72.13) = 1789 MB/s
test kt256_5_100k ... bench:      31,290.14 ns/iter (+/- 353.19) = 3195 MB/s
test kt256_6_1m   ... bench:     272,802.07 ns/iter (+/- 2,114.30) = 3665 MB/s
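For reading these tables, the following sketch shows how the MB/s column relates to ns/iter and how two runs can be compared. It assumes the benchmark names denote decimal input sizes (`1m` meaning 1,000,000 bytes), which is consistent with the reported figures; the function names are hypothetical.

```rust
/// Throughput in MB/s (decimal megabytes) for `bytes` processed in `ns_per_iter` nanoseconds.
/// Bytes per nanosecond equals GB/s, so multiply by 1000 to get MB/s.
fn throughput_mb_s(bytes: u64, ns_per_iter: f64) -> f64 {
    bytes as f64 * 1000.0 / ns_per_iter
}

/// Speedup of a candidate run over a baseline run for the same input size.
fn speedup(baseline_ns: f64, candidate_ns: f64) -> f64 {
    baseline_ns / candidate_ns
}
```

For example, `throughput_mb_s(1_000_000, 515_211.50)` gives about 1940.9, in line with the 1940 MB/s shown for `kt128_6_1m` with `simd256`. `speedup(994_643.40, 515_211.50)` is roughly 1.93 for that benchmark on x86, and `speedup(446_610.45, 219_602.60)` is roughly 2.03 when comparing the `soft` backend with the default build on M4.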

Successfully merging this pull request may close these issues.

k12: use parallel processing capabilities implemented in keccak