Call take arrays once per repartitioned input batch #22159

Open
gene-bordegaray wants to merge 2 commits into apache:main from gene-bordegaray:gene.bordegaray/2026/05/repartition-grouped-hash-take

@gene-bordegaray
Contributor

@gene-bordegaray gene-bordegaray commented May 13, 2026

Which issue does this PR close?

Rationale for this change

Hash repartition currently builds one output batch per non-empty target partition by calling take_arrays separately for each partition. At high fanout, a single input batch can therefore issue many take kernels, and that overhead shows up in repartition-heavy queries.

This changes hash repartition to concatenate the per-partition row indices, call take_arrays once for the input batch, and then slice the reordered batch back into per-partition output batches.

This is complementary to #22010: that PR reduces channel/gate traffic from many small batches, while this PR reduces the Arrow take-kernel work required to create the repartitioned batches.

What changes are included in this PR?

  • Replaces per-partition hash repartition take_arrays calls with one grouped take_arrays call per input batch.
  • Tracks partition ranges into the grouped reordered batch and returns zero-copy RecordBatch::slice outputs for each non-empty partition.

How the grouped take works:

input rows:        0   1   2   3   4   5   6

partition 0:      [2, 5]
partition 1:      []
partition 2:      [0, 3, 4]
partition 3:      [1, 6]

grouped indices:  [2, 5, 0, 3, 4, 1, 6]
partition ranges: [(0, start=0, len=2),
                   (2, start=2, len=3),
                   (3, start=5, len=2)]

take once:        rows [2, 5, 0, 3, 4, 1, 6]
slice outputs:    partition 0 = slice(0, 2)
                  partition 2 = slice(2, 3)
                  partition 3 = slice(5, 2)
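
The diagram above can be sketched in plain Rust. This is a minimal std-only model, not the code in this PR: `Column`, `group_indices`, `take`, and `slice_partitions` are hypothetical names, a `Vec<i64>` stands in for an Arrow array, and borrowed sub-slices stand in for zero-copy `RecordBatch::slice` outputs.

```rust
use std::ops::Range;

/// Hypothetical stand-in for one Arrow column; a real batch would hold arrays.
type Column = Vec<i64>;

/// Flattens per-partition row indices into one index list and records, for
/// each non-empty partition, which range of the flattened list belongs to it.
fn group_indices(per_partition: &[Vec<usize>]) -> (Vec<usize>, Vec<(usize, Range<usize>)>) {
    let total: usize = per_partition.iter().map(Vec::len).sum();
    let mut grouped = Vec::with_capacity(total);
    let mut ranges = Vec::new();
    for (partition, indices) in per_partition.iter().enumerate() {
        if indices.is_empty() {
            continue; // empty partitions produce no output batch
        }
        let start = grouped.len();
        grouped.extend_from_slice(indices);
        ranges.push((partition, start..grouped.len()));
    }
    (grouped, ranges)
}

/// Model of a take kernel: gathers `column[i]` for every `i` in `indices`.
fn take(column: &Column, indices: &[usize]) -> Column {
    indices.iter().map(|&i| column[i]).collect()
}

/// Returns one borrowed sub-slice of the reordered column per non-empty
/// partition; no row data is copied at this step.
fn slice_partitions<'a>(
    reordered: &'a Column,
    ranges: &[(usize, Range<usize>)],
) -> Vec<(usize, &'a [i64])> {
    ranges
        .iter()
        .map(|(partition, range)| (*partition, &reordered[range.clone()]))
        .collect()
}
```

In this model `take` runs once per column per input batch regardless of the number of partitions, whereas the per-partition approach would call it once per non-empty partition.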

Are these changes tested?

  • cargo test -p datafusion-physical-plan repartition --lib

Benchmarks:

Default TPCH SF10 summary, with no --batch-size override:

| Partitions | main total ms | grouped total ms | change | wins | losses | biggest win | biggest loss |
|---|---|---|---|---|---|---|---|
| 8 | 6234.28 | 6149.98 | -1.35% | 10 | 12 | Q3 -22.12% | Q21 5.47% |
| 16 | 5602.63 | 5427.40 | -3.13% | 18 | 4 | Q21 -10.67% | Q10 4.47% |
| 32 | 6097.10 | 5738.12 | -5.89% | 20 | 2 | Q8 -10.45% | Q6 0.47% |
| 64 | 7194.70 | 6693.30 | -6.97% | 15 | 7 | Q21 -15.92% | Q1 5.02% |
| 300 | 26276.60 | 23701.32 | -9.80% | 19 | 3 | Q21 -24.24% | Q1 9.45% |

TPCH SF10 default batch size, 8 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 411.40 | 356.46 | -13.35% | 1.154x |
| Q2 | 117.97 | 95.68 | -18.89% | 1.233x |
| Q3 | 278.88 | 217.20 | -22.12% | 1.284x |
| Q4 | 104.92 | 98.47 | -6.15% | 1.065x |
| Q5 | 311.36 | 302.12 | -2.97% | 1.031x |
| Q6 | 133.82 | 135.13 | 0.98% | 0.990x |
| Q7 | 366.32 | 360.50 | -1.59% | 1.016x |
| Q8 | 408.86 | 391.14 | -4.33% | 1.045x |
| Q9 | 509.62 | 515.74 | 1.20% | 0.988x |
| Q10 | 278.97 | 273.80 | -1.85% | 1.019x |
| Q11 | 76.55 | 75.34 | -1.58% | 1.016x |
| Q12 | 175.56 | 180.87 | 3.03% | 0.971x |
| Q13 | 211.03 | 221.03 | 4.74% | 0.955x |
| Q14 | 183.05 | 189.95 | 3.77% | 0.964x |
| Q15 | 318.46 | 328.85 | 3.26% | 0.968x |
| Q16 | 57.66 | 57.70 | 0.08% | 0.999x |
| Q17 | 505.05 | 503.77 | -0.25% | 1.003x |
| Q18 | 596.49 | 602.77 | 1.05% | 0.990x |
| Q19 | 273.60 | 283.41 | 3.58% | 0.965x |
| Q20 | 255.73 | 267.30 | 4.52% | 0.957x |
| Q21 | 592.44 | 624.83 | 5.47% | 0.948x |
| Q22 | 66.54 | 67.91 | 2.05% | 0.980x |

TPCH SF10 default batch size, 16 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 278.95 | 276.85 | -0.76% | 1.008x |
| Q2 | 95.84 | 94.86 | -1.02% | 1.010x |
| Q3 | 205.24 | 195.05 | -4.97% | 1.052x |
| Q4 | 93.60 | 90.58 | -3.23% | 1.033x |
| Q5 | 292.34 | 289.93 | -0.82% | 1.008x |
| Q6 | 103.59 | 106.02 | 2.35% | 0.977x |
| Q7 | 370.86 | 365.40 | -1.47% | 1.015x |
| Q8 | 392.01 | 364.63 | -6.98% | 1.075x |
| Q9 | 512.66 | 483.45 | -5.70% | 1.060x |
| Q10 | 246.01 | 257.01 | 4.47% | 0.957x |
| Q11 | 73.33 | 70.21 | -4.26% | 1.045x |
| Q12 | 143.58 | 143.53 | -0.04% | 1.000x |
| Q13 | 193.30 | 193.57 | 0.14% | 0.999x |
| Q14 | 157.01 | 146.14 | -6.92% | 1.074x |
| Q15 | 255.58 | 258.73 | 1.23% | 0.988x |
| Q16 | 58.90 | 57.14 | -2.97% | 1.031x |
| Q17 | 515.22 | 491.16 | -4.67% | 1.049x |
| Q18 | 518.50 | 516.90 | -0.31% | 1.003x |
| Q19 | 212.28 | 211.63 | -0.30% | 1.003x |
| Q20 | 229.34 | 224.54 | -2.09% | 1.021x |
| Q21 | 594.29 | 530.87 | -10.67% | 1.119x |
| Q22 | 60.20 | 59.18 | -1.70% | 1.017x |

TPCH SF10 default batch size, 32 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 294.06 | 286.80 | -2.47% | 1.025x |
| Q2 | 110.59 | 103.55 | -6.37% | 1.068x |
| Q3 | 227.86 | 215.08 | -5.61% | 1.059x |
| Q4 | 110.99 | 102.77 | -7.41% | 1.080x |
| Q5 | 347.34 | 319.49 | -8.02% | 1.087x |
| Q6 | 109.39 | 109.91 | 0.47% | 0.995x |
| Q7 | 423.98 | 385.42 | -9.09% | 1.100x |
| Q8 | 428.83 | 384.00 | -10.45% | 1.117x |
| Q9 | 559.81 | 510.72 | -8.77% | 1.096x |
| Q10 | 269.17 | 265.02 | -1.54% | 1.016x |
| Q11 | 85.99 | 80.21 | -6.72% | 1.072x |
| Q12 | 153.67 | 150.36 | -2.15% | 1.022x |
| Q13 | 197.90 | 188.72 | -4.64% | 1.049x |
| Q14 | 161.28 | 153.93 | -4.55% | 1.048x |
| Q15 | 259.92 | 261.01 | 0.42% | 0.996x |
| Q16 | 64.90 | 63.70 | -1.85% | 1.019x |
| Q17 | 574.18 | 531.76 | -7.39% | 1.080x |
| Q18 | 572.23 | 538.68 | -5.86% | 1.062x |
| Q19 | 227.95 | 220.19 | -3.41% | 1.035x |
| Q20 | 232.25 | 225.80 | -2.78% | 1.029x |
| Q21 | 622.54 | 579.94 | -6.84% | 1.073x |
| Q22 | 62.29 | 61.06 | -1.98% | 1.020x |

TPCH SF10 default batch size, 64 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 285.10 | 299.41 | 5.02% | 0.952x |
| Q2 | 161.42 | 153.51 | -4.90% | 1.052x |
| Q3 | 297.85 | 272.09 | -8.65% | 1.095x |
| Q4 | 147.28 | 140.69 | -4.47% | 1.047x |
| Q5 | 428.29 | 381.28 | -10.98% | 1.123x |
| Q6 | 106.27 | 108.41 | 2.02% | 0.980x |
| Q7 | 494.50 | 443.89 | -10.23% | 1.114x |
| Q8 | 507.01 | 446.69 | -11.90% | 1.135x |
| Q9 | 667.11 | 624.78 | -6.34% | 1.068x |
| Q10 | 294.91 | 299.05 | 1.40% | 0.986x |
| Q11 | 112.17 | 104.79 | -6.57% | 1.070x |
| Q12 | 168.23 | 166.87 | -0.81% | 1.008x |
| Q13 | 198.74 | 196.30 | -1.23% | 1.012x |
| Q14 | 175.31 | 177.74 | 1.39% | 0.986x |
| Q15 | 265.68 | 265.71 | 0.01% | 1.000x |
| Q16 | 85.69 | 82.93 | -3.22% | 1.033x |
| Q17 | 691.32 | 629.09 | -9.00% | 1.099x |
| Q18 | 697.85 | 617.88 | -11.46% | 1.129x |
| Q19 | 237.00 | 243.78 | 2.86% | 0.972x |
| Q20 | 272.55 | 278.57 | 2.21% | 0.978x |
| Q21 | 813.29 | 683.82 | -15.92% | 1.189x |
| Q22 | 87.14 | 76.04 | -12.73% | 1.146x |

TPCH SF10 default batch size, 300 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 277.84 | 304.08 | 9.45% | 0.914x |
| Q2 | 1303.10 | 1268.65 | -2.64% | 1.027x |
| Q3 | 1496.14 | 1393.15 | -6.88% | 1.074x |
| Q4 | 681.20 | 652.85 | -4.16% | 1.043x |
| Q5 | 1680.43 | 1469.91 | -12.53% | 1.143x |
| Q6 | 100.65 | 105.75 | 5.07% | 0.952x |
| Q7 | 1880.26 | 1652.83 | -12.10% | 1.138x |
| Q8 | 1956.81 | 1760.72 | -10.02% | 1.111x |
| Q9 | 1787.75 | 1454.84 | -18.62% | 1.229x |
| Q10 | 1334.62 | 1296.02 | -2.89% | 1.030x |
| Q11 | 1018.99 | 994.13 | -2.44% | 1.025x |
| Q12 | 768.97 | 780.88 | 1.55% | 0.985x |
| Q13 | 671.88 | 638.51 | -4.97% | 1.052x |
| Q14 | 603.10 | 586.19 | -2.80% | 1.029x |
| Q15 | 302.19 | 295.28 | -2.29% | 1.023x |
| Q16 | 597.64 | 585.39 | -2.05% | 1.021x |
| Q17 | 1963.57 | 1712.46 | -12.79% | 1.147x |
| Q18 | 1942.46 | 1634.96 | -15.83% | 1.188x |
| Q19 | 818.85 | 808.19 | -1.30% | 1.013x |
| Q20 | 1499.39 | 1468.55 | -2.06% | 1.021x |
| Q21 | 3006.35 | 2277.49 | -24.24% | 1.320x |
| Q22 | 584.39 | 560.47 | -4.09% | 1.043x |
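
For readers checking the tables, the change and speedup columns appear to be derived from the two timings as shown in this small sketch. The function names `change_pct` and `speedup` are hypothetical, and the formulas are inferred from the reported numbers rather than taken from the benchmark tooling.

```rust
/// Relative change in percent; negative means the grouped branch is faster.
fn change_pct(main_ms: f64, grouped_ms: f64) -> f64 {
    (grouped_ms - main_ms) / main_ms * 100.0
}

/// Speedup factor; values above 1.0 mean the grouped branch is faster.
fn speedup(main_ms: f64, grouped_ms: f64) -> f64 {
    main_ms / grouped_ms
}
```

For example, Q3 at 8 partitions (278.88 ms on main, 217.20 ms grouped) gives roughly -22.12% and 1.284x, matching the row above.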

Stress cases:

  • These runs use --batch-size 1024 to stress the repartition path. They are included to show the mechanism under smaller input batches and higher output fanout, not as the primary end-to-end performance claim.

TPCH SF10, 8 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 640.78 | 451.82 | -29.49% | 1.420x |
| Q2 | 315.81 | 150.07 | -52.48% | 2.100x |
| Q3 | 899.21 | 375.88 | -58.20% | 2.390x |
| Q4 | 469.31 | 217.07 | -53.75% | 2.160x |
| Q5 | 1131.37 | 446.36 | -60.55% | 2.530x |
| Q6 | 376.40 | 163.66 | -56.52% | 2.300x |
| Q7 | 1388.40 | 484.36 | -65.11% | 2.870x |
| Q8 | 1369.67 | 571.62 | -58.27% | 2.400x |
| Q9 | 1834.81 | 739.88 | -59.68% | 2.480x |
| Q10 | 813.73 | 361.94 | -55.52% | 2.250x |
| Q11 | 267.06 | 114.84 | -57.00% | 2.330x |
| Q12 | 526.41 | 250.39 | -52.43% | 2.100x |
| Q13 | 760.54 | 324.78 | -57.30% | 2.340x |
| Q14 | 446.91 | 221.04 | -50.54% | 2.020x |
| Q15 | 764.64 | 375.67 | -50.87% | 2.040x |
| Q16 | 167.74 | 80.36 | -52.09% | 2.090x |
| Q17 | 1801.72 | 763.58 | -57.62% | 2.360x |
| Q18 | 3303.89 | 1649.87 | -50.06% | 2.000x |
| Q19 | 694.16 | 354.97 | -48.86% | 1.960x |
| Q20 | 693.91 | 323.17 | -53.43% | 2.150x |
| Q21 | 3112.83 | 1065.36 | -65.78% | 2.920x |
| Q22 | 205.88 | 95.97 | -53.38% | 2.150x |

TPCH SF10, 16 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 518.84 | 328.66 | -36.66% | 1.580x |
| Q2 | 350.47 | 148.64 | -57.59% | 2.360x |
| Q3 | 1003.55 | 371.04 | -63.03% | 2.700x |
| Q4 | 589.70 | 258.05 | -56.24% | 2.290x |
| Q5 | 1343.64 | 506.01 | -62.34% | 2.660x |
| Q6 | 322.21 | 130.93 | -59.37% | 2.460x |
| Q7 | 1527.85 | 550.88 | -63.94% | 2.770x |
| Q8 | 1476.46 | 578.77 | -60.80% | 2.550x |
| Q9 | 2091.16 | 785.54 | -62.44% | 2.660x |
| Q10 | 817.98 | 331.02 | -59.53% | 2.470x |
| Q11 | 341.46 | 123.31 | -63.89% | 2.770x |
| Q12 | 493.51 | 221.61 | -55.10% | 2.230x |
| Q13 | 690.52 | 290.54 | -57.92% | 2.380x |
| Q14 | 410.54 | 171.45 | -58.24% | 2.390x |
| Q15 | 733.96 | 290.56 | -60.41% | 2.530x |
| Q16 | 197.35 | 86.09 | -56.37% | 2.290x |
| Q17 | 2089.12 | 828.96 | -60.32% | 2.520x |
| Q18 | 2712.00 | 1097.77 | -59.52% | 2.470x |
| Q19 | 602.77 | 260.74 | -56.74% | 2.310x |
| Q20 | 661.20 | 288.58 | -56.35% | 2.290x |
| Q21 | 5490.50 | 1151.50 | -79.03% | 4.770x |
| Q22 | 198.38 | 103.13 | -48.01% | 1.920x |

TPCH SF10, 32 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 533.86 | 338.54 | -36.59% | 1.580x |
| Q2 | 439.59 | 199.50 | -54.62% | 2.200x |
| Q3 | 1242.19 | 510.11 | -58.93% | 2.440x |
| Q4 | 743.92 | 363.33 | -51.16% | 2.050x |
| Q5 | 1711.97 | 666.50 | -61.07% | 2.570x |
| Q6 | 325.39 | 134.07 | -58.80% | 2.430x |
| Q7 | 1947.59 | 722.22 | -62.92% | 2.700x |
| Q8 | 1914.31 | 775.62 | -59.48% | 2.470x |
| Q9 | 2662.07 | 976.47 | -63.32% | 2.730x |
| Q10 | 902.80 | 362.71 | -59.82% | 2.490x |
| Q11 | 400.93 | 170.81 | -57.40% | 2.350x |
| Q12 | 572.19 | 265.06 | -53.68% | 2.160x |
| Q13 | 736.31 | 296.82 | -59.69% | 2.480x |
| Q14 | 430.11 | 180.93 | -57.93% | 2.380x |
| Q15 | 732.36 | 327.12 | -55.33% | 2.240x |
| Q16 | 245.97 | 116.24 | -52.74% | 2.120x |
| Q17 | 2711.18 | 1100.17 | -59.42% | 2.460x |
| Q18 | 2946.70 | 1176.02 | -60.09% | 2.510x |
| Q19 | 600.47 | 258.58 | -56.94% | 2.320x |
| Q20 | 765.20 | 337.01 | -55.96% | 2.270x |
| Q21 | 10062.70 | 1534.95 | -84.75% | 6.560x |
| Q22 | 250.50 | 128.27 | -48.79% | 1.950x |

TPCH SF10, 64 partitions, all queries

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q1 | 595.70 | 324.74 | -45.49% | 1.830x |
| Q2 | 663.08 | 305.30 | -53.96% | 2.170x |
| Q3 | 1744.90 | 727.81 | -58.29% | 2.400x |
| Q4 | 1070.72 | 566.20 | -47.12% | 1.890x |
| Q5 | 2447.07 | 938.91 | -61.63% | 2.610x |
| Q6 | 315.47 | 132.73 | -57.93% | 2.380x |
| Q7 | 2807.33 | 1004.83 | -64.21% | 2.790x |
| Q8 | 2674.51 | 1069.64 | -60.01% | 2.500x |
| Q9 | 3777.94 | 1424.08 | -62.31% | 2.650x |
| Q10 | 1086.91 | 469.38 | -56.82% | 2.320x |
| Q11 | 575.59 | 264.02 | -54.13% | 2.180x |
| Q12 | 841.83 | 387.25 | -54.00% | 2.170x |
| Q13 | 867.57 | 379.90 | -56.21% | 2.280x |
| Q14 | 470.87 | 214.58 | -54.43% | 2.190x |
| Q15 | 762.07 | 340.55 | -55.31% | 2.240x |
| Q16 | 337.20 | 179.25 | -46.84% | 1.880x |
| Q17 | 3953.82 | 1701.46 | -56.97% | 2.320x |
| Q18 | 3763.51 | 1606.90 | -57.30% | 2.340x |
| Q19 | 644.43 | 314.27 | -51.23% | 2.050x |
| Q20 | 973.56 | 453.24 | -53.45% | 2.150x |
| Q21 | 19356.91 | 2396.96 | -87.62% | 8.080x |
| Q22 | 366.20 | 195.40 | -46.64% | 1.870x |

TPCH SF10, 300 partitions, targeted high-fanout queries

Yes, this is a real use case: fanout this high occurs in distributed-datafusion.

| Query | main ms | grouped ms | change | speedup |
|---|---|---|---|---|
| Q3 | 2543.94 | 2250.91 | -11.52% | 1.130x |
| Q9 | 6495.22 | 4755.78 | -26.78% | 1.370x |
| Q10 | 1869.05 | 1709.18 | -8.55% | 1.090x |
| Q13 | 1238.63 | 1157.47 | -6.55% | 1.070x |
| Q15 | 461.51 | 446.25 | -3.31% | 1.030x |
| Q21 | 37810.29 | 5594.01 | -85.21% | 6.760x |
| Q22 | 1084.95 | 1058.74 | -2.42% | 1.020x |

TPCH SF10, 300 partitions, peak RSS stress

Measured with /usr/bin/time -l, one iteration, --batch-size 1024, --partitions 300, and no DataFusion memory limit. RSS is process peak resident set size from the OS.

| Query | main ms | grouped ms | time change | main peak RSS | grouped peak RSS | RSS change |
|---|---|---|---|---|---|---|
| Q7 | 5171.45 | 4151.15 | -19.73% | 3.69 GiB | 3.75 GiB | 1.61% |
| Q9 | 6055.57 | 4758.10 | -21.43% | 4.01 GiB | 4.01 GiB | 0.04% |
| Q21 | 36300.80 | 5810.14 | -83.99% | 2.96 GiB | 2.05 GiB | -30.79% |

Memory concern and follow-up work

This PR changes output batches from independently materialized per-partition batches to slices of one reordered batch, so sibling slices can share the same underlying buffers.

Potential concern:

one reordered batch allocation
  -> slice for partition 0
  -> slice for partition 1
  -> slice for partition 2

A slow output partition can keep the shared reordered batch buffers alive until its slice is dropped. Also, RecordBatch::get_array_memory_size() may count shared slice buffers repeatedly when repartition reserves memory per output batch.

The peak RSS stress above did not show a process-memory regression in the measured queries. Follow-up work should add buffer-aware accounting.
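
One possible shape for the buffer-aware accounting mentioned above is sketched here with std-only Rust. It is a model under stated assumptions, not DataFusion or Arrow code: `SliceModel` and `buffer_aware_size` are hypothetical, an `Arc<Vec<i64>>` stands in for a reference-counted Arrow buffer, and `naive_size` assumes a per-batch size that charges the full backing buffer to every slice.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical model of a sliced batch: a shared buffer plus a window into it.
#[derive(Clone)]
struct SliceModel {
    buffer: Arc<Vec<i64>>,
    offset: usize,
    len: usize,
}

impl SliceModel {
    /// The rows this slice exposes.
    fn values(&self) -> &[i64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Naive size: charges the whole backing buffer to this slice.
    fn naive_size(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<i64>()
    }
}

/// Sums buffer sizes, counting each distinct allocation only once by
/// deduplicating on the buffer's pointer identity.
fn buffer_aware_size(slices: &[SliceModel]) -> usize {
    let mut seen = HashSet::new();
    slices
        .iter()
        .filter(|s| seen.insert(Arc::as_ptr(&s.buffer)))
        .map(SliceModel::naive_size)
        .sum()
}
```

With three slices over one seven-row buffer, the naive per-slice sum is three times the buffer size while the deduplicated total is the buffer size once. The same model also shows the retention concern: the buffer stays allocated until the last slice holding the `Arc` is dropped.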

Are there any user-facing changes?

No.

@gene-bordegaray changed the title from "Call take arrays once per repartitioned input batch" to "[WIP] Call take arrays once per repartitioned input batch" on May 13, 2026
@rluvaton
Member

run benchmarks

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4445097796-48-6zsv2 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: tpch
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4445097796-46-gq4hp 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete


@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4445097796-47-57b4m 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: tpcds
Results will be posted here when complete


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query     ┃                           HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃    Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1  │ 38.90 / 40.27 ±1.15 / 41.71 ms │                        38.71 / 39.31 ±1.01 / 41.33 ms │ no change │
│ QQuery 2  │ 20.40 / 20.58 ±0.25 / 21.07 ms │                        20.48 / 20.76 ±0.23 / 21.17 ms │ no change │
│ QQuery 3  │ 35.94 / 37.72 ±1.23 / 39.11 ms │                        34.64 / 36.52 ±1.37 / 37.83 ms │ no change │
│ QQuery 4  │ 17.74 / 17.96 ±0.17 / 18.19 ms │                        17.52 / 17.74 ±0.17 / 17.95 ms │ no change │
│ QQuery 5  │ 43.38 / 44.60 ±1.30 / 46.57 ms │                        42.07 / 43.53 ±0.89 / 44.57 ms │ no change │
│ QQuery 6  │ 16.73 / 16.88 ±0.10 / 17.01 ms │                        16.53 / 16.75 ±0.18 / 17.05 ms │ no change │
│ QQuery 7  │ 51.50 / 52.60 ±1.54 / 55.46 ms │                        49.55 / 50.32 ±1.16 / 52.63 ms │ no change │
│ QQuery 8  │ 46.08 / 46.65 ±0.87 / 48.37 ms │                        45.36 / 45.47 ±0.11 / 45.67 ms │ no change │
│ QQuery 9  │ 50.21 / 52.23 ±1.17 / 53.52 ms │                        49.91 / 51.30 ±1.35 / 53.80 ms │ no change │
│ QQuery 10 │ 64.58 / 65.13 ±0.61 / 66.23 ms │                        64.17 / 64.48 ±0.37 / 65.19 ms │ no change │
│ QQuery 11 │ 13.60 / 14.41 ±1.19 / 16.78 ms │                        13.59 / 14.23 ±0.64 / 15.07 ms │ no change │
│ QQuery 12 │ 25.56 / 25.76 ±0.14 / 25.91 ms │                        24.91 / 25.20 ±0.28 / 25.63 ms │ no change │
│ QQuery 13 │ 35.71 / 36.34 ±0.71 / 37.41 ms │                        35.42 / 35.69 ±0.26 / 36.18 ms │ no change │
│ QQuery 14 │ 26.17 / 26.35 ±0.20 / 26.74 ms │                        25.55 / 25.75 ±0.12 / 25.92 ms │ no change │
│ QQuery 15 │ 31.89 / 32.14 ±0.19 / 32.46 ms │                        31.77 / 31.83 ±0.06 / 31.92 ms │ no change │
│ QQuery 16 │ 14.94 / 15.10 ±0.14 / 15.37 ms │                        14.64 / 15.00 ±0.20 / 15.19 ms │ no change │
│ QQuery 17 │ 75.49 / 77.04 ±1.61 / 79.89 ms │                        75.11 / 76.20 ±1.31 / 78.73 ms │ no change │
│ QQuery 18 │ 69.24 / 70.01 ±0.55 / 70.85 ms │                        66.46 / 67.50 ±1.01 / 68.86 ms │ no change │
│ QQuery 19 │ 35.50 / 35.80 ±0.32 / 36.40 ms │                        35.32 / 35.93 ±0.71 / 37.08 ms │ no change │
│ QQuery 20 │ 38.29 / 38.51 ±0.32 / 39.14 ms │                        37.73 / 37.92 ±0.17 / 38.15 ms │ no change │
│ QQuery 21 │ 59.88 / 62.82 ±1.95 / 65.81 ms │                        58.08 / 59.90 ±2.73 / 65.32 ms │ no change │
│ QQuery 22 │ 23.41 / 23.78 ±0.22 / 24.12 ms │                        23.43 / 23.70 ±0.24 / 24.12 ms │ no change │
└───────────┴────────────────────────────────┴───────────────────────────────────────────────────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃          ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 852.68ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 835.02ms │
│ Average Time (HEAD)                                                  │  38.76ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │  37.96ms │
│ Queries Faster                                                       │        0 │
│ Queries Slower                                                       │        0 │
│ Queries with No Change                                               │       22 │
│ Queries with Failure                                                 │        0 │
└──────────────────────────────────────────────────────────────────────┴──────────┘

Resource Usage

tpch — base (merge-base)

Metric Value
Wall time 5.0s
Peak memory 5.5 GiB
Avg memory 5.0 GiB
CPU user 31.3s
CPU sys 2.4s
Peak spill 0 B

tpch — branch

Metric Value
Wall time 5.0s
Peak memory 5.5 GiB
Avg memory 5.0 GiB
CPU user 30.6s
CPU sys 2.5s
Peak spill 0 B

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │           6.43 / 7.03 ±0.85 / 8.71 ms │                           6.18 / 6.70 ±0.87 / 8.44 ms │     no change │
│ QQuery 2  │        82.11 / 82.32 ±0.13 / 82.45 ms │                        80.93 / 81.38 ±0.29 / 81.78 ms │     no change │
│ QQuery 3  │        29.14 / 29.63 ±0.32 / 30.08 ms │                        28.58 / 29.04 ±0.46 / 29.65 ms │     no change │
│ QQuery 4  │     519.85 / 523.27 ±2.93 / 527.25 ms │                     510.72 / 520.55 ±5.32 / 525.73 ms │     no change │
│ QQuery 5  │        52.88 / 53.37 ±0.37 / 53.95 ms │                        52.55 / 53.25 ±0.60 / 54.32 ms │     no change │
│ QQuery 6  │        35.84 / 36.40 ±0.36 / 36.83 ms │                        35.65 / 35.96 ±0.26 / 36.30 ms │     no change │
│ QQuery 7  │     110.28 / 113.04 ±3.60 / 120.12 ms │                     107.64 / 110.60 ±3.09 / 115.50 ms │     no change │
│ QQuery 8  │        39.23 / 39.84 ±0.37 / 40.35 ms │                        38.77 / 39.43 ±0.46 / 40.20 ms │     no change │
│ QQuery 9  │        56.18 / 57.26 ±0.78 / 58.53 ms │                        53.42 / 54.31 ±0.86 / 55.86 ms │ +1.05x faster │
│ QQuery 10 │        80.76 / 81.78 ±1.16 / 83.88 ms │                        81.79 / 82.86 ±1.49 / 85.80 ms │     no change │
│ QQuery 11 │     314.31 / 321.51 ±4.31 / 325.69 ms │                     314.31 / 320.84 ±7.18 / 331.85 ms │     no change │
│ QQuery 12 │        29.24 / 29.52 ±0.21 / 29.77 ms │                        28.77 / 29.01 ±0.15 / 29.19 ms │     no change │
│ QQuery 13 │     128.87 / 130.17 ±1.07 / 131.59 ms │                     129.10 / 129.47 ±0.21 / 129.74 ms │     no change │
│ QQuery 14 │     515.93 / 517.75 ±1.57 / 520.45 ms │                     507.46 / 509.74 ±1.90 / 512.59 ms │     no change │
│ QQuery 15 │        61.05 / 62.15 ±0.86 / 63.47 ms │                        60.51 / 61.89 ±0.75 / 62.54 ms │     no change │
│ QQuery 16 │           6.94 / 7.11 ±0.14 / 7.34 ms │                           6.92 / 7.13 ±0.22 / 7.55 ms │     no change │
│ QQuery 17 │        84.24 / 85.87 ±1.20 / 87.28 ms │                        81.08 / 83.32 ±1.59 / 85.25 ms │     no change │
│ QQuery 18 │     157.16 / 158.59 ±1.77 / 162.08 ms │                     152.14 / 152.77 ±0.43 / 153.49 ms │     no change │
│ QQuery 19 │        41.55 / 41.79 ±0.26 / 42.21 ms │                        40.90 / 41.57 ±0.40 / 42.03 ms │     no change │
│ QQuery 20 │        35.16 / 35.77 ±0.33 / 36.13 ms │                        35.50 / 35.91 ±0.30 / 36.33 ms │     no change │
│ QQuery 21 │        17.86 / 18.06 ±0.19 / 18.41 ms │                        17.82 / 18.13 ±0.20 / 18.43 ms │     no change │
│ QQuery 22 │        62.31 / 62.91 ±0.56 / 63.79 ms │                        61.62 / 62.38 ±0.75 / 63.49 ms │     no change │
│ QQuery 23 │     476.45 / 482.37 ±3.74 / 485.70 ms │                     479.43 / 482.83 ±3.77 / 489.87 ms │     no change │
│ QQuery 24 │     244.10 / 249.06 ±3.43 / 254.78 ms │                     237.55 / 238.65 ±0.87 / 239.71 ms │     no change │
│ QQuery 25 │     118.94 / 120.50 ±1.23 / 122.45 ms │                     113.81 / 115.68 ±1.51 / 117.94 ms │     no change │
│ QQuery 26 │        71.27 / 72.33 ±0.97 / 74.12 ms │                        70.66 / 71.18 ±0.28 / 71.47 ms │     no change │
│ QQuery 27 │           7.07 / 7.31 ±0.23 / 7.71 ms │                           7.04 / 7.26 ±0.17 / 7.54 ms │     no change │
│ QQuery 28 │        62.72 / 63.11 ±0.30 / 63.55 ms │                        58.27 / 61.16 ±2.08 / 63.20 ms │     no change │
│ QQuery 29 │     101.67 / 103.36 ±1.72 / 106.63 ms │                      99.31 / 100.33 ±0.99 / 101.59 ms │     no change │
│ QQuery 30 │        30.18 / 30.74 ±0.41 / 31.45 ms │                        30.68 / 30.94 ±0.19 / 31.23 ms │     no change │
│ QQuery 31 │     112.12 / 113.31 ±0.75 / 114.20 ms │                     112.65 / 113.01 ±0.27 / 113.45 ms │     no change │
│ QQuery 32 │        20.47 / 20.63 ±0.16 / 20.91 ms │                        20.02 / 20.47 ±0.30 / 20.93 ms │     no change │
│ QQuery 33 │        39.14 / 39.49 ±0.33 / 40.11 ms │                        39.27 / 39.50 ±0.13 / 39.68 ms │     no change │
│ QQuery 34 │         9.78 / 10.04 ±0.22 / 10.36 ms │                         9.84 / 10.18 ±0.20 / 10.40 ms │     no change │
│ QQuery 35 │        80.95 / 82.42 ±1.97 / 85.98 ms │                        81.37 / 82.69 ±1.06 / 84.60 ms │     no change │
│ QQuery 36 │           6.50 / 6.64 ±0.16 / 6.94 ms │                           6.44 / 6.57 ±0.18 / 6.93 ms │     no change │
│ QQuery 37 │           7.11 / 7.35 ±0.24 / 7.81 ms │                           7.27 / 7.35 ±0.09 / 7.50 ms │     no change │
│ QQuery 38 │        68.28 / 68.56 ±0.22 / 68.90 ms │                        68.27 / 69.25 ±0.71 / 70.07 ms │     no change │
│ QQuery 39 │     100.68 / 102.00 ±1.88 / 105.72 ms │                      99.90 / 100.36 ±0.34 / 100.76 ms │     no change │
│ QQuery 40 │        23.77 / 23.92 ±0.13 / 24.11 ms │                        23.26 / 23.50 ±0.22 / 23.85 ms │     no change │
│ QQuery 41 │        14.02 / 14.23 ±0.25 / 14.71 ms │                        14.11 / 14.32 ±0.19 / 14.62 ms │     no change │
│ QQuery 42 │        23.95 / 24.31 ±0.29 / 24.65 ms │                        24.11 / 24.31 ±0.21 / 24.69 ms │     no change │
│ QQuery 43 │           5.26 / 5.38 ±0.12 / 5.61 ms │                           5.43 / 5.51 ±0.10 / 5.69 ms │     no change │
│ QQuery 44 │        10.97 / 11.02 ±0.03 / 11.06 ms │                        11.19 / 11.25 ±0.06 / 11.35 ms │     no change │
│ QQuery 45 │        40.63 / 41.15 ±0.36 / 41.74 ms │                        40.26 / 40.74 ±0.61 / 41.89 ms │     no change │
│ QQuery 46 │        13.47 / 13.78 ±0.29 / 14.27 ms │                        13.43 / 13.56 ±0.13 / 13.80 ms │     no change │
│ QQuery 47 │     232.85 / 237.12 ±4.50 / 244.13 ms │                     232.22 / 235.71 ±2.46 / 239.76 ms │     no change │
│ QQuery 48 │     104.79 / 105.08 ±0.38 / 105.80 ms │                     104.69 / 105.88 ±0.84 / 107.21 ms │     no change │
│ QQuery 49 │        82.96 / 84.63 ±1.39 / 86.31 ms │                        81.24 / 82.36 ±1.82 / 85.98 ms │     no change │
│ QQuery 50 │        63.42 / 64.79 ±2.24 / 69.23 ms │                        61.24 / 62.14 ±1.02 / 64.05 ms │     no change │
│ QQuery 51 │        93.86 / 95.43 ±1.33 / 97.76 ms │                        93.27 / 94.38 ±0.96 / 95.90 ms │     no change │
│ QQuery 52 │        24.12 / 24.97 ±1.02 / 26.69 ms │                        24.46 / 24.67 ±0.17 / 24.92 ms │     no change │
│ QQuery 53 │        30.51 / 30.74 ±0.21 / 31.07 ms │                        30.12 / 30.30 ±0.20 / 30.69 ms │     no change │
│ QQuery 54 │        54.71 / 56.33 ±2.85 / 62.01 ms │                        54.44 / 56.18 ±1.78 / 59.46 ms │     no change │
│ QQuery 55 │        23.99 / 24.32 ±0.21 / 24.62 ms │                        23.76 / 24.16 ±0.29 / 24.50 ms │     no change │
│ QQuery 56 │        39.60 / 40.24 ±0.70 / 41.54 ms │                        39.93 / 40.28 ±0.24 / 40.57 ms │     no change │
│ QQuery 57 │     179.40 / 181.24 ±1.72 / 184.42 ms │                     178.24 / 180.76 ±1.63 / 182.79 ms │     no change │
│ QQuery 58 │     117.81 / 119.53 ±1.61 / 122.53 ms │                     117.91 / 118.56 ±0.59 / 119.60 ms │     no change │
│ QQuery 59 │     119.94 / 120.09 ±0.18 / 120.44 ms │                     118.86 / 119.40 ±0.31 / 119.72 ms │     no change │
│ QQuery 60 │        40.33 / 41.29 ±1.13 / 43.41 ms │                        39.49 / 40.08 ±0.37 / 40.52 ms │     no change │
│ QQuery 61 │        13.74 / 13.88 ±0.18 / 14.19 ms │                        13.64 / 13.80 ±0.20 / 14.18 ms │     no change │
│ QQuery 62 │        47.27 / 48.19 ±1.26 / 50.68 ms │                        46.49 / 47.40 ±1.03 / 49.35 ms │     no change │
│ QQuery 63 │        30.10 / 30.53 ±0.37 / 31.17 ms │                        30.26 / 31.13 ±0.99 / 32.99 ms │     no change │
│ QQuery 64 │     480.89 / 485.65 ±4.40 / 493.20 ms │                     460.52 / 464.43 ±4.61 / 472.98 ms │     no change │
│ QQuery 65 │     145.03 / 149.19 ±5.07 / 159.06 ms │                     146.13 / 149.20 ±2.30 / 152.26 ms │     no change │
│ QQuery 66 │        82.98 / 84.43 ±1.23 / 86.68 ms │                        82.75 / 84.11 ±1.11 / 85.60 ms │     no change │
│ QQuery 67 │     243.13 / 249.40 ±4.17 / 256.02 ms │                     239.16 / 244.20 ±3.18 / 248.35 ms │     no change │
│ QQuery 68 │        13.81 / 14.16 ±0.31 / 14.65 ms │                        13.51 / 13.82 ±0.23 / 14.22 ms │     no change │
│ QQuery 69 │        76.41 / 76.95 ±0.47 / 77.77 ms │                        78.58 / 80.33 ±2.03 / 83.32 ms │     no change │
│ QQuery 70 │     105.88 / 109.68 ±3.19 / 115.20 ms │                     105.17 / 109.14 ±3.98 / 116.78 ms │     no change │
│ QQuery 71 │        36.32 / 36.71 ±0.30 / 37.10 ms │                        35.68 / 37.12 ±2.45 / 42.00 ms │     no change │
│ QQuery 72 │ 2211.42 / 2293.29 ±52.88 / 2350.68 ms │                 2190.63 / 2238.66 ±50.14 / 2335.69 ms │     no change │
│ QQuery 73 │         9.78 / 10.06 ±0.23 / 10.33 ms │                          9.77 / 9.95 ±0.19 / 10.28 ms │     no change │
│ QQuery 74 │     179.17 / 184.50 ±5.38 / 193.63 ms │                     178.53 / 181.79 ±3.46 / 187.83 ms │     no change │
│ QQuery 75 │     150.71 / 152.55 ±2.30 / 157.06 ms │                     148.01 / 148.98 ±0.55 / 149.52 ms │     no change │
│ QQuery 76 │        35.80 / 36.25 ±0.36 / 36.87 ms │                        35.32 / 36.08 ±0.63 / 36.87 ms │     no change │
│ QQuery 77 │        62.36 / 63.15 ±0.42 / 63.49 ms │                        62.23 / 62.59 ±0.19 / 62.75 ms │     no change │
│ QQuery 78 │     196.15 / 199.57 ±4.16 / 207.51 ms │                     189.51 / 191.14 ±1.51 / 193.99 ms │     no change │
│ QQuery 79 │        68.18 / 68.57 ±0.44 / 69.25 ms │                        67.11 / 67.89 ±0.47 / 68.54 ms │     no change │
│ QQuery 80 │     103.71 / 106.00 ±1.94 / 109.48 ms │                     101.30 / 104.17 ±2.17 / 107.57 ms │     no change │
│ QQuery 81 │        24.85 / 25.25 ±0.23 / 25.57 ms │                        24.61 / 24.87 ±0.24 / 25.27 ms │     no change │
│ QQuery 82 │        17.35 / 17.59 ±0.35 / 18.28 ms │                        16.67 / 17.44 ±0.99 / 19.39 ms │     no change │
│ QQuery 83 │        38.45 / 38.79 ±0.29 / 39.23 ms │                        37.24 / 37.69 ±0.36 / 38.28 ms │     no change │
│ QQuery 84 │        43.51 / 43.88 ±0.42 / 44.68 ms │                        43.16 / 43.36 ±0.19 / 43.64 ms │     no change │
│ QQuery 85 │     137.51 / 138.61 ±1.16 / 140.72 ms │                     135.30 / 136.28 ±0.91 / 137.89 ms │     no change │
│ QQuery 86 │        25.64 / 25.84 ±0.17 / 26.04 ms │                        25.40 / 25.79 ±0.35 / 26.31 ms │     no change │
│ QQuery 87 │        69.02 / 69.80 ±0.56 / 70.61 ms │                        68.92 / 70.23 ±1.09 / 72.09 ms │     no change │
│ QQuery 88 │        65.51 / 65.91 ±0.23 / 66.16 ms │                        64.39 / 65.74 ±1.25 / 68.10 ms │     no change │
│ QQuery 89 │        36.29 / 36.81 ±0.40 / 37.32 ms │                        36.21 / 36.56 ±0.26 / 36.99 ms │     no change │
│ QQuery 90 │        17.58 / 17.80 ±0.19 / 18.13 ms │                        17.52 / 17.72 ±0.22 / 18.10 ms │     no change │
│ QQuery 91 │        51.69 / 52.21 ±0.33 / 52.72 ms │                        52.06 / 52.71 ±0.50 / 53.36 ms │     no change │
│ QQuery 92 │        29.54 / 29.87 ±0.19 / 30.07 ms │                        29.82 / 30.07 ±0.18 / 30.30 ms │     no change │
│ QQuery 93 │        53.65 / 54.66 ±1.24 / 57.01 ms │                        51.24 / 52.48 ±1.79 / 55.98 ms │     no change │
│ QQuery 94 │        38.83 / 39.59 ±0.68 / 40.81 ms │                        37.91 / 38.31 ±0.41 / 39.07 ms │     no change │
│ QQuery 95 │        91.22 / 92.41 ±0.76 / 93.20 ms │                        88.39 / 89.31 ±0.88 / 90.83 ms │     no change │
│ QQuery 96 │        24.23 / 24.59 ±0.28 / 24.95 ms │                        24.29 / 24.68 ±0.31 / 25.06 ms │     no change │
│ QQuery 97 │        47.09 / 47.38 ±0.26 / 47.81 ms │                        46.83 / 47.86 ±0.78 / 49.25 ms │     no change │
│ QQuery 98 │        42.54 / 43.37 ±0.55 / 44.14 ms │                        42.93 / 43.27 ±0.20 / 43.53 ms │     no change │
│ QQuery 99 │        71.11 / 71.35 ±0.22 / 71.70 ms │                        70.21 / 70.78 ±0.38 / 71.10 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 10782.33ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 10612.82ms │
│ Average Time (HEAD)                                                  │   108.91ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │   107.20ms │
│ Queries Faster                                                       │          1 │
│ Queries Slower                                                       │          0 │
│ Queries with No Change                                               │         98 │
│ Queries with Failure                                                 │          0 │
└──────────────────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 55.0s
Peak memory 6.8 GiB
Avg memory 6.1 GiB
CPU user 247.8s
CPU sys 6.6s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 55.0s
Peak memory 6.9 GiB
Avg memory 6.2 GiB
CPU user 243.2s
CPU sys 6.4s
Peak spill 0 B

File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected
Details

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │          1.21 / 4.70 ±6.90 / 18.50 ms │                          1.21 / 4.65 ±6.77 / 18.18 ms │     no change │
│ QQuery 1  │        12.64 / 12.97 ±0.23 / 13.33 ms │                        12.63 / 12.97 ±0.18 / 13.13 ms │     no change │
│ QQuery 2  │        35.84 / 36.22 ±0.29 / 36.73 ms │                        35.56 / 36.01 ±0.32 / 36.52 ms │     no change │
│ QQuery 3  │        30.49 / 31.30 ±0.64 / 32.30 ms │                        30.81 / 31.05 ±0.18 / 31.31 ms │     no change │
│ QQuery 4  │     234.92 / 239.26 ±3.28 / 244.72 ms │                     231.04 / 234.71 ±2.77 / 238.77 ms │     no change │
│ QQuery 5  │     277.04 / 278.68 ±1.33 / 280.91 ms │                     278.38 / 280.45 ±1.99 / 283.76 ms │     no change │
│ QQuery 6  │           6.47 / 6.94 ±0.33 / 7.37 ms │                           6.13 / 7.15 ±0.56 / 7.69 ms │     no change │
│ QQuery 7  │        13.80 / 13.90 ±0.09 / 14.04 ms │                        13.86 / 13.99 ±0.08 / 14.07 ms │     no change │
│ QQuery 8  │     311.83 / 319.31 ±4.96 / 327.23 ms │                     312.96 / 319.10 ±5.22 / 328.52 ms │     no change │
│ QQuery 9  │     456.74 / 464.66 ±6.51 / 475.75 ms │                     451.27 / 462.38 ±7.87 / 471.87 ms │     no change │
│ QQuery 10 │        69.44 / 70.90 ±1.72 / 74.28 ms │                        69.70 / 71.41 ±1.17 / 73.07 ms │     no change │
│ QQuery 11 │        79.90 / 80.34 ±0.50 / 81.26 ms │                        80.49 / 82.11 ±1.09 / 83.46 ms │     no change │
│ QQuery 12 │     269.18 / 274.05 ±4.05 / 281.37 ms │                     271.90 / 274.83 ±3.27 / 280.97 ms │     no change │
│ QQuery 13 │    379.93 / 392.63 ±12.61 / 413.40 ms │                     377.19 / 388.35 ±8.37 / 400.54 ms │     no change │
│ QQuery 14 │     278.88 / 282.57 ±4.33 / 290.14 ms │                     278.12 / 279.95 ±1.61 / 282.31 ms │     no change │
│ QQuery 15 │     281.19 / 286.43 ±6.55 / 298.98 ms │                     275.49 / 278.20 ±3.33 / 283.63 ms │     no change │
│ QQuery 16 │     607.66 / 612.92 ±6.24 / 624.01 ms │                     593.93 / 601.51 ±4.35 / 606.79 ms │     no change │
│ QQuery 17 │     607.22 / 611.34 ±5.43 / 622.03 ms │                     597.07 / 604.89 ±8.00 / 618.68 ms │     no change │
│ QQuery 18 │  1199.23 / 1209.96 ±5.72 / 1214.87 ms │                 1161.53 / 1189.57 ±16.71 / 1210.43 ms │     no change │
│ QQuery 19 │        28.02 / 34.07 ±9.09 / 51.96 ms │                        28.05 / 29.81 ±2.62 / 35.01 ms │ +1.14x faster │
│ QQuery 20 │     518.82 / 523.67 ±6.14 / 535.64 ms │                     518.54 / 522.25 ±2.78 / 526.75 ms │     no change │
│ QQuery 21 │     593.52 / 596.87 ±3.92 / 604.35 ms │                     592.11 / 597.65 ±4.08 / 602.64 ms │     no change │
│ QQuery 22 │ 1050.38 / 1064.30 ±14.24 / 1090.99 ms │                  1058.63 / 1061.14 ±3.02 / 1065.10 ms │     no change │
│ QQuery 23 │ 3145.19 / 3190.27 ±33.19 / 3243.33 ms │                 3195.25 / 3231.86 ±24.78 / 3266.06 ms │     no change │
│ QQuery 24 │        42.12 / 43.27 ±1.30 / 45.77 ms │                        41.75 / 42.40 ±0.75 / 43.86 ms │     no change │
│ QQuery 25 │     110.99 / 114.59 ±3.67 / 121.19 ms │                     111.71 / 116.55 ±5.96 / 128.27 ms │     no change │
│ QQuery 26 │        42.30 / 44.32 ±1.83 / 46.53 ms │                        43.39 / 44.06 ±0.55 / 44.87 ms │     no change │
│ QQuery 27 │     668.42 / 679.18 ±9.03 / 695.33 ms │                     676.20 / 681.08 ±5.16 / 690.90 ms │     no change │
│ QQuery 28 │ 2992.39 / 3012.93 ±14.96 / 3027.08 ms │                 2990.16 / 3018.18 ±16.16 / 3037.82 ms │     no change │
│ QQuery 29 │        41.65 / 45.59 ±7.42 / 60.42 ms │                        41.86 / 46.74 ±8.23 / 63.04 ms │     no change │
│ QQuery 30 │     300.52 / 302.26 ±1.40 / 303.93 ms │                     295.20 / 300.43 ±4.33 / 308.33 ms │     no change │
│ QQuery 31 │     282.24 / 292.78 ±8.13 / 305.71 ms │                     279.12 / 288.84 ±5.46 / 294.21 ms │     no change │
│ QQuery 32 │     920.30 / 928.98 ±7.79 / 940.88 ms │                    885.54 / 903.98 ±15.54 / 931.52 ms │     no change │
│ QQuery 33 │ 1422.51 / 1459.42 ±23.74 / 1494.49 ms │                 1424.49 / 1439.16 ±11.86 / 1458.75 ms │     no change │
│ QQuery 34 │ 1449.73 / 1463.83 ±18.67 / 1500.15 ms │                 1446.82 / 1492.24 ±46.65 / 1582.23 ms │     no change │
│ QQuery 35 │    288.26 / 307.05 ±23.25 / 348.73 ms │                     278.26 / 292.68 ±8.38 / 301.54 ms │     no change │
│ QQuery 36 │        62.48 / 64.98 ±2.56 / 69.70 ms │                      64.19 / 74.54 ±14.56 / 101.62 ms │  1.15x slower │
│ QQuery 37 │        35.94 / 40.89 ±4.82 / 47.35 ms │                        35.83 / 39.43 ±3.58 / 45.50 ms │     no change │
│ QQuery 38 │        40.63 / 42.48 ±1.37 / 43.81 ms │                        43.55 / 48.13 ±4.46 / 54.75 ms │  1.13x slower │
│ QQuery 39 │     129.40 / 135.18 ±3.90 / 140.34 ms │                     122.43 / 131.85 ±5.91 / 140.97 ms │     no change │
│ QQuery 40 │        14.28 / 16.81 ±4.22 / 25.21 ms │                        14.06 / 15.25 ±1.52 / 18.18 ms │ +1.10x faster │
│ QQuery 41 │        13.85 / 14.15 ±0.26 / 14.62 ms │                        13.97 / 14.12 ±0.11 / 14.29 ms │     no change │
│ QQuery 42 │        13.48 / 13.99 ±0.64 / 15.25 ms │                        13.48 / 13.70 ±0.14 / 13.88 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 19660.95ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 19619.36ms │
│ Average Time (HEAD)                                                  │   457.23ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │   456.26ms │
│ Queries Faster                                                       │          2 │
│ Queries Slower                                                       │          2 │
│ Queries with No Change                                               │         39 │
│ Queries with Failure                                                 │          0 │
└──────────────────────────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 100.0s
Peak memory 31.0 GiB
Avg memory 23.5 GiB
CPU user 1033.3s
CPU sys 64.6s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 100.0s
Peak memory 31.4 GiB
Avg memory 23.1 GiB
CPU user 1029.1s
CPU sys 66.4s
Peak spill 0 B


@gene-bordegaray
Contributor Author

gene-bordegaray commented May 13, 2026

This is intended for high fanout at larger scale factors. The benchmarks in my description were run with --batch-size=1024 to target that workload.

Can this be run with:

env:
  DATAFUSION_EXECUTION_TARGET_PARTITIONS: 300

@gene-bordegaray
Contributor Author

cc: @gabotechs

@gene-bordegaray changed the title from "[WIP] Call take arrays once per repartitioned input batch" to "Call take arrays once per repartitioned input batch" on May 14, 2026
@gene-bordegaray marked this pull request as ready for review on May 14, 2026 00:38
@gabotechs
Contributor

run benchmarks

env:
  DATAFUSION_EXECUTION_TARGET_PARTITIONS: 256

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4449879688-84-xqjrt 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: tpcds
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4449879688-83-j5qdq 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: clickbench_partitioned
Results will be posted here when complete



@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4449879688-85-w5mkm 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux


Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (a0a727c) to 937dfda (merge-base) diff using: tpch
Results will be posted here when complete



@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                               HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │     78.10 / 82.11 ±2.43 / 85.41 ms │                        81.80 / 84.74 ±2.23 / 87.55 ms │     no change │
│ QQuery 2  │     79.20 / 81.34 ±2.25 / 85.24 ms │                        80.77 / 84.54 ±3.32 / 88.97 ms │     no change │
│ QQuery 3  │ 395.02 / 412.58 ±15.47 / 439.25 ms │                    356.57 / 389.97 ±22.76 / 416.64 ms │ +1.06x faster │
│ QQuery 4  │     66.24 / 71.37 ±4.44 / 79.65 ms │                        68.11 / 71.28 ±2.24 / 74.18 ms │     no change │
│ QQuery 5  │ 655.33 / 675.87 ±24.07 / 721.81 ms │                    621.26 / 668.67 ±24.38 / 689.14 ms │     no change │
│ QQuery 6  │     29.60 / 32.25 ±1.54 / 34.02 ms │                        27.19 / 30.23 ±1.90 / 33.06 ms │ +1.07x faster │
│ QQuery 7  │ 664.67 / 697.11 ±26.41 / 727.87 ms │                    629.26 / 661.38 ±26.58 / 704.05 ms │ +1.05x faster │
│ QQuery 8  │ 660.63 / 673.81 ±11.59 / 693.17 ms │                    618.79 / 655.28 ±27.33 / 698.53 ms │     no change │
│ QQuery 9  │  598.89 / 611.99 ±7.11 / 619.82 ms │                    566.80 / 582.92 ±12.31 / 596.35 ms │     no change │
│ QQuery 10 │  115.06 / 121.57 ±3.49 / 125.45 ms │                     110.77 / 118.61 ±4.40 / 123.97 ms │     no change │
│ QQuery 11 │     81.37 / 87.40 ±3.49 / 91.98 ms │                       82.24 / 90.23 ±8.41 / 106.11 ms │     no change │
│ QQuery 12 │ 659.84 / 676.97 ±15.87 / 704.83 ms │                    652.86 / 666.58 ±16.07 / 697.52 ms │     no change │
│ QQuery 13 │ 280.58 / 298.32 ±17.19 / 319.74 ms │                     286.77 / 299.92 ±7.36 / 307.94 ms │     no change │
│ QQuery 14 │     89.69 / 95.01 ±2.92 / 97.59 ms │                        87.87 / 90.41 ±2.08 / 93.11 ms │     no change │
│ QQuery 15 │  134.97 / 138.03 ±1.86 / 140.41 ms │                     135.09 / 140.85 ±3.34 / 145.10 ms │     no change │
│ QQuery 16 │  117.28 / 118.90 ±1.29 / 120.66 ms │                     106.05 / 110.60 ±2.45 / 113.48 ms │ +1.08x faster │
│ QQuery 17 │ 429.60 / 441.63 ±11.85 / 458.39 ms │                     353.54 / 366.38 ±9.23 / 378.90 ms │ +1.21x faster │
│ QQuery 18 │ 784.19 / 814.65 ±26.41 / 856.89 ms │                    551.57 / 591.72 ±22.42 / 620.83 ms │ +1.38x faster │
│ QQuery 19 │     53.47 / 56.98 ±6.13 / 69.23 ms │                        51.80 / 53.01 ±0.90 / 54.23 ms │ +1.07x faster │
│ QQuery 20 │ 370.92 / 410.45 ±48.53 / 503.55 ms │                    363.79 / 396.87 ±23.67 / 426.64 ms │     no change │
│ QQuery 21 │ 452.14 / 482.18 ±18.94 / 508.84 ms │                    438.10 / 472.72 ±18.70 / 490.11 ms │     no change │
│ QQuery 22 │     66.50 / 69.38 ±2.89 / 74.34 ms │                        69.60 / 75.20 ±8.13 / 90.96 ms │  1.08x slower │
└───────────┴────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 7149.89ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 6702.10ms │
│ Average Time (HEAD)                                                  │  325.00ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │  304.64ms │
│ Queries Faster                                                       │         7 │
│ Queries Slower                                                       │         1 │
│ Queries with No Change                                               │        14 │
│ Queries with Failure                                                 │         0 │
└──────────────────────────────────────────────────────────────────────┴───────────┘

Resource Usage

tpch — base (merge-base)

Metric Value
Wall time 40.0s
Peak memory 6.7 GiB
Avg memory 5.4 GiB
CPU user 307.2s
CPU sys 7.6s
Peak spill 0 B

tpch — branch

Metric Value
Wall time 35.0s
Peak memory 6.6 GiB
Avg memory 5.5 GiB
CPU user 295.8s
CPU sys 7.6s
Peak spill 0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                   HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │           1.48 / 4.92 ±6.76 / 18.45 ms │                          1.45 / 4.88 ±6.77 / 18.43 ms │     no change │
│ QQuery 1  │         23.77 / 25.10 ±1.70 / 28.34 ms │                        21.11 / 22.90 ±1.00 / 24.03 ms │ +1.10x faster │
│ QQuery 2  │         46.64 / 48.16 ±1.32 / 50.63 ms │                        47.68 / 49.32 ±1.60 / 52.05 ms │     no change │
│ QQuery 3  │         40.28 / 42.30 ±1.42 / 44.53 ms │                        37.74 / 38.89 ±0.91 / 39.95 ms │ +1.09x faster │
│ QQuery 4  │      334.65 / 339.53 ±2.66 / 341.60 ms │                     296.04 / 301.38 ±4.45 / 307.79 ms │ +1.13x faster │
│ QQuery 5  │      369.66 / 372.70 ±2.96 / 377.07 ms │                     349.28 / 355.05 ±4.42 / 362.89 ms │     no change │
│ QQuery 6  │         19.57 / 21.13 ±1.52 / 23.21 ms │                        19.35 / 20.73 ±1.29 / 23.01 ms │     no change │
│ QQuery 7  │         50.89 / 52.91 ±1.40 / 54.54 ms │                        49.95 / 51.74 ±1.17 / 53.29 ms │     no change │
│ QQuery 8  │      494.57 / 497.15 ±2.52 / 501.78 ms │                     437.09 / 441.81 ±5.34 / 450.57 ms │ +1.13x faster │
│ QQuery 9  │      384.02 / 389.58 ±5.15 / 397.89 ms │                    378.15 / 393.21 ±15.45 / 420.95 ms │     no change │
│ QQuery 10 │      139.61 / 141.10 ±1.19 / 142.70 ms │                     136.22 / 144.35 ±6.11 / 154.08 ms │     no change │
│ QQuery 11 │      154.62 / 160.62 ±3.34 / 164.16 ms │                     147.03 / 151.68 ±3.65 / 157.20 ms │ +1.06x faster │
│ QQuery 12 │      375.72 / 381.30 ±5.92 / 392.68 ms │                     349.90 / 357.67 ±6.25 / 365.62 ms │ +1.07x faster │
│ QQuery 13 │     736.87 / 747.55 ±14.79 / 776.71 ms │                    661.51 / 678.27 ±26.39 / 730.91 ms │ +1.10x faster │
│ QQuery 14 │     378.69 / 392.14 ±16.66 / 424.83 ms │                    346.74 / 363.81 ±20.28 / 402.75 ms │ +1.08x faster │
│ QQuery 15 │      402.94 / 413.33 ±8.00 / 423.90 ms │                    353.45 / 369.33 ±20.77 / 407.91 ms │ +1.12x faster │
│ QQuery 16 │     839.08 / 875.40 ±53.41 / 981.10 ms │                    729.96 / 754.67 ±19.23 / 786.85 ms │ +1.16x faster │
│ QQuery 17 │     832.71 / 855.44 ±18.67 / 885.12 ms │                    721.63 / 751.78 ±21.97 / 777.54 ms │ +1.14x faster │
│ QQuery 18 │  1757.40 / 1790.27 ±22.72 / 1814.01 ms │                 1465.20 / 1491.03 ±15.24 / 1505.23 ms │ +1.20x faster │
│ QQuery 19 │       41.01 / 61.28 ±24.72 / 105.84 ms │                        40.25 / 42.36 ±1.10 / 43.34 ms │ +1.45x faster │
│ QQuery 20 │     520.38 / 546.27 ±24.99 / 592.57 ms │                    499.91 / 523.22 ±21.03 / 553.02 ms │     no change │
│ QQuery 21 │     630.71 / 643.30 ±14.10 / 667.14 ms │                    638.83 / 656.45 ±16.89 / 682.87 ms │     no change │
│ QQuery 22 │  1066.71 / 1085.65 ±12.31 / 1100.35 ms │                 1079.35 / 1138.00 ±48.43 / 1220.92 ms │     no change │
│ QQuery 23 │ 1618.67 / 1773.32 ±129.77 / 2012.75 ms │                 1643.65 / 1812.02 ±91.67 / 1892.34 ms │     no change │
│ QQuery 24 │         53.53 / 58.02 ±3.23 / 62.94 ms │                      54.70 / 68.41 ±19.19 / 106.25 ms │  1.18x slower │
│ QQuery 25 │     120.63 / 133.38 ±11.31 / 147.25 ms │                     123.81 / 125.43 ±1.58 / 127.43 ms │ +1.06x faster │
│ QQuery 26 │         54.69 / 57.58 ±2.27 / 60.96 ms │                        51.91 / 57.32 ±3.30 / 62.04 ms │     no change │
│ QQuery 27 │     681.82 / 723.76 ±33.14 / 756.67 ms │                    679.67 / 706.76 ±17.79 / 727.30 ms │     no change │
│ QQuery 28 │  2995.43 / 3030.95 ±30.92 / 3084.72 ms │                 2905.99 / 2977.53 ±42.49 / 3016.17 ms │     no change │
│ QQuery 29 │         54.99 / 56.48 ±1.00 / 58.07 ms │                      53.84 / 68.67 ±17.81 / 102.91 ms │  1.22x slower │
│ QQuery 30 │     403.48 / 419.57 ±11.54 / 433.29 ms │                     363.38 / 368.73 ±3.83 / 373.16 ms │ +1.14x faster │
│ QQuery 31 │      535.94 / 537.53 ±1.34 / 539.56 ms │                     444.03 / 451.60 ±9.17 / 468.68 ms │ +1.19x faster │
│ QQuery 32 │ 2278.34 / 2360.17 ±115.78 / 2590.15 ms │                 1683.16 / 1732.16 ±28.63 / 1772.96 ms │ +1.36x faster │
│ QQuery 33 │ 1690.35 / 1794.38 ±137.32 / 2064.70 ms │                 1596.67 / 1625.62 ±20.40 / 1650.46 ms │ +1.10x faster │
│ QQuery 34 │  1677.98 / 1723.76 ±29.30 / 1757.23 ms │                 1617.73 / 1699.21 ±92.62 / 1871.41 ms │     no change │
│ QQuery 35 │    373.88 / 430.04 ±104.70 / 639.39 ms │                    325.77 / 359.21 ±32.13 / 415.18 ms │ +1.20x faster │
│ QQuery 36 │     103.48 / 114.70 ±12.14 / 137.37 ms │                      98.20 / 103.23 ±3.64 / 109.37 ms │ +1.11x faster │
│ QQuery 37 │         74.43 / 79.97 ±5.95 / 90.19 ms │                        73.54 / 77.28 ±2.59 / 80.21 ms │     no change │
│ QQuery 38 │         78.28 / 84.25 ±4.32 / 89.62 ms │                        80.18 / 83.11 ±1.71 / 85.20 ms │     no change │
│ QQuery 39 │      185.99 / 189.60 ±2.77 / 193.02 ms │                     175.12 / 177.98 ±2.34 / 181.98 ms │ +1.07x faster │
│ QQuery 40 │         51.71 / 55.90 ±3.24 / 60.54 ms │                        54.37 / 57.77 ±2.32 / 60.44 ms │     no change │
│ QQuery 41 │         50.94 / 53.24 ±2.24 / 57.23 ms │                        53.56 / 58.40 ±5.61 / 69.01 ms │  1.10x slower │
│ QQuery 42 │         53.34 / 58.82 ±7.67 / 73.87 ms │                        49.75 / 53.85 ±4.09 / 61.11 ms │ +1.09x faster │
└───────────┴────────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 23622.54ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 21766.85ms │
│ Average Time (HEAD)                                                  │   549.36ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │   506.21ms │
│ Queries Faster                                                       │         22 │
│ Queries Slower                                                       │          3 │
│ Queries with No Change                                               │         18 │
│ Queries with Failure                                                 │          0 │
└──────────────────────────────────────────────────────────────────────┴────────────┘

Resource Usage

clickbench_partitioned — base (merge-base)

Metric Value
Wall time 120.0s
Peak memory 35.4 GiB
Avg memory 28.0 GiB
CPU user 1220.9s
CPU sys 79.7s
Peak spill 0 B

clickbench_partitioned — branch

Metric Value
Wall time 110.0s
Peak memory 35.2 GiB
Avg memory 28.2 GiB
CPU user 1113.8s
CPU sys 84.2s
Peak spill 0 B


@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                      HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │            55.55 / 56.77 ±1.26 / 59.19 ms │                        56.61 / 57.12 ±0.88 / 58.89 ms │     no change │
│ QQuery 2  │         284.14 / 292.01 ±7.51 / 300.96 ms │                     286.97 / 299.82 ±9.89 / 316.98 ms │     no change │
│ QQuery 3  │            70.30 / 74.36 ±3.94 / 81.70 ms │                        70.17 / 75.45 ±2.92 / 78.30 ms │     no change │
│ QQuery 4  │         763.42 / 776.34 ±8.08 / 786.68 ms │                     739.28 / 758.97 ±9.96 / 766.39 ms │     no change │
│ QQuery 5  │        584.13 / 615.31 ±24.58 / 659.48 ms │                    596.65 / 632.10 ±21.56 / 662.42 ms │     no change │
│ QQuery 6  │         147.92 / 150.65 ±2.48 / 153.83 ms │                     148.19 / 153.48 ±6.50 / 164.50 ms │     no change │
│ QQuery 7  │        815.56 / 850.28 ±19.39 / 868.08 ms │                    784.39 / 792.38 ±13.01 / 818.19 ms │ +1.07x faster │
│ QQuery 8  │         145.18 / 148.74 ±2.31 / 152.30 ms │                     148.71 / 151.47 ±2.09 / 154.11 ms │     no change │
│ QQuery 9  │        218.31 / 238.16 ±10.29 / 248.18 ms │                    222.87 / 239.76 ±18.91 / 274.74 ms │     no change │
│ QQuery 10 │         154.57 / 156.73 ±2.37 / 161.33 ms │                     155.57 / 156.68 ±1.10 / 158.65 ms │     no change │
│ QQuery 11 │         509.03 / 517.43 ±9.62 / 533.82 ms │                     502.62 / 512.40 ±8.53 / 526.69 ms │     no change │
│ QQuery 12 │         102.54 / 105.21 ±1.94 / 108.08 ms │                     103.32 / 106.78 ±5.24 / 117.12 ms │     no change │
│ QQuery 13 │        407.47 / 432.81 ±14.72 / 447.64 ms │                    392.91 / 421.28 ±20.90 / 445.74 ms │     no change │
│ QQuery 14 │     7416.79 / 7530.76 ±83.02 / 7671.10 ms │                 7395.84 / 7461.39 ±53.90 / 7550.32 ms │     no change │
│ QQuery 15 │        101.83 / 112.84 ±18.44 / 149.65 ms │                    101.86 / 114.54 ±21.98 / 158.44 ms │     no change │
│ QQuery 16 │            58.47 / 60.05 ±1.70 / 62.73 ms │                        58.59 / 62.60 ±5.11 / 72.58 ms │     no change │
│ QQuery 17 │        926.30 / 965.51 ±21.29 / 991.25 ms │                   850.83 / 908.97 ±77.60 / 1061.81 ms │ +1.06x faster │
│ QQuery 18 │        670.49 / 688.76 ±18.59 / 724.09 ms │                    591.07 / 627.32 ±22.43 / 653.44 ms │ +1.10x faster │
│ QQuery 19 │            80.37 / 89.04 ±4.78 / 93.57 ms │                        81.51 / 88.95 ±4.05 / 92.90 ms │     no change │
│ QQuery 20 │         106.42 / 107.70 ±0.78 / 108.82 ms │                    107.94 / 119.32 ±11.93 / 134.39 ms │  1.11x slower │
│ QQuery 21 │            69.04 / 73.05 ±4.93 / 82.24 ms │                        63.76 / 69.46 ±3.30 / 72.63 ms │     no change │
│ QQuery 22 │         118.80 / 120.92 ±1.52 / 123.33 ms │                     115.51 / 121.92 ±7.41 / 136.25 ms │     no change │
│ QQuery 23 │     1035.99 / 1075.80 ±24.72 / 1112.82 ms │                   965.50 / 990.86 ±21.09 / 1014.79 ms │ +1.09x faster │
│ QQuery 24 │     2150.76 / 2186.80 ±19.33 / 2206.81 ms │                 1906.89 / 2050.51 ±72.52 / 2098.59 ms │ +1.07x faster │
│ QQuery 25 │     1031.24 / 1055.08 ±15.10 / 1078.36 ms │                   892.39 / 963.05 ±52.88 / 1028.67 ms │ +1.10x faster │
│ QQuery 26 │        732.59 / 753.64 ±19.61 / 785.83 ms │                    717.89 / 736.09 ±17.90 / 769.67 ms │     no change │
│ QQuery 27 │            71.75 / 77.51 ±7.62 / 92.33 ms │                        73.11 / 75.84 ±3.09 / 81.48 ms │     no change │
│ QQuery 28 │         116.26 / 122.12 ±4.23 / 128.23 ms │                     119.02 / 125.86 ±3.92 / 130.76 ms │     no change │
│ QQuery 29 │     1004.48 / 1021.08 ±11.52 / 1036.03 ms │                    906.37 / 947.49 ±25.65 / 971.65 ms │ +1.08x faster │
│ QQuery 30 │         145.46 / 148.08 ±2.01 / 151.51 ms │                    145.51 / 154.41 ±13.02 / 180.06 ms │     no change │
│ QQuery 31 │         442.51 / 460.32 ±9.50 / 470.56 ms │                    439.06 / 457.29 ±14.53 / 474.63 ms │     no change │
│ QQuery 32 │            73.83 / 79.99 ±6.95 / 93.40 ms │                        71.74 / 76.33 ±3.72 / 81.75 ms │     no change │
│ QQuery 33 │         147.36 / 151.52 ±3.89 / 158.14 ms │                    146.58 / 156.45 ±15.68 / 187.68 ms │     no change │
│ QQuery 34 │            51.98 / 55.07 ±2.56 / 57.90 ms │                       51.98 / 61.71 ±13.73 / 88.94 ms │  1.12x slower │
│ QQuery 35 │        151.48 / 162.49 ±10.89 / 181.46 ms │                     154.84 / 158.76 ±3.41 / 164.70 ms │     no change │
│ QQuery 36 │            81.86 / 85.45 ±2.70 / 88.37 ms │                       83.80 / 90.44 ±6.54 / 101.56 ms │  1.06x slower │
│ QQuery 37 │            43.84 / 45.30 ±1.14 / 46.71 ms │                        45.35 / 45.79 ±0.26 / 46.09 ms │     no change │
│ QQuery 38 │         192.58 / 199.06 ±4.41 / 205.41 ms │                     192.29 / 197.78 ±8.88 / 215.49 ms │     no change │
│ QQuery 39 │        452.14 / 463.44 ±16.60 / 496.21 ms │                     430.01 / 439.50 ±7.87 / 449.61 ms │ +1.05x faster │
│ QQuery 40 │         129.87 / 131.95 ±2.03 / 135.74 ms │                     117.82 / 122.17 ±3.71 / 126.67 ms │ +1.08x faster │
│ QQuery 41 │            86.62 / 90.14 ±2.50 / 94.22 ms │                        84.65 / 86.71 ±1.39 / 88.55 ms │     no change │
│ QQuery 42 │            65.97 / 68.40 ±1.36 / 69.92 ms │                        62.85 / 68.47 ±3.15 / 72.19 ms │     no change │
│ QQuery 43 │            45.65 / 46.78 ±0.69 / 47.53 ms │                        46.22 / 50.21 ±5.64 / 61.33 ms │  1.07x slower │
│ QQuery 44 │         136.11 / 142.76 ±4.33 / 149.37 ms │                     135.36 / 140.79 ±3.49 / 144.58 ms │     no change │
│ QQuery 45 │            85.93 / 88.59 ±2.78 / 93.62 ms │                       85.35 / 90.32 ±6.23 / 102.18 ms │     no change │
│ QQuery 46 │            64.13 / 66.46 ±2.48 / 70.89 ms │                        61.19 / 62.86 ±1.14 / 64.50 ms │ +1.06x faster │
│ QQuery 47 │        530.40 / 549.73 ±12.40 / 563.42 ms │                    491.97 / 508.84 ±11.75 / 524.93 ms │ +1.08x faster │
│ QQuery 48 │        382.30 / 404.36 ±23.87 / 448.30 ms │                    372.11 / 402.79 ±15.95 / 414.98 ms │     no change │
│ QQuery 49 │        388.65 / 410.25 ±19.50 / 442.04 ms │                    375.44 / 390.65 ±12.53 / 413.01 ms │     no change │
│ QQuery 50 │        536.55 / 591.77 ±61.36 / 708.97 ms │                    519.83 / 558.88 ±51.65 / 660.88 ms │ +1.06x faster │
│ QQuery 51 │         320.06 / 324.21 ±4.69 / 332.40 ms │                     298.75 / 311.82 ±7.84 / 323.42 ms │     no change │
│ QQuery 52 │            65.84 / 69.75 ±3.38 / 75.17 ms │                        64.91 / 68.37 ±3.00 / 72.38 ms │     no change │
│ QQuery 53 │         107.04 / 109.97 ±2.77 / 114.01 ms │                     106.77 / 112.12 ±8.68 / 129.22 ms │     no change │
│ QQuery 54 │         255.59 / 264.42 ±5.91 / 270.11 ms │                     246.33 / 259.83 ±7.72 / 267.91 ms │     no change │
│ QQuery 55 │            61.48 / 69.80 ±4.90 / 76.52 ms │                        62.36 / 66.65 ±2.47 / 69.51 ms │     no change │
│ QQuery 56 │         141.39 / 145.35 ±3.77 / 151.84 ms │                     132.93 / 142.31 ±8.48 / 157.13 ms │     no change │
│ QQuery 57 │         434.70 / 437.36 ±2.77 / 442.31 ms │                     418.40 / 426.11 ±6.20 / 436.05 ms │     no change │
│ QQuery 58 │         258.64 / 266.81 ±9.24 / 284.08 ms │                    258.14 / 271.32 ±11.81 / 292.98 ms │     no change │
│ QQuery 59 │         208.22 / 216.16 ±5.46 / 223.55 ms │                     211.97 / 217.31 ±4.80 / 225.89 ms │     no change │
│ QQuery 60 │         140.26 / 148.23 ±5.55 / 156.50 ms │                     132.86 / 143.03 ±7.30 / 154.74 ms │     no change │
│ QQuery 61 │            33.10 / 33.69 ±0.68 / 35.03 ms │                       32.93 / 40.48 ±14.39 / 69.25 ms │  1.20x slower │
│ QQuery 62 │          99.72 / 101.03 ±1.95 / 104.90 ms │                       97.70 / 99.38 ±0.92 / 100.19 ms │     no change │
│ QQuery 63 │         107.61 / 115.09 ±7.94 / 129.54 ms │                     107.38 / 113.64 ±7.47 / 128.34 ms │     no change │
│ QQuery 64 │     2989.03 / 3020.36 ±30.36 / 3076.37 ms │                 2646.75 / 2714.70 ±39.12 / 2763.71 ms │ +1.11x faster │
│ QQuery 65 │     1148.86 / 1158.75 ±10.00 / 1177.22 ms │                  1148.34 / 1158.73 ±9.24 / 1175.50 ms │     no change │
│ QQuery 66 │         192.02 / 193.64 ±1.95 / 197.34 ms │                     192.00 / 197.73 ±4.40 / 204.07 ms │     no change │
│ QQuery 67 │         368.40 / 380.45 ±7.58 / 389.00 ms │                     337.10 / 348.89 ±8.71 / 357.69 ms │ +1.09x faster │
│ QQuery 68 │            63.51 / 73.44 ±7.85 / 82.53 ms │                        62.83 / 71.90 ±9.67 / 89.67 ms │     no change │
│ QQuery 69 │         148.70 / 151.42 ±3.34 / 157.75 ms │                     149.86 / 152.58 ±3.23 / 158.91 ms │     no change │
│ QQuery 70 │        255.84 / 277.94 ±13.16 / 293.68 ms │                    265.83 / 285.11 ±16.66 / 313.02 ms │     no change │
│ QQuery 71 │          92.70 / 100.41 ±4.11 / 104.06 ms │                      99.69 / 104.71 ±4.94 / 112.56 ms │     no change │
│ QQuery 72 │ 19275.01 / 19669.77 ±288.99 / 20076.32 ms │             18615.77 / 19270.95 ±458.33 / 20041.56 ms │     no change │
│ QQuery 73 │            55.60 / 62.21 ±7.64 / 76.41 ms │                      55.62 / 74.94 ±35.67 / 146.27 ms │  1.20x slower │
│ QQuery 74 │        356.47 / 375.60 ±25.05 / 422.31 ms │                    340.97 / 363.96 ±13.49 / 382.23 ms │     no change │
│ QQuery 75 │     1365.68 / 1465.21 ±72.43 / 1587.84 ms │                 1408.58 / 1438.15 ±27.12 / 1476.43 ms │     no change │
│ QQuery 76 │         160.79 / 171.34 ±7.21 / 182.22 ms │                     158.42 / 170.12 ±9.31 / 185.28 ms │     no change │
│ QQuery 77 │         282.62 / 291.62 ±9.61 / 307.02 ms │                     297.49 / 305.07 ±9.48 / 322.78 ms │     no change │
│ QQuery 78 │         697.43 / 711.98 ±9.24 / 722.20 ms │                     582.89 / 595.52 ±7.00 / 601.71 ms │ +1.20x faster │
│ QQuery 79 │         121.44 / 127.11 ±5.05 / 135.53 ms │                     121.76 / 126.10 ±3.26 / 131.36 ms │     no change │
│ QQuery 80 │        543.44 / 564.80 ±17.52 / 588.21 ms │                    501.85 / 517.56 ±11.05 / 531.26 ms │ +1.09x faster │
│ QQuery 81 │         139.94 / 148.08 ±5.85 / 157.48 ms │                    138.68 / 153.13 ±11.29 / 168.98 ms │     no change │
│ QQuery 82 │            66.91 / 72.22 ±7.33 / 86.47 ms │                        67.64 / 73.40 ±6.47 / 85.23 ms │     no change │
│ QQuery 83 │         163.56 / 168.97 ±4.27 / 174.28 ms │                     167.43 / 170.19 ±2.27 / 173.29 ms │     no change │
│ QQuery 84 │            68.83 / 73.12 ±5.16 / 82.90 ms │                        73.55 / 76.14 ±3.13 / 81.81 ms │     no change │
│ QQuery 85 │        715.07 / 765.28 ±27.61 / 795.64 ms │                    633.99 / 665.70 ±18.43 / 683.60 ms │ +1.15x faster │
│ QQuery 86 │         105.45 / 110.88 ±4.40 / 118.21 ms │                     103.77 / 107.83 ±6.20 / 120.12 ms │     no change │
│ QQuery 87 │         192.15 / 201.25 ±6.46 / 212.40 ms │                     193.57 / 199.04 ±3.50 / 203.72 ms │     no change │
│ QQuery 88 │         163.53 / 168.07 ±4.35 / 173.97 ms │                     169.49 / 175.50 ±7.57 / 189.32 ms │     no change │
│ QQuery 89 │         118.05 / 125.00 ±5.87 / 132.38 ms │                     117.80 / 122.87 ±4.74 / 129.97 ms │     no change │
│ QQuery 90 │            48.38 / 48.85 ±0.49 / 49.61 ms │                        50.53 / 53.37 ±3.23 / 59.69 ms │  1.09x slower │
│ QQuery 91 │           98.17 / 99.05 ±0.98 / 100.86 ms │                       96.21 / 99.06 ±1.53 / 100.64 ms │     no change │
│ QQuery 92 │            83.15 / 88.68 ±3.90 / 93.36 ms │                        85.49 / 88.04 ±2.26 / 91.67 ms │     no change │
│ QQuery 93 │         263.09 / 268.79 ±4.39 / 275.31 ms │                     221.04 / 225.74 ±2.56 / 228.65 ms │ +1.19x faster │
│ QQuery 94 │         109.46 / 112.33 ±2.85 / 116.81 ms │                     109.62 / 113.23 ±2.24 / 116.00 ms │     no change │
│ QQuery 95 │        825.91 / 842.25 ±10.11 / 851.03 ms │                    774.22 / 811.53 ±34.11 / 856.63 ms │     no change │
│ QQuery 96 │            42.25 / 44.17 ±1.86 / 47.15 ms │                        42.82 / 46.80 ±6.45 / 59.64 ms │  1.06x slower │
│ QQuery 97 │         157.29 / 161.12 ±4.82 / 170.55 ms │                     148.93 / 154.65 ±4.84 / 161.87 ms │     no change │
│ QQuery 98 │         117.33 / 123.90 ±4.86 / 131.69 ms │                     117.53 / 124.60 ±6.94 / 136.07 ms │     no change │
│ QQuery 99 │         124.91 / 127.25 ±2.61 / 132.07 ms │                     122.78 / 129.54 ±4.96 / 137.58 ms │     no change │
└───────────┴───────────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 59744.55ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 57902.35ms │
│ Average Time (HEAD)                                                  │   603.48ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │   584.87ms │
│ Queries Faster                                                       │         18 │
│ Queries Slower                                                       │          8 │
│ Queries with No Change                                               │         73 │
│ Queries with Failure                                                 │          0 │
└──────────────────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpcds — base (merge-base)

Metric Value
Wall time 300.1s
Peak memory 9.7 GiB
Avg memory 6.8 GiB
CPU user 2251.0s
CPU sys 52.9s
Peak spill 0 B

tpcds — branch

Metric Value
Wall time 290.1s
Peak memory 10.8 GiB
Avg memory 6.8 GiB
CPU user 2177.0s
CPU sys 54.9s
Peak spill 0 B

File an issue against this benchmark runner

@Dandandan
Contributor

This is very smart ❤️

@gabotechs
Contributor

Yeap, this is pretty smart indeed, nice work @gene-bordegaray!

@gene-bordegaray
Contributor Author

if you guys are interested, this was stacked with the "internal" metrics level to repartition: #21152

along with a microbenchmark for repartition cases that will follow up the metrics PR 👍

Contributor

@gabotechs gabotechs left a comment


💯

Comment thread datafusion/physical-plan/src/repartition/mod.rs Outdated
Comment thread datafusion/physical-plan/src/repartition/mod.rs
@github-actions github-actions Bot added the physical-plan Changes to the physical-plan crate label May 14, 2026
@Dandandan
Contributor

Dandandan commented May 14, 2026

run benchmark tpch10

env:
  DATAFUSION_EXECUTION_TARGET_PARTITIONS: 256

@adriangbot

🤖 Benchmark running (GKE) | trigger
Instance: c4a-highmem-16 (12 vCPU / 65 GiB) | Linux bench-c4450474933-86-g97rb 6.12.68+ #1 SMP Wed Apr 1 02:23:28 UTC 2026 aarch64 GNU/Linux

CPU Details (lscpu)
Architecture:                            aarch64
CPU op-mode(s):                          64-bit
Byte Order:                              Little Endian
CPU(s):                                  16
On-line CPU(s) list:                     0-15
Vendor ID:                               ARM
Model name:                              Neoverse-V2
Model:                                   1
Thread(s) per core:                      1
Core(s) per cluster:                     16
Socket(s):                               -
Cluster(s):                              1
Stepping:                                r0p1
BogoMIPS:                                2000.00
Flags:                                   fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 sm3 sm4 asimddp sha512 sve asimdfhm dit uscat ilrcpc flagm sb paca pacg dcpodp sve2 sveaes svepmull svebitperm svesha3 svesm4 flagm2 frint svei8mm svebf16 i8mm bf16 dgh rng bti
L1d cache:                               1 MiB (16 instances)
L1i cache:                               1 MiB (16 instances)
L2 cache:                                32 MiB (16 instances)
L3 cache:                                80 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-15
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; CSV2, BHB
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Not affected
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Not affected

Comparing gene.bordegaray/2026/05/repartition-grouped-hash-take (96b9172) to 937dfda (merge-base) diff using: tpch10
Results will be posted here when complete


File an issue against this benchmark runner

@adriangbot

🤖 Benchmark completed (GKE) | trigger

Instance: c4a-highmem-16 (12 vCPU / 65 GiB)

Details

Comparing HEAD and gene.bordegaray_2026_05_repartition-grouped-hash-take
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃                                  HEAD ┃ gene.bordegaray_2026_05_repartition-grouped-hash-take ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │     351.72 / 357.96 ±6.84 / 368.71 ms │                     353.82 / 355.93 ±2.71 / 361.25 ms │     no change │
│ QQuery 2  │  998.34 / 1026.65 ±35.24 / 1093.23 ms │                    939.27 / 962.83 ±23.44 / 991.46 ms │ +1.07x faster │
│ QQuery 3  │ 1436.29 / 1465.85 ±27.84 / 1508.40 ms │                 1183.26 / 1261.61 ±47.14 / 1331.44 ms │ +1.16x faster │
│ QQuery 4  │    685.14 / 726.26 ±34.19 / 786.45 ms │                    606.49 / 626.38 ±17.25 / 655.71 ms │ +1.16x faster │
│ QQuery 5  │ 1791.47 / 1828.81 ±21.90 / 1859.77 ms │                 1506.08 / 1550.43 ±36.69 / 1597.10 ms │ +1.18x faster │
│ QQuery 6  │     142.57 / 146.89 ±3.97 / 153.97 ms │                     134.70 / 138.07 ±2.21 / 140.78 ms │ +1.06x faster │
│ QQuery 7  │ 2002.90 / 2034.49 ±21.29 / 2059.72 ms │                 1651.25 / 1710.75 ±48.59 / 1757.92 ms │ +1.19x faster │
│ QQuery 8  │ 2295.28 / 2365.09 ±77.48 / 2513.73 ms │                 2011.55 / 2031.28 ±15.88 / 2057.14 ms │ +1.16x faster │
│ QQuery 9  │ 2454.73 / 2501.40 ±30.73 / 2538.20 ms │                 1969.83 / 2002.69 ±29.87 / 2052.03 ms │ +1.25x faster │
│ QQuery 10 │ 1147.47 / 1182.51 ±36.72 / 1251.01 ms │                 1071.11 / 1123.17 ±33.57 / 1169.92 ms │ +1.05x faster │
│ QQuery 11 │    701.64 / 743.29 ±35.79 / 802.63 ms │                    668.71 / 689.44 ±11.26 / 699.04 ms │ +1.08x faster │
│ QQuery 12 │    589.16 / 610.33 ±26.25 / 656.86 ms │                    579.18 / 593.08 ±18.20 / 628.97 ms │     no change │
│ QQuery 13 │    610.64 / 630.29 ±14.33 / 650.92 ms │                     595.76 / 603.65 ±5.38 / 611.65 ms │     no change │
│ QQuery 14 │    455.16 / 473.19 ±20.62 / 511.36 ms │                    446.54 / 482.04 ±22.93 / 502.24 ms │     no change │
│ QQuery 15 │     416.84 / 428.62 ±9.29 / 440.67 ms │                     403.54 / 418.28 ±9.95 / 430.73 ms │     no change │
│ QQuery 16 │    481.85 / 512.16 ±20.80 / 532.05 ms │                    456.27 / 483.80 ±24.20 / 515.32 ms │ +1.06x faster │
│ QQuery 17 │ 2986.56 / 3029.71 ±43.24 / 3103.29 ms │                 2565.62 / 2608.58 ±57.62 / 2720.85 ms │ +1.16x faster │
│ QQuery 18 │ 2490.45 / 2565.73 ±62.26 / 2670.34 ms │                 1977.25 / 2012.98 ±36.78 / 2081.13 ms │ +1.27x faster │
│ QQuery 19 │    666.33 / 690.44 ±16.09 / 714.04 ms │                    646.61 / 668.39 ±14.60 / 691.56 ms │     no change │
│ QQuery 20 │ 1498.97 / 1532.48 ±31.27 / 1577.61 ms │                 1415.91 / 1461.18 ±33.79 / 1503.82 ms │     no change │
│ QQuery 21 │ 3163.93 / 3180.51 ±17.29 / 3211.14 ms │                 2449.73 / 2526.09 ±59.40 / 2604.12 ms │ +1.26x faster │
│ QQuery 22 │    445.92 / 458.13 ±12.16 / 476.40 ms │                     435.30 / 447.41 ±6.68 / 454.08 ms │     no change │
└───────────┴───────────────────────────────────────┴───────────────────────────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                                    ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                                    │ 28490.79ms │
│ Total Time (gene.bordegaray_2026_05_repartition-grouped-hash-take)   │ 24758.05ms │
│ Average Time (HEAD)                                                  │  1295.04ms │
│ Average Time (gene.bordegaray_2026_05_repartition-grouped-hash-take) │  1125.37ms │
│ Queries Faster                                                       │         14 │
│ Queries Slower                                                       │          0 │
│ Queries with No Change                                               │          8 │
│ Queries with Failure                                                 │          0 │
└──────────────────────────────────────────────────────────────────────┴────────────┘

Resource Usage

tpch10 — base (merge-base)

Metric Value
Wall time 145.0s
Peak memory 13.2 GiB
Avg memory 11.0 GiB
CPU user 1500.1s
CPU sys 38.8s
Peak spill 0 B

tpch10 — branch

Metric Value
Wall time 125.0s
Peak memory 15.9 GiB
Avg memory 11.4 GiB
CPU user 1278.7s
CPU sys 44.0s
Peak spill 0 B
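
The summary and resource tables above can be reduced to a few ratios. The following is a minimal sketch of that arithmetic, using only figures copied from the tpch10 tables; the helper names `speedup` and `percent_change` are hypothetical and are not part of the benchmark runner or DataFusion.

```rust
/// Ratio of baseline to branch; values above 1.0 mean the branch is faster.
/// (Hypothetical helper, for illustration only.)
fn speedup(base: f64, branch: f64) -> f64 {
    base / branch
}

/// Signed percentage change from base to branch (negative means a reduction).
/// (Hypothetical helper, for illustration only.)
fn percent_change(base: f64, branch: f64) -> f64 {
    (branch - base) / base * 100.0
}

fn main() {
    // Figures copied from the tpch10 summary and resource usage tables.
    let (total_base_ms, total_branch_ms) = (28490.79, 24758.05);
    let (cpu_user_base_s, cpu_user_branch_s) = (1500.1, 1278.7);
    let (peak_mem_base_gib, peak_mem_branch_gib) = (13.2, 15.9);

    println!(
        "total time speedup: {:.2}x",
        speedup(total_base_ms, total_branch_ms)
    );
    println!(
        "CPU user change:    {:+.1}%",
        percent_change(cpu_user_base_s, cpu_user_branch_s)
    );
    println!(
        "peak memory change: {:+.1}%",
        percent_change(peak_mem_base_gib, peak_mem_branch_gib)
    );
}
```

On these numbers the branch comes out at roughly 1.15x on total query time with about 15% less user CPU, while peak memory is about 20% higher. These are single-run figures from one instance, so they are best read as indicative.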

File an issue against this benchmark runner

@gene-bordegaray
Contributor Author

gene-bordegaray commented May 14, 2026

I think the dyn filter CI test that is failing is flaky.

I can address it here, or it could be a separate PR.

@gabotechs
Contributor

I just re-ran the job and it succeeded. That test seems to be flaky.

For me this is good to go! I'll give some time for @Dandandan (or anyone else) to give some feedback.
