Vector similarity search scan benchmarks #7499
Merging this PR will not alter performance
## Summary

We want to merge #7499 soon, but its comparison is currently unfair: the TurboQuant-encoded vectors are normalized, while the uncompressed vector flavor is not. This change just makes the comparison fair.

I've marked this as semi-unstable (there is no `with_turboquant` equivalent on the compressor builder, but the ID is clearly unstable) just so we can get the benchmark in #7499 over the line in a reasonable state.

## Testing

This already works for the TurboQuant scheme, and this change is just a reduction of that path, so no additional tests are needed.

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
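One way to read the fairness concern above: once vectors have unit L2 norm, cosine similarity reduces to a plain dot product, so a flavor that is normalized up front is measured on cheaper work than one that is not. The sketch below illustrates that equivalence only. The function names `l2_normalize`, `dot`, and `cosine_similarity` are hypothetical and are not taken from the Vortex codebase, and the assumption that the benchmark uses cosine similarity is not confirmed by the commit message.

```rust
/// Hypothetical helper: scales `v` to unit L2 norm.
/// A zero vector is returned unchanged to avoid dividing by zero.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}

/// Hypothetical helper: dot product over the shorter of the two slices.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical helper: full cosine similarity, computing both norms
/// on every call. Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    if denom == 0.0 {
        0.0
    } else {
        dot(a, b) / denom
    }
}
```

For any pair of non-zero vectors, `dot(&l2_normalize(a), &l2_normalize(b))` agrees with `cosine_similarity(a, b)` up to floating-point error, which is why normalizing only one flavor would skew a timing comparison.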
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.076x ➖
datafusion / vortex-file-compressed (1.076x ➖, 0↑ 3↓)
File Sizes: PolarSignals Profiling
No file size changes detected.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.000x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.995x ➖, 0↑ 0↓)
datafusion / parquet (1.015x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.018x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.038x ➖, 0↑ 1↓)
duckdb / parquet (1.038x ➖, 0↑ 0↓)
File Sizes: FineWeb NVMe
No file size changes detected.
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.070x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.063x ➖, 0↑ 2↓)
datafusion / parquet (1.049x ➖, 0↑ 5↓)
datafusion / arrow (1.078x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (1.049x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.027x ➖, 0↑ 1↓)
duckdb / duckdb (1.047x ➖, 0↑ 1↓)
File Sizes: TPC-H SF=1 on NVME
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.994x ➖, 0↑ 0↓)
datafusion / parquet (1.031x ➖, 0↑ 11↓)
duckdb / vortex-file-compressed (1.056x ➖, 0↑ 26↓)
duckdb / vortex-compact (1.023x ➖, 0↑ 6↓)
duckdb / parquet (1.012x ➖, 0↑ 2↓)
duckdb / duckdb (0.998x ➖, 4↑ 5↓)
File Sizes: TPC-DS SF=1 on NVME
No file size changes detected.
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.705x ➖, 4↑ 1↓)
datafusion / vortex-compact (1.093x ➖, 0↑ 1↓)
datafusion / parquet (1.049x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.974x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.024x ➖, 0↑ 0↓)
duckdb / parquet (1.004x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.020x ➖, 0↑ 1↓)
duckdb / vortex-compact (0.991x ➖, 0↑ 0↓)
duckdb / parquet (0.984x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.932x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.929x ➖, 1↑ 0↓)
datafusion / parquet (0.960x ➖, 0↑ 0↓)
datafusion / arrow (0.941x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.931x ➖, 4↑ 0↓)
duckdb / vortex-compact (0.961x ➖, 0↑ 0↓)
duckdb / parquet (0.969x ➖, 0↑ 0↓)
duckdb / duckdb (0.989x ➖, 0↑ 0↓)
File Sizes: Statistical and Population Genetics
No file size changes detected.
File Sizes: TPC-H SF=10 on NVME
No file size changes detected.
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (1.027x ➖, 0↑ 3↓)
datafusion / vortex-compact (1.022x ➖, 0↑ 1↓)
datafusion / parquet (1.105x ➖, 1↑ 6↓)
duckdb / vortex-file-compressed (0.983x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.009x ➖, 0↑ 1↓)
duckdb / parquet (1.003x ➖, 0↑ 0↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.113x ❌, 0↑ 25↓)
datafusion / parquet (1.085x ➖, 0↑ 20↓)
duckdb / vortex-file-compressed (1.080x ➖, 0↑ 20↓)
duckdb / parquet (1.080x ➖, 0↑ 14↓)
duckdb / duckdb (1.067x ➖, 0↑ 12↓)
File Sizes: Clickbench on NVME
File Size Changes (1 file changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Random Access
Vortex (geomean): 0.901x ➖
unknown / unknown (1.001x ➖, 6↑ 9↓)
Benchmarks: Compression
Vortex (geomean): 0.999x ➖
unknown / unknown (0.997x ➖, 0↑ 0↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy)
datafusion / vortex-file-compressed (0.995x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.013x ➖, 0↑ 1↓)
datafusion / parquet (1.083x ➖, 0↑ 2↓)
duckdb / vortex-file-compressed (1.063x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.046x ➖, 0↑ 0↓)
duckdb / parquet (1.080x ➖, 0↑ 0↓)
Summary
Tracking Issue: #7297
Adds a basic vector similarity search benchmark to the Vortex benchmark suite as a binary.
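To make the idea of a vector similarity scan concrete, here is a minimal brute-force sketch: every row is compared against a query vector and the indices of sufficiently similar rows are returned. It assumes cosine similarity and a fixed threshold, neither of which the description confirms, and the names `scan_rows` and `cosine_similarity` are hypothetical. It is not the PR's `scan_one_file` and does not use any Vortex API.

```rust
/// Hypothetical helper: cosine similarity of two vectors.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Hypothetical brute-force scan: returns the indices of all rows whose
/// cosine similarity to `query` is strictly greater than `threshold`.
fn scan_rows(rows: &[Vec<f32>], query: &[f32], threshold: f32) -> Vec<usize> {
    rows.iter()
        .enumerate()
        .filter(|(_, row)| cosine_similarity(row, query) > threshold)
        .map(|(i, _)| i)
        .collect()
}
```

A scan benchmark in this style would time `scan_rows` over each dataset and storage flavor. The filter is deliberately a full pass over the data with no index, which matches the "scan" framing in the title.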
Here is an example of how to run this:
The main scan logic is in `scan_one_file` in `benchmarks/vector-search-bench/src/scan.rs`, and everything else is just setup for that.
Future Work
Testing
A successful run of the benchmark on all datasets is sufficient testing.