feat(lancedb): enhance index types and CLI integration #787

Open
XuQianJin-Stars wants to merge 1 commit into zilliztech:main from XuQianJin-Stars:feature/lancedb-enhancement

@XuQianJin-Stars

Summary

Rewrites the LanceDB client, adding support for the IVF_HNSW_SQ and IVF_HNSW_PQ index types, standard filter queries, PyArrow batch-insert optimization, COS/GooseFS remote storage, and full CLI integration.

What's added

  • Client implementation: vectordb_bench/backend/clients/lancedb/{lancedb.py,config.py,cli.py}
  • CLI commands: vectordbbench LanceDB ... / LanceDBAutoIndex / LanceDBIVFPQ / LanceDBIVFHNSWSQ / LanceDBIVFHNSWPQ
  • Index type registration: vectordb_bench/backend/clients/api.py — add IVF_HNSW_SQ and IVF_HNSW_PQ to IndexType enum
  • Unit tests: tests/test_lancedb_config.py — 10 offline tests covering config defaults, to_dict connection options, index_param / search_param generation, metric parsing, case-config registry, and CLI structure
  • Entry point: vectordb_bench/cli/vectordbbench.py — register all new LanceDB CLI commands

Algorithms supported

| Algorithm | Build params | Search params |
|---|---|---|
| IVF_PQ (default) | num_partitions, num_sub_vectors, nbits, sample_rate, max_iterations | nprobes, refine_factor |
| AutoIndex | (auto, metric only) | nprobes, refine_factor |
| IVF_HNSW_SQ | num_partitions, m, ef_construction | ef, nprobes, refine_factor |
| IVF_HNSW_PQ | num_partitions, num_sub_vectors, m, ef_construction | ef, nprobes, refine_factor |
| NONE | (brute-force) | refine_factor |
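
To make the build/search split concrete, here is a minimal sketch of how one per-index config could expose the two parameter sets for IVF_HNSW_SQ. The class name `IvfHnswSqSketch`, the default values, and the dictionary keys are illustrative assumptions (the defaults simply mirror the CLI example in this PR); this is not the code in config.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IvfHnswSqSketch:
    """Hypothetical stand-in for an IVF_HNSW_SQ case config (not the PR's class)."""

    metric: str = "cosine"
    num_partitions: int = 256
    m: int = 16
    ef_construction: int = 128
    ef: int = 64
    nprobes: int = 20
    refine_factor: Optional[int] = None

    def index_param(self) -> dict:
        # Build-time parameters, matching the "Build params" column.
        return {
            "index_type": "IVF_HNSW_SQ",
            "metric": self.metric,
            "num_partitions": self.num_partitions,
            "m": self.m,
            "ef_construction": self.ef_construction,
        }

    def search_param(self) -> dict:
        # Query-time parameters, matching the "Search params" column.
        params = {"ef": self.ef, "nprobes": self.nprobes}
        if self.refine_factor is not None:
            # refine_factor is optional, so it is only sent when set.
            params["refine_factor"] = self.refine_factor
        return params
```

Keeping the two dictionaries separate means the same built index can be re-queried with different `ef` / `nprobes` / `refine_factor` values without rebuilding.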

Key improvements

  • Filter support: Implement standard prepare_filter() pattern (NumGE / StrEqual)
  • Batch insert: Use PyArrow FixedSizeListArray for batch writes instead of per-row dicts
  • Multi-process: Custom __deepcopy__ to solve Rust handle serialization issue
  • Remote storage: storage_options for COS/GooseFS backends
  • Optimize: compact_files + cleanup_old_versions support
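
For reviewers unfamiliar with the multi-process issue: the benchmark copies the client object before handing it to worker processes, and a native (Rust-backed) handle cannot be copied that way. The sketch below shows the general technique of a custom `__deepcopy__` that copies plain attributes and drops the handle so each process reconnects on its own. `ClientSketch` and `FakeNativeHandle` are hypothetical stand-ins, not the classes in lancedb.py.

```python
import copy


class FakeNativeHandle:
    """Stands in for a Rust-backed connection that refuses to be copied."""

    def __deepcopy__(self, memo):
        raise TypeError("native handle cannot be copied")


class ClientSketch:
    """Hypothetical client: plain config plus a lazily opened native handle."""

    def __init__(self, uri: str, table_name: str):
        self.uri = uri
        self.table_name = table_name
        self.handle = None  # opened per process, never shared

    def connect(self) -> None:
        self.handle = FakeNativeHandle()

    def __deepcopy__(self, memo):
        # Create the copy without running __init__, then copy every plain
        # attribute; the native handle is reset so the copy reconnects later.
        clone = self.__class__.__new__(self.__class__)
        memo[id(self)] = clone
        for name, value in self.__dict__.items():
            if name == "handle":
                clone.handle = None
            else:
                setattr(clone, name, copy.deepcopy(value, memo))
        return clone
```

With this in place, `copy.deepcopy(client)` succeeds even after `client.connect()`, and the copy carries only the settings needed to open its own connection.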

Example

vectordbbench LanceDBIVFHNSWSQ --case-type Performance768D1M --k 10 \
    --uri /tmp/lancedb \
    --num-partitions 256 --m 16 --ef-construction 128 \
    --ef 64 --nprobes 20 --refine-factor 25
vectordbbench LanceDBIVFPQ --case-type Performance768D1M --k 10 \
    --uri s3://my-bucket/lancedb \
    --storage-options '{"aws_access_key_id":"...","aws_secret_access_key":"..."}' \
    --num-partitions 256 --num-sub-vectors 48 --nprobes 20
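
The `--storage-options` value in the second command is a JSON object. As a rough sketch of how such a flag could be turned into a plain string-to-string mapping before being passed on as connection options, assuming a helper named `parse_storage_options` (hypothetical, not taken from cli.py):

```python
import json
from typing import Dict, Optional


def parse_storage_options(raw: Optional[str]) -> Dict[str, str]:
    """Parse a JSON object given on the command line into a str -> str dict."""
    if not raw:
        # Flag omitted or empty: local storage, no extra options.
        return {}
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("storage options must be a JSON object")
    # Normalise keys and values to strings.
    return {str(key): str(value) for key, value in parsed.items()}
```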

…upport, batch insert optimization and full CLI integration

Core changes:
- api.py: Add IVF_HNSW_SQ and IVF_HNSW_PQ to IndexType enum
- config.py: Rewrite into 5 independent config classes (IVF_PQ / NONE / AUTOINDEX / IVF_HNSW_SQ /
  IVF_HNSW_PQ); all IVF variants share refine_factor search param; add storage_options for remote storage
- lancedb.py: Implement standard prepare_filter() pattern (NumGE/StrEqual); support scalar labels insert;
  use PyArrow FixedSizeListArray for batch writes instead of per-row dicts; unify search param passing;
  optimize() supports compact_files + cleanup_old_versions; custom __deepcopy__ to solve
  multi-process Rust handle serialization issue; explicit select _distance to suppress lance deprecation warning
- cli.py: Add 5 CLI commands (LanceDB/AutoIndex/IVFPQ/IVFHNSWSQ/IVFHNSWPQ);
  fix IndexType.NONE lookup bug; add COS/GooseFS remote storage_options builder
- vectordbbench.py: Register all new LanceDB CLI commands

New files:
- docs/lancedb-enhancement-plan.md: Development plan and implementation notes
- docs/lancedb-integration.md: Integration verification report
- tests/test_lancedb_config.py: 10 offline unit tests covering config/registration/CLI structure
- scripts/bench_lancedb_500k.sh: One-click 500K three-index comparison benchmark script
- scripts/aggregate_lancedb_results.py: Aggregate results into Markdown comparison table
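
The prepare_filter() pattern named in the core changes can be sketched as a translation from a benchmark filter into a SQL-style predicate string. Everything below is an assumption for illustration: `FilterOpSketch`, `FilterSketch`, `build_where_clause`, and the column names `id` and `labels` are hypothetical and do not reproduce the framework's own filter classes or the PR's implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FilterOpSketch(Enum):
    NON_FILTER = "none"
    NUM_GE = "num_ge"
    STR_EQUAL = "str_equal"


@dataclass
class FilterSketch:
    op: FilterOpSketch
    int_value: int = 0
    label_value: str = ""


def build_where_clause(
    flt: FilterSketch, id_field: str = "id", label_field: str = "labels"
) -> Optional[str]:
    """Return a SQL-style predicate, or None when no filter applies."""
    if flt.op is FilterOpSketch.NON_FILTER:
        return None
    if flt.op is FilterOpSketch.NUM_GE:
        # Numeric range filter: keep rows whose id is at least int_value.
        return f"{id_field} >= {int(flt.int_value)}"
    if flt.op is FilterOpSketch.STR_EQUAL:
        # Escape single quotes so the label cannot break out of the literal.
        escaped = flt.label_value.replace("'", "''")
        return f"{label_field} = '{escaped}'"
    raise ValueError(f"unsupported filter op: {flt.op}")
```

Computing the predicate once per test case, rather than once per query, keeps string building out of the timed search path.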
@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: XuQianJin-Stars
To complete the pull request process, please assign xuanyang-cn after the PR has been reviewed.
You can assign the PR to them by writing /assign @xuanyang-cn in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@XuQianJin-Stars
Author

/assign @XuanYang-cn
