feat(lancedb): enhance index types and CLI integration #787
Open
XuQianJin-Stars wants to merge 1 commit into
…upport, batch insert optimization and full CLI integration

Core changes:
- api.py: add IVF_HNSW_SQ and IVF_HNSW_PQ to the IndexType enum
- config.py: rewrite into 5 independent config classes (IVF_PQ / NONE / AUTOINDEX / IVF_HNSW_SQ / IVF_HNSW_PQ); all IVF variants share the refine_factor search param; add storage_options for remote storage
- lancedb.py: implement the standard prepare_filter() pattern (NumGE/StrEqual); support scalar label inserts; use PyArrow FixedSizeListArray for batch writes instead of per-row dicts; unify search param passing; optimize() supports compact_files + cleanup_old_versions; custom __deepcopy__ to solve the multi-process Rust handle serialization issue; explicitly select _distance to suppress a Lance deprecation warning
- cli.py: add 5 CLI commands (LanceDB/AutoIndex/IVFPQ/IVFHNSWSQ/IVFHNSWPQ); fix the IndexType.NONE lookup bug; add a COS/GooseFS remote storage_options builder
- vectordbbench.py: register all new LanceDB CLI commands

New files:
- docs/lancedb-enhancement-plan.md: development plan and implementation notes
- docs/lancedb-integration.md: integration verification report
- tests/test_lancedb_config.py: 10 offline unit tests covering config, registration and CLI structure
- scripts/bench_lancedb_500k.sh: one-click 500K benchmark script comparing three index types
- scripts/aggregate_lancedb_results.py: aggregate results into a Markdown comparison table
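The prepare_filter() pattern named above converts a benchmark filter into a predicate once, before the query loop, instead of rebuilding it for every query. A minimal sketch, assuming the resulting string is later handed to a LanceDB query's where() clause; FilterOp, FilterSpec, and the id/label column names are hypothetical stand-ins, not the PR's actual types.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Union


class FilterOp(Enum):
    """Hypothetical stand-in for the benchmark's filter kinds named in the PR."""

    NonFilter = "NonFilter"
    NumGE = "NumGE"
    StrEqual = "StrEqual"


@dataclass
class FilterSpec:
    """Hypothetical filter description: an operator plus its operand."""

    op: FilterOp
    value: Union[int, str, None] = None


def prepare_filter(
    spec: FilterSpec, id_field: str = "id", label_field: str = "label"
) -> Optional[str]:
    """Translate a filter once into a SQL-style predicate string (None = no filter)."""
    if spec.op is FilterOp.NonFilter:
        return None
    if spec.op is FilterOp.NumGE:
        # Numeric range filter on the primary-key column.
        return f"{id_field} >= {int(spec.value)}"
    if spec.op is FilterOp.StrEqual:
        # Double any single quotes so the label stays a valid SQL string literal.
        escaped = str(spec.value).replace("'", "''")
        return f"{label_field} = '{escaped}'"
    raise ValueError(f"unsupported filter op: {spec.op}")


# Computed once per search run, then reused for every query.
where_clause = prepare_filter(FilterSpec(FilterOp.NumGE, 10_000))
```

Caching the string keeps per-query work limited to the vector search itself.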
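The batch-write change replaces one Python dict per row with a single fixed-size-list vector column. The stdlib sketch below shows only the underlying memory layout, under the assumption that the real client then wraps such a flat float32 buffer with pyarrow's FixedSizeListArray.from_arrays(values, list_size) and writes the batch in one call; flatten_vectors and row_at are hypothetical helpers.

```python
from array import array
from typing import List, Sequence


def flatten_vectors(vectors: Sequence[Sequence[float]], dim: int) -> array:
    """Pack row vectors into one contiguous, row-major float32 buffer.

    Row i occupies values[i * dim:(i + 1) * dim], which is the child-values
    layout of a fixed-size-list column; no per-row dict is ever built.
    """
    values = array("f")  # C float, 4 bytes on mainstream platforms
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"row {i} has dimension {len(vec)}, expected {dim}")
        values.extend(vec)
    return values


def row_at(values: array, dim: int, i: int) -> List[float]:
    """Read row i back out of the flat buffer."""
    return values[i * dim:(i + 1) * dim].tolist()
```

Because every row has the same length, the column needs no per-row offsets, only the single list_size.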
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: XuQianJin-Stars. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files: Approvers can indicate their approval by writing
Author: /assign @XuanYang-cn
Summary
Rewrites the lancedb client for LanceDB, adding support for the IVF_HNSW_SQ and IVF_HNSW_PQ index types, standard filter queries, PyArrow batch insert optimization, COS/GooseFS remote storage, and full CLI integration.

What's added
- vectordb_bench/backend/clients/lancedb/{lancedb.py,config.py,cli.py}: provides the CLI commands vectordbbench LanceDB ... / LanceDBAutoIndex / LanceDBIVFPQ / LanceDBIVFHNSWSQ / LanceDBIVFHNSWPQ
- vectordb_bench/backend/clients/api.py: add IVF_HNSW_SQ and IVF_HNSW_PQ to the IndexType enum
- tests/test_lancedb_config.py: 10 offline tests covering config defaults, to_dict connection options, index_param/search_param generation, metric parsing, case-config registry, and CLI structure
- vectordb_bench/cli/vectordbbench.py: register all new LanceDB CLI commands

Algorithms supported
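- IVF_PQ: build num_partitions, num_sub_vectors, nbits, sample_rate, max_iterations; search nprobes, refine_factor
- AUTOINDEX: search nprobes, refine_factor
- IVF_HNSW_SQ: build num_partitions, m, ef_construction; search ef, nprobes, refine_factor
- IVF_HNSW_PQ: build num_partitions, num_sub_vectors, m, ef_construction; search ef, nprobes, refine_factor
- NONE: search refine_factor

Key improvements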
- Standard prepare_filter() pattern (NumGE/StrEqual)
- PyArrow FixedSizeListArray for batch writes instead of per-row dicts
- Custom __deepcopy__ to solve the Rust handle serialization issue
- storage_options for COS/GooseFS backends
- compact_files + cleanup_old_versions support in optimize()

Example
```shell
vectordbbench LanceDBIVFHNSWSQ --case-type Performance768D1M --k 10 \
  --uri /tmp/lancedb \
  --num-partitions 256 --m 16 --ef-construction 128 \
  --ef 64 --nprobes 20 --refine-factor 25

vectordbbench LanceDBIVFPQ --case-type Performance768D1M --k 10 \
  --uri s3://my-bucket/lancedb \
  --storage-options '{"aws_access_key_id":"...","aws_secret_access_key":"..."}' \
  --num-partitions 256 --num-sub-vectors 48 --nprobes 20
```
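The flags in the IVF_HNSW_SQ command split into build-time and query-time groups, which is the index_param/search_param split the config tests exercise. A sketch of that split for one index type, assuming a dataclass-style config; IVFHNSWSQConfigSketch and its defaults (copied from the example command, with an assumed "l2" metric) are illustrative, not the PR's config.py.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IVFHNSWSQConfigSketch:
    """Hypothetical IVF_HNSW_SQ case config; defaults copy the example command."""

    metric: str = "l2"
    num_partitions: int = 256
    m: int = 16
    ef_construction: int = 128
    ef: int = 64
    nprobes: int = 20
    refine_factor: Optional[int] = None

    def index_param(self) -> dict:
        # Build-time parameters, named as in the "Algorithms supported" list.
        return {
            "index_type": "IVF_HNSW_SQ",
            "metric": self.metric,
            "num_partitions": self.num_partitions,
            "m": self.m,
            "ef_construction": self.ef_construction,
        }

    def search_param(self) -> dict:
        # Query-time parameters; refine_factor is only sent when it is set.
        params = {"ef": self.ef, "nprobes": self.nprobes}
        if self.refine_factor is not None:
            params["refine_factor"] = self.refine_factor
        return params
```

Keeping the two groups separate lets one built index be reused across several search-parameter sweeps.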
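The --storage-options flag takes a JSON object, as in the S3 example. A sketch of turning it into a flat string-to-string mapping, assuming the result is then passed as storage_options when opening the database (for example lancedb.connect(uri, storage_options=...)); parse_storage_options is a hypothetical helper, and the COS/GooseFS builder in cli.py may construct its options differently.

```python
import json
from typing import Dict, Optional


def parse_storage_options(raw: Optional[str]) -> Dict[str, str]:
    """Turn a --storage-options JSON string into a flat str -> str dict."""
    if not raw:
        return {}
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("--storage-options must be a JSON object")
    options: Dict[str, str] = {}
    for key, value in parsed.items():
        if value is None:
            continue  # treat JSON null as "not set"
        if isinstance(value, (dict, list)):
            raise ValueError(f"storage option {key!r} must be a scalar")
        if isinstance(value, bool):
            options[key] = "true" if value else "false"  # check bool before str()
        else:
            options[key] = str(value)
    return options
```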