Semantic search for PhysLib — the formal Lean 4 physics library.
Live at physlibsearch.net
Search PhysLib theorems, definitions, and lemmas using plain English — no Lean syntax required.
> Newton's second law
theorem Physlib.Mechanics.Newton.secondLaw : ∀ (m F : ℝ), m > 0 → acceleration m F = F / m
Newton's Second Law: for a body of mass m > 0 and applied force F, the acceleration satisfies a = F/m.
> energy of a harmonic oscillator
theorem Physlib.QuantumMechanics.HarmonicOscillator.energy : ∀ (n : ℕ), E n = ℏ * ω * (n + 1/2)
Quantum harmonic oscillator energy levels: E_n = ℏω(n + ½).
PhyslibSearch combines dense vector retrieval with LLM-powered query expansion to find relevant Lean declarations from a natural-language query.
User query
│
▼
┌─────────────────────────────────────────┐
│ HyDE Query Expansion (optional) │
│ Gemini generates a plausible │
│ hypothetical Lean declaration │
│ matching the query │
└──────────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ Gemini Embedding │
│ Query → 3072-dim cosine vector │
└──────────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ ChromaDB Vector Search │
│ Nearest-neighbor lookup in the index │
└──────────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────────┐
│ PostgreSQL Record Fetch │
│ Full declaration + informal description│
└─────────────────────────────────────────┘
Indexing pipeline (run once):
- jixia parses PhysLib Lean source → PostgreSQL
- Gemini translates each formal declaration into a natural-language name + description (Herald-style)
- Gemini Embeddings encodes descriptions → ChromaDB vector store
See docs/architecture.md for a detailed breakdown.
| Dependency | Notes |
|---|---|
| Python 3.11+ | |
| PostgreSQL 14+ | createdb must be on your PATH |
| jixia | Lean 4 project parser — build from source |
| Lean 4 toolchain | Must match the toolchain of the project you index |
| Google Gemini API key | Get one free |
| Node.js 20+ | Frontend only |
git clone https://github.com/Kernel-Science/physlibsearch.git
cd physlibsearch
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txtcreatedb physlibsearchgit clone https://github.com/frenzymath/jixia.git
cd jixia
lake build # ~70 s
cd ..Make sure
lean-toolchainin jixia matches the toolchain used by the project you want to index. If they differ you'll see"failed to read file … invalid header"during indexing.
cp .env.example .env
# Edit .env — set JIXIA_PATH, LEAN_SYSROOT, CONNECTION_STRING, GEMINI_API_KEYSee docs/development.md for a full variable reference.
# Parse Lean source
python -m database jixia /path/to/physlib Physlib,QuantumInfo
# Generate informal descriptions (calls Gemini — takes a while for large projects)
python -m database informal
# Build the vector index
python -m database vector-dbUse DRY_RUN=true to test the pipeline without spending API quota.
# Backend
uvicorn server:app --reload
# Frontend (separate terminal)
cd frontend
npm install
npm run devOpen http://localhost:3000.
python search.py "Schrödinger equation"
python search.py "Newton's second law" "conservation of energy" --jsonphyslibsearch/
├── server.py # FastAPI server (search, expand, fetch, modules, feedback)
├── engine.py # PhyslibSearchEngine — ChromaDB + PostgreSQL retrieval
├── query_expansion.py # QueryExpander — HyDE via Gemini
├── search.py # CLI search interface
├── prefix.py # Module prefix helpers
│
├── database/ # Indexing pipeline
│ ├── __init__.py # CLI entry point (schema / jixia / informal / vector-db)
│ ├── create_schema.py # PostgreSQL schema
│ ├── jixia_db.py # Load jixia output → PostgreSQL
│ ├── informalize.py # Generate informal descriptions with Gemini
│ ├── translate.py # Herald-style formal→natural translation
│ ├── vector_db.py # Build ChromaDB embeddings
│ └── embedding.py # Gemini embedding client
│
├── prompt/ # Jinja2 prompt templates
│ ├── augment_prompt.j2
│ ├── augment_assistant.txt
│ ├── theorem.md.j2
│ ├── definition.md.j2
│ └── ...
│
├── frontend/ # Next.js 16 + HeroUI web interface
│ └── src/
│ ├── app/ # Pages (search, browse, docs)
│ ├── components/ # SearchPage, ResultCard, Header, Footer, …
│ ├── lib/api.ts # API client
│ └── types/ # TypeScript types
│
└── docs/
├── architecture.md # System design deep-dive
├── development.md # Local dev setup
└── deployment.md # Self-hosting guide
| Method | Path | Description |
|---|---|---|
POST |
/search |
Search declarations by natural language |
POST |
/expand |
HyDE query expansion |
POST |
/fetch |
Fetch specific declarations by name |
GET |
/modules |
List indexed modules |
POST |
/modules/declarations |
Get all declarations in a module |
POST |
/feedback |
Submit feedback on a result |
Full API reference: docs/architecture.md#api.
All configuration is through environment variables (.env). See .env.example for the full list with explanations.
| Variable | Required | Description |
|---|---|---|
JIXIA_PATH |
Yes | Path to compiled jixia binary |
LEAN_SYSROOT |
Yes | Lean 4 toolchain root |
CONNECTION_STRING |
Yes | PostgreSQL connection string |
GEMINI_API_KEY |
Yes | Google Gemini API key |
GEMINI_MODEL |
No | Model for informalization (default: gemini-3.1-pro-preview) |
GEMINI_FAST_MODEL |
No | Model for query expansion (default: gemini-3-flash-preview) |
GEMINI_EMBEDDING_MODEL |
No | Embedding model (default: gemini-embedding-2-preview) |
CHROMA_PATH |
No | ChromaDB storage path (default: chroma) |
DRY_RUN |
No | Skip API calls during indexing (default: false) |
LLM_API_KEY / LLM_BASE_URL |
No | Custom LLM endpoint (e.g. OpenRouter) |
Contributions are welcome. See CONTRIBUTING.md to get started.
-
Continuous integration — As PhysLib grows, the database should update automatically when new declarations are merged. The indexing pipeline is designed to be incremental (each SQL step uses
ON CONFLICT DO NOTHING, and the informalization and embedding steps skip already-processed entries), but this needs to be verified end-to-end for every stage before setting up a CI workflow that triggers on new PhysLib commits. -
Index QuantumInfo — The
QuantumInfomodule has not been indexed due to integration issues between PhysLib's QuantumInfo dependencies and jixia. Once resolved, re-runningpython -m database jixiashould pick it up without reprocessing the rest of the library.
The HyDE query expansion technique is inspired by the LeanSearch paper. Natural-language translation of formal declarations follows the approach described in Herald.
