A LangChain project that uses Cassandra (Astra DB) as its vector store, backed by the Google Gemini models and API. It showcases a RAG (retrieval-augmented generation) pipeline that uses an uploaded PDF as the source for answering input queries.
graph TD
subgraph Ingestion_Flow
A[PDF Document] --> B["Reading the Document (Text Extraction & Splitting)"]
B --> C[Text Chunks]
C --> D[Google Gemini Embeddings]
D --> E["Vector Database (Cassandra/Astra DB)"]
end
subgraph Query_Flow
F[Human] --> G["Text + Query"]
G -- "Similarity Search" --> E
E -- "DataStax Vector Search" --> H["Text embeddings<br/>(Relevant context)"]
H --> I[Google Gemini LLM]
G --> I
I --> J[Final Answer]
end
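
The two flows in the diagram can be sketched in plain Python. This is a minimal, hedged illustration and not the project's actual code: a word-count vector stands in for Google Gemini embeddings, an in-memory list stands in for the Cassandra/Astra DB vector store, and the final call to the Gemini LLM is left out (only the prompt is built). All names here (`split_text`, `embed`, `InMemoryVectorStore`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character chunks (stand-in for a text splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def embed(text):
    """Toy embedding: a word-count vector. Assumption: replaces Gemini embeddings."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Assumption: a list of (vector, chunk) rows replaces Astra DB."""

    def __init__(self):
        self.rows = []

    def add_texts(self, chunks):
        # Ingestion flow: embed each chunk and store it next to its text.
        for chunk in chunks:
            self.rows.append((embed(chunk), chunk))

    def similarity_search(self, query, k=2):
        # Query flow: embed the query and return the k most similar chunks.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine retrieved context with the question; this text would go to the LLM."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    document = (
        "Cassandra is a distributed database. "
        "Gemini is a family of models from Google. "
        "Streamlit builds web apps in Python."
    )
    store = InMemoryVectorStore()
    store.add_texts(split_text(document, chunk_size=40, overlap=0))
    question = "What is Gemini?"
    print(build_prompt(question, store.similarity_search(question, k=1)))
```

In the real application the embedding step and the store would be remote services, so only the overall shape (split, embed, store, search, prompt) carries over from this sketch.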
- Install dependencies:
  pip install -r requirements.txt
- Environment variables: create a .env file based on .env.example and add your credentials:
  - ASTRA_DB_APPLICATION_TOKEN
  - ASTRA_DB_ID
  - GOOGLE_API_KEY
- Launch the Streamlit application:
  streamlit run app.py

Features:
- Dynamic PDF Upload: Upload any PDF to query its content.
- Astra DB Integration: Powered by DataStax for high-performance vector search.
- Google Gemini AI: Uses Gemini models for both embeddings and text generation.
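
Because the app depends on the three credentials named in the setup steps, it can help to fail early when one is missing. The following is a minimal sketch, not code from this repository: `missing_credentials` and `require_credentials` are hypothetical helpers, and the sketch assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`, if the project uses that package).

```python
import os

# Variable names taken from the setup instructions.
REQUIRED_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "GOOGLE_API_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def require_credentials(env=None):
    """Raise a descriptive error if any required credential is missing."""
    missing = missing_credentials(env)
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
```

Calling `require_credentials()` at the top of the app would turn a later, harder-to-read authentication failure into a single message listing the missing variables.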