PDF-Query

A LangChain project that uses Cassandra (Astra DB) as a vector store together with the Google Gemini models and API. It showcases a RAG (retrieval-augmented generation) pipeline that answers input queries from a PDF source.

Dataflow Diagram

graph TD
    subgraph Ingestion_Flow
        A[PDF Document] --> B["Reading the Document (Text Extraction & Splitting)"]
        B --> C[Text Chunks]
        C --> D[Google Gemini Embeddings]
        D --> E["Vector Database (Cassandra/Astra DB)"]
    end

    subgraph Query_Flow
        F[Human] --> G["Text + Query"]
        G -- "Similarity Search" --> E
        E -- "DataStax Vector Search" --> H["Text embeddings \n(Relevant context)"]
        H --> I[Google Gemini LLM]
        G --> I
        I --> J[Final Answer]
    end
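
The two flows in the diagram can be sketched in plain Python. This is a minimal illustration, not the code in app.py: `toy_embed` is a hypothetical bag-of-words stand-in for the Gemini embeddings, the in-memory `store` list stands in for the Cassandra/Astra DB vector table, and all function names and the chunk sizes are assumptions chosen for the example. The final call to the Gemini LLM is left out; the sketch stops at the prompt that would be sent to it.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks; neighbours share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, store, k=2):
    """Return the k stored chunks most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, context_chunks):
    """Combine the retrieved context and the user query into one LLM prompt."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Ingestion flow: document -> chunks -> (chunk, embedding) pairs in the store.
document = (
    "Astra DB offers vector search on top of Cassandra. "
    "Streamlit builds simple web apps in Python."
)
store = [(c, toy_embed(c)) for c in split_text(document, chunk_size=60, overlap=10)]

# Query flow: query -> similarity search -> prompt for the LLM.
question = "Which database offers vector search?"
print(build_prompt(question, top_k(question, store, k=1)))
```

Overlapping chunks reduce the chance that a sentence relevant to the query is cut in half at a chunk boundary, at the cost of storing some text twice.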

Setup

Install Dependencies:
```
pip install -r requirements.txt
```
Environment Variables: Create a .env file based on .env.example and add your credentials:
- ASTRA_DB_APPLICATION_TOKEN
- ASTRA_DB_ID
- GOOGLE_API_KEY
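
A quick way to confirm the three credentials are visible before launching the app is a small check like the one below. It is a sketch under assumptions: the helper name `missing_credentials` is hypothetical, and the check reads the process environment only, so the values from .env must already have been loaded into it (for example by whatever loader app.py uses).

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("ASTRA_DB_APPLICATION_TOKEN", "ASTRA_DB_ID", "GOOGLE_API_KEY")


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All credentials are set.")
```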

How to Run

Launch the Streamlit application:

streamlit run app.py

Features

- Dynamic PDF Upload: Upload any PDF and query its content.
- Astra DB Integration: Uses DataStax Astra DB for vector search over the stored text chunks.
- Google Gemini AI: Uses Gemini models for both embeddings and text generation.
