An R&D RAG AI agent that consumes various documents. This serves as a personal experiment with AI.
# RAG AI Application
A Retrieval-Augmented Generation (RAG) application that enables users to chat with their PDF documents using Google's Gemini AI model.
This project is for personal experimentation and educational purposes only.
## Features
- PDF document upload and processing
- Vector embeddings generation using Gemini API
- Similarity search using pgvector
- Conversational AI with context from uploaded documents
- Dark/Light mode support
- User-provided API key management
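
The processing and search features above can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the function names `chunk_text`, `cosine_similarity`, and `top_k` are hypothetical, the chunk sizes are arbitrary, and the toy vectors stand in for embeddings that the real application would obtain from the Gemini API and store in pgvector.

```python
import math


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], stored: list[tuple[str, list[float]]], k: int = 3) -> list[str]:
    """Return the k stored chunks whose vectors are most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]
```

In this sketch the ranking is done in Python for readability; in the application described here, the equivalent ranking would presumably be delegated to pgvector inside PostgreSQL.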
## Tech Stack
### Frontend
- Next.js 14
- React
- Flowbite UI Components
- TailwindCSS
### Backend
- FastAPI
- LangChain
- Google Gemini API
- PyPDF
### Database
- Supabase
- PostgreSQL with pgvector extension
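
As a rough sketch of how a similarity lookup against pgvector might be phrased, the helper below builds a parameterized SQL string. The table name `document_chunks` and the columns `content` and `embedding` are assumptions for illustration, not the project's schema, and the `%s` placeholders assume a psycopg-style driver. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives cosine similarity.

```python
def to_pgvector_literal(vec: list[float]) -> str:
    """Format a Python list as pgvector's text input form, e.g. '[1.0,0.5]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"


def build_similarity_query(table: str = "document_chunks", k: int = 5) -> str:
    """Build a top-k cosine-similarity query; table and column names are hypothetical."""
    # Identifiers cannot be bound as parameters, so validate the table name instead.
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    if k < 1:
        raise ValueError("k must be at least 1")
    return (
        "SELECT content, 1 - (embedding <=> %s::vector) AS similarity "
        f"FROM {table} "
        "ORDER BY embedding <=> %s::vector "
        f"LIMIT {int(k)}"
    )
```

A caller would pass the same vector literal for both placeholders, for example `cursor.execute(build_similarity_query(k=3), (literal, literal))` where `literal = to_pgvector_literal(query_embedding)`.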
## Technical Architecture
```mermaid
graph TB
subgraph Frontend["Frontend (Next.js)"]
UI[User Interface]
Upload[PDF Upload Component]
Chat[Chat Component]
Theme[Theme Provider]
end
subgraph Backend["Backend (FastAPI)"]
API[FastAPI Endpoints]
RAG[RAG Service]
PDF[PDF Processor]
LLM[Gemini LLM]
end
subgraph Database["Vector Database (Supabase)"]
PG[PostgreSQL]
Vector[pgvector]
Embed[Embeddings Storage]
end
%% User Flow
User((User)) -->|1. Uploads PDF| Upload
Upload -->|2. Send PDF| API
API -->|3. Process| PDF
PDF -->|4. Generate Embeddings| RAG
RAG -->|5. Store Vectors| Vector
%% Chat Flow
User -->|6. Ask Question| Chat
Chat -->|7. Query| API
API -->|8. Get Context| RAG
RAG -->|9. Search Similar| Vector
Vector -->|10. Return Context| RAG
RAG -->|11. Generate Answer| LLM
LLM -->|12. Response| API
API -->|13. Display Answer| Chat
%% Styling
classDef frontend fill:#47B4B6,stroke:#333,stroke-width:2px;
classDef backend fill:#FF9776,stroke:#333,stroke-width:2px;
classDef database fill:#7C3AED,stroke:#333,stroke-width:2px;
classDef user fill:#4CAF50,stroke:#333,stroke-width:2px;
class UI,Uplo
```
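
The chat flow in the diagram (steps 8 to 12) can be sketched as a small orchestration function. This is an illustrative outline under stated assumptions, not the project's RAG service: `build_prompt` and `answer_question` are hypothetical names, the prompt wording is invented, and the `retrieve` and `generate` callables stand in for the pgvector search and the Gemini call respectively.

```python
from typing import Callable, Sequence


def build_prompt(question: str, context_chunks: Sequence[str]) -> str:
    """Combine retrieved chunks and the user question into a single prompt string."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


def answer_question(
    question: str,
    retrieve: Callable[[str, int], Sequence[str]],  # stand-in for the vector search
    generate: Callable[[str], str],                 # stand-in for the LLM call
    k: int = 3,
) -> str:
    """Retrieve context for the question, then ask the model to answer from it."""
    chunks = retrieve(question, k)
    if not chunks:
        # Skip the model call entirely when nothing relevant was retrieved.
        return "No relevant context was found in the uploaded documents."
    return generate(build_prompt(question, chunks))
```

Passing the retriever and the model as callables keeps the sketch testable without network access; stubs can replace both in unit tests.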