1. System Architecture

This document describes the system architecture and design of the NotebookLM-inspired SaaS application. It serves as the foundational blueprint for implementation.

---

# 1. System Architecture

The application follows a decoupled client-server architecture, separating the UI layer, processing backend, and data/storage layer to ensure scalability and cost-efficiency.

* **Frontend (Next.js on Vercel):** Handles routing, state management, UI rendering, and streaming LLM responses. Communicates securely with the backend via REST and Server-Sent Events (SSE).
* **Backend (FastAPI on Railway/Fly.io):** Handles heavy compute: document parsing, chunking, embedding generation, vector search, and LLM orchestration.
* **Data & Auth Layer (Supabase):**
    * **Auth:** Supabase Authentication (JWT-based).
    * **Database:** PostgreSQL for relational data, with `pgvector` for semantic search.
    * **Storage:** Supabase Storage (S3-compatible) for retaining raw user-uploaded documents.
* **AI Layer:**
    * **LLMs:** OpenRouter API (Mixtral 8x7B, DeepSeek Coder, or Llama-3 depending on the task).
    * **Embeddings:** Hosted locally on the FastAPI backend using `sentence-transformers` (e.g., `BAAI/bge-m3` or `nomic-embed-text`), so embeddings incur no per-request API cost.

---

# 2. Database Schema (PostgreSQL + pgvector)

```sql
-- Required extensions: uuid_generate_v4() comes from uuid-ossp, VECTOR from pgvector
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
CREATE EXTENSION IF NOT EXISTS vector;

-- Workspaces: Logical grouping of documents and chats
CREATE TABLE workspaces (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    user_id UUID REFERENCES auth.users(id) ON DELETE CASCADE,
    name VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Sources: Uploaded study materials (Max 20 per workspace enforced at app layer)
CREATE TABLE sources (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE,
    filename VARCHAR(255) NOT NULL,
    file_type VARCHAR(50) NOT NULL,          -- pdf, docx, txt, md, url
    storage_path TEXT,                       -- Path in Supabase Storage
    status VARCHAR(50) DEFAULT 'processing', -- processing, ready, failed
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Document Chunks: Vector store for RAG
CREATE TABLE document_chunks (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    source_id UUID REFERENCES sources(id) ON DELETE CASCADE,
    content TEXT NOT NULL,
    metadata JSONB,         -- Stores page_number, chunk_index, section_title
    embedding VECTOR(1024)  -- Must match the embedding model's output dimension (1024 for BAAI/bge-m3)
);

-- Create HNSW index for fast vector search
CREATE INDEX ON document_chunks USING hnsw (embedding vector_cosine_ops);

-- Chats: Conversation threads within a workspace
CREATE TABLE chats (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE,
    title VARCHAR(255) DEFAULT 'New Chat',
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Messages: Chat history
CREATE TABLE messages (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    chat_id UUID REFERENCES chats(id) ON DELETE CASCADE,
    role VARCHAR(50) NOT NULL, -- user, assistant
    content TEXT NOT NULL,
    citations JSONB,           -- Array of document_chunk IDs or source metadata
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Quizzes: Generated quizzes
CREATE TABLE quizzes (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    workspace_id UUID REFERENCES workspaces(id) ON DELETE CASCADE,
    title VARCHAR(255) NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

-- Quiz Questions
CREATE TABLE quiz_questions (
    id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    quiz_id UUID REFERENCES quizzes(id) ON DELETE CASCADE,
    question TEXT NOT NULL,
    options JSONB NOT NULL,            -- {"A": "...", "B": "...", "C": "...", "D": "..."}
    correct_answer VARCHAR(1) NOT NULL,
    hint TEXT,
    explanation TEXT,
    source_citation JSONB              -- Reference to chunk or source
);
```

---

# 3. RAG Pipeline Design (Ingestion Workflow)

When a user uploads a document, the backend triggers an asynchronous pipeline:

1. **Upload & Storage:** The file is uploaded directly to Supabase Storage via signed URL or proxied through the backend. The `sources` row is set to `status: processing`.
2. **Extraction:**
    * **PDF:** `PyMuPDF` (fitz) or `pdfplumber` to extract text and page numbers.
    * **DOCX:** `python-docx` for paragraphs.
    * **URLs:** `Trafilatura` or `BeautifulSoup` for main article content.
3. **Chunking:**
    * Use LangChain's `RecursiveCharacterTextSplitter`.
    * Chunk size: ~800 tokens, Overlap: ~150 tokens. The splitter measures length in characters by default, so supply a token-based length function to hit these targets.
    * Crucial: Prepend the document name and page number to *each chunk* before embedding, so the model never loses context of *where* the chunk came from.
4. **Embedding:** Generate vector embeddings using the backend Python model (e.g., `sentence-transformers`).
5. **Vector Storage:** Batch insert text chunks, metadata (page #, source_id), and embeddings into `document_chunks`.
6. **Completion:** Update the `sources` row to `status: ready`. Notify the frontend via WebSockets/polling.

---

# 4. Vector Search Workflow (Retrieval)

When a user asks a question, generates a summary, or requests a quiz:

1. **Query Formulation:** (Optional but recommended) Route the user's raw chat query through a lightweight LLM to rewrite it into a standalone, search-optimized query.
2. **Query Embedding:** Convert the formulated query into an embedding vector using the same local Python embedding model.
3. **Similarity Search:** Execute a `pgvector` similarity search (`<=>` cosine distance) on `document_chunks`, joined to `sources` so that results are restricted to the current `workspace_id`.
4. **Top-K Retrieval:** Retrieve the top 10-15 most relevant chunks.
5. **Context Assembly:** Format the retrieved chunks into a structured text string. Example: `[Source: Biology_Notes.pdf | Page: 4] <chunk text>`
6. **LLM Generation:** Inject the context and the user query into the system prompt. Stream the response back to the frontend, enforcing citation formatting (e.g., "Mitochondria are the powerhouse of the cell [Biology_Notes.pdf, Page 4]").

---

# 5. Folder Structure

```text
/quizme-monorepo
│
├── /frontend                 # Next.js Application
│   ├── /app
│   │   ├── (auth)            # Login / Register
│   │   ├── dashboard         # Workspace list
│   │   ├── workspace/[id]    # The 3-panel UI
│   │   └── api               # Next.js route handlers
│   ├── /components
│   │   ├── layout            # 3-panel layout components
│   │   ├── chat              # Chat UI & streaming
│   │   ├── sources           # Uploader & source list
│   │   └── studio            # Quiz & Summary UI
│   ├── /lib                  # Utils, Supabase client
│   └── /store                # Zustand state (workspace, chat history)
│
├── /backend                  # FastAPI Application
│   ├── /app
│   │   ├── /api              # Route definitions
│   │   ├── /core             # Config, auth middleware
│   │   ├── /services         # RAG, LLM, Extraction logic
│   │   ├── /models           # Pydantic & SQLAlchemy schemas
│   │   └── /utils            # Chunking, text parsers
│   ├── main.py               # Entry point
│   └── requirements.txt
│
└── /docker                   # Optional: Docker-compose for local dev
```

---

# 6. API Endpoints (FastAPI)

**Workspaces & Sources:**

* `GET /api/workspaces` - List user's workspaces
* `POST /api/workspaces` - Create new workspace
* `GET /api/workspaces/{workspace_id}/sources` - List uploaded materials
* `POST /api/workspaces/{workspace_id}/sources/upload` - Upload file & trigger pipeline
* `DELETE /api/sources/{source_id}` - Delete a source & cascade delete chunks

**Interaction (RAG):**

* `POST /api/chat/completions` - Accepts `workspace_id` and `messages`. Returns StreamingResponse (SSE).
* `GET /api/chats/{chat_id}/messages` - Fetch chat history.

**Studio Tools:**

* `POST /api/studio/summary` - Triggers summary generation (can be synchronous or SSE).
* `POST /api/studio/quiz` - Generates quiz payload. Requires structured JSON response from LLM.

---

# 7. Prompt Engineering Templates

### 1. Chat with Sources (Strict RAG)

```text
System: You are an expert AI learning assistant. Your task is to answer the user's question based strictly on the provided <context> retrieved from their study materials.

Rules:
1. Do not use outside knowledge. If the answer is not in the context, say "I cannot find the answer to this in your uploaded documents."
2. Cite your sources using the source ID or Document Name and Page Number provided in the context blocks. Format citations like this: [DocName, Page X].
3. Be concise, objective, and clear.

<context>
{context_blocks}
</context>
```

### 2. Quiz Generation (Structured JSON)

*Using OpenRouter's structured output / JSON mode (e.g., via `instructor` library)*

```text
System: You are a master educator. Generate a multiple-choice quiz based on the provided <context>.
You must output a valid JSON object that matches the schema below. Do not output markdown code blocks.
Generate {num_questions} questions.

Rules for each question:
1. The question must test deep understanding, not just trivia.
2. Provide exactly 4 options labeled "A", "B", "C", "D".
3. The correct_answer must be the letter only.
4. The hint should nudge the user without giving away the answer.
5. The explanation must explain WHY the answer is correct using the text.
6. Provide the source_citation exactly as it appears in the context header.

<context>
{context_blocks}
</context>

Expected JSON Schema:
{
  "questions": [
    {
      "question": "string",
      "options": {"A": "string", "B": "string", "C": "string", "D": "string"},
      "correct_answer": "string",
      "hint": "string",
      "explanation": "string",
      "source_citation": "string"
    }
  ]
}
```

*(Note: The hint and explanation are generated in the same call as the question, rather than in separate calls, to save context window and API calls.)*

### 3. Document Summary

```text
System: You are an expert summarizer. Provide a comprehensive summary of the provided <context> documents.
Format your response strictly as follows:

## Overview
[1-2 paragraphs giving a high-level summary]

## Key Concepts
* [Concept 1]: [Brief explanation]
* [Concept 2]: [Brief explanation]
...

## Structured Outline
1. [Topic]
   - [Subtopic]
2. [Topic]
   - [Subtopic]

Ensure all points are grounded in the provided text.

<context>
{context_blocks}
</context>
```

---

# 8. Recommended Libraries & Frameworks

### Frontend

* **Framework:** Next.js 14+ (App Router)
* **State Management:** Zustand (well suited to managing the 3-panel state without prop-drilling)
* **Data Fetching:** SWR or React Query
* **Styling:** Tailwind CSS + ShadCN UI + Radix UI primitives
* **AI Streaming:** Vercel AI SDK (`npm i ai`) - specifically `useChat` for the center panel and `streamText` for summaries.
* **PDF Rendering (optional):** `react-pdf` if you want to show the actual PDF in the left panel.

### Backend

* **Framework:** FastAPI (Python 3.11+)
* **Database ORM:** `supabase-py` (for direct API interactions) or `SQLAlchemy` + `psycopg2` for raw pgvector queries.
* **LLM Orchestration:** `Instructor` (for enforcing strict JSON outputs for Quizzes) + `OpenAI` python SDK (configured with OpenRouter base URL).
* **Document Parsers:**
    * `PyMuPDF` (`fitz`): Fastest and most reliable for PDFs.
    * `python-docx`: For Word documents.
    * `Trafilatura`: For scraping clean text from URLs.
* **Embeddings:** `sentence-transformers` (runs locally, no API cost).
* **Chunking:** `langchain-text-splitters` (you only need this specific package; avoid pulling in the whole LangChain framework).
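
As a rough illustration of the chunking step in section 3 and the context assembly step in section 4, the sketch below splits a page into overlapping windows, puts a source header at the top of each chunk, and joins retrieved chunks into the string that fills `{context_blocks}`. It uses only the standard library rather than `RecursiveCharacterTextSplitter`, and it approximates tokens with whitespace-separated words; the names `Chunk`, `chunk_page`, and `build_context` are hypothetical and not part of the design above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical in-memory stand-in for a row of `document_chunks`."""
    source_name: str
    page_number: int
    content: str


def chunk_page(source_name: str, page_number: int, text: str,
               chunk_size: int = 800, overlap: int = 150) -> list[Chunk]:
    """Split one page into overlapping windows, each led by a source header.

    Assumption: whitespace-separated words stand in for tokens. A real
    pipeline would count tokens with the embedding model's tokenizer.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    header = f"[Source: {source_name} | Page: {page_number}]"
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append(Chunk(source_name, page_number, f"{header} {' '.join(window)}"))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the page
    return chunks


def build_context(chunks: list[Chunk]) -> str:
    """Join retrieved chunks into the text that fills {context_blocks}."""
    return "\n\n".join(chunk.content for chunk in chunks)
```

Calling `chunk_page("Biology_Notes.pdf", 4, page_text)` yields chunks whose text begins with `[Source: Biology_Notes.pdf | Page: 4]`, which is the header format the citation rules in the prompts rely on. Keeping the header inside the embedded text is a design choice from section 3; storing the same values in `metadata` as well would let the backend rebuild citations without parsing the chunk text.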
