End-to-End System Handoff Document
1. Project Overview
1.1 What Is PathRAG?
PathRAG (Path-based Retrieval-Augmented Generation) is a method introduced in the paper "PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths" and implemented in the repository https://github.com/BUPT-GAMMA/PathRAG.
Key Points:
Purpose: Improve retrieval-augmented generation by extracting and utilizing key relational paths from an indexing graph rather than retrieving all related information.
Method: Uses a flow-based pruning algorithm with decay and early stopping mechanisms to select the most reliable relational paths between query-related nodes. These paths are then structured into a prompt for an LLM to generate coherent answers.
Results: The paper demonstrates that PathRAG outperforms baseline methods across multiple evaluation dimensions (comprehensiveness, logicality, etc.).
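To make the pruning idea concrete, the following is a minimal Python sketch of flow-style path pruning. It is an illustration under stated assumptions, not the paper's or the repository's exact algorithm: the graph is a plain adjacency dict, one unit of "resource" starts at the source node, each hop multiplies the resource by a hypothetical decay factor and splits it evenly across outgoing edges, and a branch stops early once its share drops below a hypothetical threshold. The names prune_paths, decay, threshold, and max_hops are invented for this example.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # node -> list of neighbor nodes (assumed representation)


def prune_paths(
    graph: Graph,
    source: str,
    target: str,
    decay: float = 0.8,       # hypothetical per-hop decay factor
    threshold: float = 0.05,  # hypothetical early-stopping cutoff
    max_hops: int = 4,        # hypothetical cap on path length
) -> List[Tuple[List[str], float]]:
    """Return (path, flow) pairs from source to target, strongest flow first."""
    results: List[Tuple[List[str], float]] = []

    def walk(node: str, path: List[str], flow: float) -> None:
        if node == target:
            results.append((path, flow))
            return
        if len(path) - 1 >= max_hops:
            return
        neighbors = graph.get(node, [])
        if not neighbors:
            return
        # Decay the resource, then split it evenly over the outgoing edges.
        share = flow * decay / len(neighbors)
        if share < threshold:
            return  # early stopping: too little resource left on this branch
        for neighbor in neighbors:
            if neighbor not in path:  # avoid cycles
                walk(neighbor, path + [neighbor], share)

    walk(source, [source], 1.0)
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

On a small diamond-shaped graph (A to B and C, both leading to D), both two-hop paths survive with the default settings, each carrying a flow of about 0.32; raising threshold above that value prunes them. Paths that keep more flow can then be treated as the more reliable ones when the prompt is assembled.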
1.2 Project Goals
Develop a fully Dockerized application that enables users to:
Upload Data: Accept files (e.g., PDFs, text files) or raw text for domain-specific data upload.
Build a Knowledge Graph (KG): Automatically extract entities and relationships to construct a KG from the uploaded data.
Generate Embeddings & Store in a Vector Store: Compute dense embeddings for KG nodes with a single embedding model (the same one later used to embed queries) and ingest them into Weaviate.
Query the KG: Retrieve relevant nodes using Weaviate, apply a PathRAG-inspired retrieval engine to prune and select key relational paths, and format these into a structured prompt.
Generate Answers via an LLM: Provide a dropdown in the frontend for model selection. The backend will route the prompt to an Ollama container running on the same Docker network, which will load the chosen model.
Visualize the KG & Results: Offer a dashboard that displays both the KG visualization and the query results.
2. System Architecture
All components run within a Docker network managed by Docker Compose to ensure seamless integration and deployment.
2.1 Components Overview
Frontend (Vite + React):
Features: File uploads, KG visualization (using D3.js or Cytoscape.js), a query form, and a dropdown menu to select the desired model for the LLM.
Backend (FastAPI):
Responsibilities: Data ingestion, KG construction, embedding generation, querying Weaviate, executing the PathRAG retrieval algorithm, and routing prompts to the Ollama container based on model selection.
Vector Store (Weaviate):
Usage: Store dense embeddings for KG nodes and support efficient similarity searches.
LLM Container (Ollama):
Details: A single container running Ollama on the same Docker network. It is configured to load and run the model selected by the user via the frontend dropdown.
PathRAG Retrieval Engine:
Functionality: Implements the flow-based pruning algorithm to select key relational paths from the retrieved nodes and format them into a prompt for the LLM.
3. Detailed File Structure
Below is a suggested file structure for the entire project:
/project-root
├── README.md # Project overview, deployment instructions, and references (PathRAG paper & repo)
├── docker-compose.yml # Docker Compose configuration for all services (frontend, backend, Weaviate, Ollama)
├── /frontend # Vite + React frontend application
│   ├── package.json
│   ├── vite.config.js
│   ├── public/
│   ├── src/
│   │   ├── App.jsx # Main component integrating Upload, KG Visualization, and Query Form
│   │   ├── index.jsx
│   │   └── components/
│   │       ├── UploadComponent.jsx # Handles file uploads and progress feedback
│   │       ├── KGVisualization.jsx # Displays interactive KG graph
│   │       └── QueryForm.jsx # Query input form with a dropdown for LLM model selection
│   └── README.md # Frontend-specific documentation
├── /backend # FastAPI backend service
│   ├── Dockerfile # Dockerfile to build the FastAPI container
│   ├── requirements.txt # Python dependencies (FastAPI, uvicorn, weaviate-client, spaCy/transformers, etc.)
│   ├── main.py # Entry point for the FastAPI app
│   ├── /app
│   │   ├── __init__.py
│   │   ├── config.py # Configuration settings (e.g., Weaviate URL, Ollama endpoint, embedding model details)
│   │   ├── /routes
│   │   │   ├── __init__.py
│   │   │   ├── upload.py # Endpoints for file uploads and KG construction
│   │   │   └── query.py # Endpoints for processing queries
│   │   └── /modules
│   │       ├── __init__.py
│   │       ├── kg_builder.py # Functions to extract text, perform entity/relation extraction, and build the KG
│   │       ├── embedding.py # Functions to generate dense embeddings using a SentenceTransformer or similar
│   │       ├── pathrag.py # Implementation of the flow-based pruning algorithm for relational path retrieval
│   │       └── llm_interface.py # Interfaces to call the Ollama container using the selected model from the dropdown
│   └── README.md # Backend-specific documentation
├── /ollama # Directory for the Ollama container (single container running Ollama)
│   ├── Dockerfile # Dockerfile to build the Ollama container
│   ├── config.yaml # Ollama configuration file (if applicable) to support model loading based on API calls
│   └── README.md # Instructions for running Ollama and its API endpoint details
└── /weaviate # (Optional) Custom configuration for Weaviate
    └── Dockerfile # (If building a custom Weaviate image) or use the official image in docker-compose.yml
4. Step-by-Step Implementation Plan
Step 1: Repository Setup
Initialize the Repository:
Create the project directory structure as outlined above.
Include a high-level README.md that describes the project’s purpose, deployment instructions, and references:
PathRAG Paper: https://arxiv.org/abs/2502.14902
PathRAG Repository: https://github.com/BUPT-GAMMA/PathRAG
Docker Compose Setup:
Create a docker-compose.yml at the project root that defines services for:
frontend: Vite + React app.
backend: FastAPI service.
weaviate: The vector store.
ollama: The single Ollama container that loads the selected model.
Step 2: Frontend Development (Vite + React)
Initialize Vite Project:
In the /frontend folder, use npm create vite@latest to set up a React project.
Develop Core Components:
UploadComponent.jsx: Implement functionality to handle file uploads with progress indicators.
KGVisualization.jsx: Build an interactive KG visualization using D3.js or Cytoscape.js.
QueryForm.jsx: Create a form that includes:
A text input for the query.
A dropdown menu populated with available LLM model names/IDs.
A submit button that sends the query and selected model to the backend.
App.jsx: Integrate the above components and handle API calls to the backend.
Local Testing:
Run the frontend locally (e.g., using npm run dev) to verify that the UI components function correctly.
Step 3: Backend Development (FastAPI)
Set Up FastAPI Project:
In the /backend folder, create a Python virtual environment.
List dependencies in requirements.txt (including FastAPI, uvicorn, weaviate-client, spaCy, transformers, etc.).
Implement Endpoints:
Upload Endpoint (routes/upload.py):
Accept file uploads or raw text.
Trigger asynchronous processing to:
Extract text from uploaded data.
Perform NLP (entity and relation extraction) to build a KG.
Generate dense embeddings for each KG node.
Ingest the embeddings into Weaviate.
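As a rough illustration of the KG-building step in the upload pipeline above, the sketch below assembles an in-memory graph from (head, relation, tail) triples. It assumes the triples have already been produced by some extractor (spaCy, a transformer model, or an LLM), which is not shown; the KnowledgeGraph class and build_kg function are hypothetical names, not part of the PathRAG repository.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class KnowledgeGraph:
    """Minimal in-memory KG: a node set plus outgoing labeled edges per node."""
    nodes: Set[str] = field(default_factory=set)
    edges: Dict[str, List[Tuple[str, str]]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def add_triple(self, head: str, relation: str, tail: str) -> None:
        self.nodes.update((head, tail))
        edge = (relation, tail)
        if edge not in self.edges[head]:  # skip exact duplicate edges
            self.edges[head].append(edge)


def build_kg(triples: Iterable[Triple]) -> KnowledgeGraph:
    """Build a graph from extracted triples, trimming stray whitespace."""
    kg = KnowledgeGraph()
    for head, relation, tail in triples:
        kg.add_triple(head.strip(), relation.strip(), tail.strip())
    return kg
```

Each entry in kg.nodes would then be embedded and ingested into Weaviate, while kg.edges stays available to the path-retrieval step.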
Query Endpoint (routes/query.py):
Accept the user query along with the selected LLM model identifier.
Convert the query to an embedding.
Query Weaviate to retrieve candidate nodes.
Use the PathRAG module (modules/pathrag.py) to perform flow-based pruning and select key relational paths.
Format a structured prompt (combining the query and retrieved paths).
Call the Ollama container via the LLM interface (modules/llm_interface.py), passing along the selected model.
Return the generated answer and supporting details.
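The prompt-formatting step in the query flow above could look roughly like the following. This is a sketch under assumptions: each retrieved path is taken to be a list of (head, relation, tail) triples, and both the arrow notation and the instruction wording are placeholders rather than the template used by the paper or repository. The function names path_to_text and build_prompt are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def path_to_text(path: List[Triple]) -> str:
    """Render one relational path as a single readable line."""
    if not path:
        return ""
    parts = [path[0][0]]
    for _head, relation, tail in path:
        parts.append(f"-[{relation}]-> {tail}")
    return " ".join(parts)


def build_prompt(query: str, paths: List[List[Triple]]) -> str:
    """Combine the user query and the retrieved paths into one prompt string."""
    lines = [
        "Answer the question using only the relational paths below.",
        "",
        "Relational paths:",
    ]
    for index, path in enumerate(paths, start=1):
        lines.append(f"{index}. {path_to_text(path)}")
    lines += ["", f"Question: {query}", "Answer:"]
    return "\n".join(lines)
```

Keeping the formatter a pure function of the query and paths makes it easy to unit-test separately from Weaviate and Ollama.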
Module Integration:
KG Builder (modules/kg_builder.py): Implements text extraction and KG construction.
Embedding (modules/embedding.py): Generates dense embeddings using a chosen SentenceTransformer model.
PathRAG Retrieval (modules/pathrag.py): Implements the flow-based pruning algorithm.
LLM Interface (modules/llm_interface.py): Routes the formatted prompt to the Ollama container (using its API) and passes the selected model parameter.
Testing:
Test endpoints locally (e.g., using Postman) to ensure that file uploads, KG construction, and query processing work as expected.
Step 4: Weaviate Vector Store Setup
Deploy Weaviate:
Use the official Weaviate Docker image via Docker Compose.
Define Schema & Ingestion:
Create a schema with fields like “entity name,” “text context,” and “embedding vector.”
Integrate the backend to ingest KG node embeddings into Weaviate.
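One possible shape for the schema and ingested objects is sketched below as plain Python dicts, following the JSON layout used by Weaviate's REST schema and object endpoints. The class name KGNode and the property names are assumptions made for this example. Note that in Weaviate the vector is attached to each object rather than declared as a schema property, so the embedding appears on the object, not in the property list; setting the vectorizer to "none" indicates that the backend supplies vectors itself.

```python
from typing import Any, Dict, List

# Hypothetical class definition for KG nodes (names are illustrative only).
KG_NODE_CLASS: Dict[str, Any] = {
    "class": "KGNode",
    "description": "A knowledge-graph node with its surrounding text context",
    "vectorizer": "none",  # embeddings are computed by the backend, not Weaviate
    "properties": [
        {"name": "entityName", "dataType": ["text"]},
        {"name": "textContext", "dataType": ["text"]},
    ],
}


def to_weaviate_object(
    entity_name: str, text_context: str, vector: List[float]
) -> Dict[str, Any]:
    """Build the JSON body for one KG node, carrying its precomputed vector."""
    return {
        "class": KG_NODE_CLASS["class"],
        "properties": {
            "entityName": entity_name,
            "textContext": text_context,
        },
        "vector": list(vector),
    }
```

These dicts can be sent through whichever Weaviate client version the backend pins; the client calls themselves are left out here because they differ between client versions.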
Verification:
Test similarity queries on Weaviate to ensure embeddings are correctly stored and retrieved.
Step 5: Ollama Container Setup
Configure the Ollama Container:
In the /ollama folder, create a Dockerfile to build a container that runs Ollama.
Configure the container (and possibly a configuration file such as config.yaml) so that it exposes an API endpoint which:
Accepts a prompt.
Accepts a model identifier (from the dropdown).
Loads the selected model dynamically.
Returns the generated answer.
LLM Interface Integration:
In the backend’s modules/llm_interface.py, implement the logic to call the Ollama container’s API, passing the prompt and the selected model.
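A minimal sketch of such an interface is shown below, using only the Python standard library. It relies on Ollama's HTTP generate endpoint, which takes the model name and prompt in each request, so the dropdown selection can be forwarded as-is. The service hostname "ollama" and the use of the default port 11434 are assumptions about the Compose setup, and the function names are hypothetical. The selected model must already be available in the container (for example, pulled in advance).

```python
import json
import urllib.request

# Assumption: the Compose service is named "ollama" and uses the default port.
OLLAMA_URL = "http://ollama:11434/api/generate"


def build_generate_request(
    prompt: str, model: str, url: str = OLLAMA_URL
) -> urllib.request.Request:
    """Build a non-streaming generate request for the selected model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(
    prompt: str, model: str, url: str = OLLAMA_URL, timeout: float = 120.0
) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

Separating request construction from the network call lets the payload be checked in unit tests without a running Ollama container. A production version would likely add error handling for unknown models and timeouts.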
Testing:
Manually test the Ollama container by sending sample prompts with different model identifiers and verifying the responses.
Step 6: Dockerization & Orchestration
Create Dockerfiles for Each Component:
Frontend: Build a Dockerfile in /frontend.
Backend: Build a Dockerfile in /backend.
Ollama: Build a Dockerfile in /ollama.
Weaviate: Use the official image or a custom Dockerfile in /weaviate if needed.
Docker Compose Configuration:
In docker-compose.yml, define services:
frontend: Build from /frontend.
backend: Build from /backend.
weaviate: Use the official image.
ollama: Build from /ollama and expose its API endpoint.
Configure environment variables and networking so that each service can communicate using Docker service names.
Deployment Testing:
Run docker-compose up --build and verify that all containers start and interconnect correctly.
Test the full end-to-end workflow from the frontend (file upload, KG build, query submission) to answer generation via Ollama.
Step 7: Final Testing, Documentation & Handoff
End-to-End Testing:
Upload sample data via the frontend and confirm that the KG is built, embeddings are generated, and data is ingested into Weaviate.
Submit queries and verify that:
The correct model is selected from the dropdown.
The PathRAG retrieval process returns key relational paths.
The backend correctly forwards the prompt to the Ollama container, which then returns a generated answer.
Documentation:
Update all README files (project-root, /frontend, /backend, /ollama) with detailed setup, configuration, and troubleshooting instructions.
Ensure the high-level README includes references to the PathRAG paper and repository.
Monitoring & Logging:
Set up basic logging for the backend and Ollama container.
Provide instructions for accessing logs and monitoring system performance.
Handoff Checklist:
Verify the file structure is as specified.
Confirm that docker-compose.yml orchestrates all services on a shared Docker network.
Ensure all endpoints and integrations (KG builder, embedding generation, PathRAG retrieval, and Ollama interface) are fully functional.
Finalize and include all documentation and reference links.