# WARP.md

This file provides guidance to WARP (warp.dev) when working with code in this repository.

## Project Overview

RAG Troubleshooter is a production-ready RAG service for medical device troubleshooting with vendor-neutral LLM support. It provides hallucination-safe, explainable troubleshooting guidance by ingesting service manual PDFs into a vector database and generating grounded answers via Claude, OpenAI, or Gemini.

## Development Commands

### Environment Setup

```powershell
# Install dependencies
poetry install

# Or with pip
pip install -e .

# Configure environment variables
Copy-Item .env.example .env
# Then edit .env with your API keys
```

### Running the Server

```powershell
# Run with Poetry (recommended for development with hot reload)
poetry run python app/main.py

# Or with uvicorn directly
poetry run uvicorn app.main:app --reload

# Production mode (no reload)
poetry run uvicorn app.main:app --host 0.0.0.0 --port 8000
```

Server endpoints:

- Demo UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/api/

### Document Ingestion

```powershell
# Run the ingestion script to load device manuals
poetry run python ingest_devices.py

# Or via API (requires running server)
# In Windows PowerShell 5.1, `curl` is an alias for Invoke-WebRequest, so call curl.exe explicitly
curl.exe -X POST http://localhost:8000/api/ingest `
  -H "Content-Type: application/json" `
  -H "X-API-Key: dev-key-123" `
  -d '{"device_name": "Bair Hugger Model 775", "filename": "bair_hugger_manual.pdf"}'
```

### Testing

```powershell
# Run tests (when implemented)
poetry run pytest

# Run tests with coverage
poetry run pytest --cov=app --cov-report=html
```

### Code Quality

```powershell
# Format code (line length: 100)
poetry run black app/

# Lint code
poetry run ruff check app/

# Type check
poetry run mypy app/
```

## Architecture Overview

### Core Design Principles

1. **Pluggable Components**: Abstract base classes enable easy swapping of vector stores and LLM providers
2. **Hallucination Prevention**: Strict prompts, low temperature (0.1), required citations, "No grounded steps found" fallback
3. **Async Throughout**: All I/O operations are async for scalability
4. **Stateless API**: Easy horizontal scaling

### Key Architectural Patterns

#### 1. Provider Abstraction Pattern

Both vector stores and LLM providers use abstract base classes to enable pluggability:

- **Vector Store**: `VectorStore` (base) → `LocalVectorStore` (FAISS implementation)
  - Easy to add `PineconeVectorStore`, `WeaviateVectorStore`, etc.
  - All implement: `add_documents()`, `search()`, `delete_by_metadata()`, `get_stats()`
- **LLM Providers**: `LLMProvider` (base) → `ClaudeProvider`, `OpenAIProvider`, `GeminiProvider`
  - Unified interface: `generate()` method returns standardized `LLMResponse`
  - Configuration via `LLMConfig` dataclass

#### 2. Service Layer Architecture

Three main services coordinate the RAG pipeline:

- **IngestionService**: PDF → chunks → embeddings → vector store
  - Handles PDF parsing, text chunking with overlap, batch embedding generation
  - Chunking is sentence-boundary aware (breaks at `.` or `\n` when possible)
- **RetrievalService**: Query → embeddings → semantic search → re-ranking → context
  - Two-stage retrieval: semantic search (FAISS) + lightweight keyword re-ranking
  - Re-ranking formula: `0.7 * semantic_score + 0.2 * keyword_overlap + 0.1 * term_density`
- **RAGService**: Query + context → LLM → grounded answer
  - Enforces hallucination-averse system prompt
  - Supports multi-provider comparison (same context, different LLMs)

#### 3. Hallucination Prevention Strategy

The system is designed to prevent hallucinations through multiple mechanisms:

- **System Prompt**: Explicitly instructs models to use ONLY the provided context or respond with "No grounded steps found"
- **Low Temperature**: Set to 0.1 (configurable) to reduce sampling randomness
- **Token Limit**: 500 tokens max (configurable) to force concise answers
- **Citation Requirement**: Every answer must cite sources with `[Source N]` notation
- **Context Validation**: Returns early with "No grounded steps found" if retrieval yields no results

#### 4. Retrieval Pipeline

The retrieval process follows this flow:

1. **Chunking** (Ingestion):
   - 512-character chunks with 50-character overlap (configurable)
   - Sentence-boundary aware splitting to avoid mid-sentence breaks
   - Metadata attached: device, page, chunk_index, source_file
2. **Semantic Search**:
   - Query embedded using `all-MiniLM-L6-v2` (configurable)
   - FAISS cosine similarity search
   - Optional device-based metadata filtering
   - Retrieves 2× the requested number of chunks so the re-ranker has candidates to choose from
3. **Re-ranking**:
   - Combines semantic score with keyword overlap and term density
   - Returns top-k chunks with highest combined scores
4. **Context Formatting**:
   - Formats retrieved chunks as: `[Source N] (Page X, Device):\n{text}`
   - Builds citation list with chunk IDs, pages, relevance scores

### Data Flow

```
Query → RetrievalService → VectorStore → [chunks] → RAGService → LLMProvider → Answer + Citations
```

For multi-provider comparison:

```
Query → RetrievalService (once) → [shared context] → RAGService → [Claude, OpenAI, Gemini] → [Answer1, Answer2, Answer3]
```

### Configuration System

Configuration uses Pydantic Settings with `.env` file support:

- **API Keys**: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
- **Vector Store**: `VECTOR_STORE_TYPE`, `VECTOR_STORE_PATH`
- **Embeddings**: `EMBEDDING_MODEL`
- **LLM**: `DEFAULT_LLM_PROVIDER`, `LLM_TEMPERATURE`, `LLM_MAX_TOKENS`
- **Retrieval**: `TOP_K_CHUNKS`, `CHUNK_SIZE`, `CHUNK_OVERLAP`

All settings are centralized in `app/config.py` and loaded via the `Settings` class.

### Telemetry & Logging

Structured JSON logging to `./logs/rag_YYYYMMDD.log` tracks:

- **Query Events**: Query, device, provider, model, latency, tokens, citations, has_context
- **Comparison Events**: Query, device, providers, latencies, tokens
- **Ingestion Events**: Device, source_file, pages, chunks, duration
- **Error Events**: Operation, error, details

Access via the global `telemetry` instance from `app.telemetry`.
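As a rough illustration of how these logs might be consumed, the sketch below averages query latency per provider. It assumes one JSON object per line and the hypothetical field names `event_type`, `provider`, and `latency_ms`; the actual schema is defined in `app/telemetry.py` and may differ, so treat the names as placeholders.

```python
import json
from collections import defaultdict
from typing import Iterable


def average_latency_by_provider(lines: Iterable[str]) -> dict[str, float]:
    """Average query latency per provider from JSON-lines log records.

    Assumed (hypothetical) record shape:
    {"event_type": "query", "provider": "claude", "latency_ms": 850}
    """
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict) or record.get("event_type") != "query":
            continue  # only query events carry the latency we want here
        provider = record.get("provider")
        latency = record.get("latency_ms")
        if provider is None or not isinstance(latency, (int, float)):
            continue
        totals[provider] += latency
        counts[provider] += 1
    return {name: totals[name] / counts[name] for name in totals}


# Illustrative records only; these are not real log output.
sample = [
    '{"event_type": "query", "provider": "claude", "latency_ms": 800}',
    '{"event_type": "query", "provider": "claude", "latency_ms": 1000}',
    '{"event_type": "query", "provider": "openai", "latency_ms": 600}',
    '{"event_type": "ingestion", "device": "Bair Hugger Model 775"}',
    "not json",
]
print(average_latency_by_provider(sample))
```

To run this against real data, pass an open file handle for one of the `./logs/rag_YYYYMMDD.log` files instead of `sample`, after adjusting the field names to match what `app/telemetry.py` actually writes.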

## File Organization

```
app/
├── api/
│   └── routes.py              # FastAPI endpoints, request/response models
├── llm_providers/
│   ├── base.py                # LLMProvider abstract base class
│   ├── claude.py              # Anthropic Claude implementation
│   ├── openai_provider.py     # OpenAI implementation
│   └── gemini.py              # Google Gemini implementation
├── services/
│   ├── ingestion.py           # PDF → chunks → embeddings → vector store
│   ├── retrieval.py           # Query → semantic search → re-ranking
│   └── rag.py                 # Retrieval + LLM generation orchestration
├── vector_store/
│   ├── base.py                # VectorStore abstract base class
│   └── local_store.py         # FAISS-based local implementation
├── ui/                        # Static HTML demo UI
├── config.py                  # Pydantic Settings for env-based config
├── telemetry.py               # Structured JSON logging
└── main.py                    # FastAPI app initialization
```

## Adding New Components

### Adding a New Vector Store

1. Create `app/vector_store/your_store.py`:

   ```python
   from app.vector_store.base import VectorStore, VectorStoreConfig

   class YourVectorStore(VectorStore):
       async def add_documents(self, texts, embeddings, metadata):
           # Implementation
           pass

       async def search(self, query_embedding, top_k, filter_metadata):
           # Implementation
           pass

       # Implement other abstract methods
   ```

2. Update `app/api/routes.py` → `get_vector_store()` to instantiate your store based on config
3. Update `.env`: `VECTOR_STORE_TYPE=your_store`

### Adding a New LLM Provider

1. Create `app/llm_providers/your_provider.py`:

   ```python
   from app.llm_providers.base import LLMProvider, LLMResponse

   class YourProvider(LLMProvider):
       async def generate(self, prompt, system_prompt):
           # Call your LLM API here; the two values below are placeholders
           response_text = "..."
           token_count = 0
           return LLMResponse(
               text=response_text,
               provider="your_provider",
               model="model_name",
               tokens_used=token_count,
               finish_reason="stop"
           )

       def get_provider_name(self):
           return "your_provider"
   ```

2. Update `app/api/routes.py` → `get_llm_provider()` to handle your provider name
3. Add API key to `.env`: `YOUR_PROVIDER_API_KEY=...`

## Important Implementation Details

### Embedding Model

The system uses `all-MiniLM-L6-v2` by default (384 dimensions). If changing models:

- Update `EMBEDDING_MODEL` in `.env`
- Update `embedding_dimension` in `VectorStoreConfig` (if dimensions differ)
- Re-ingest all documents (embeddings are not compatible across models)

### Authentication

API endpoints (except the health check) require an `X-API-Key` header matching `API_KEY` from `.env`. Default for dev: `dev-key-123`.

### Device Management

Devices are identified by name strings (e.g., "Bair Hugger Model 775"). Device filtering in queries:

- Uses metadata filtering in vector store search
- Exact match on `device` metadata field
- Device names are case-sensitive

### Context Window

The system retrieves `TOP_K_CHUNKS` (default: 5) chunks of roughly 512 characters each. This gives the LLM about 2,500 characters of context (5 × 512 = 2,560), well within all modern LLM context limits.

## Troubleshooting

### No Results Returned

- Check if PDFs have been ingested: `GET /api/stats` (requires API key)
- Verify device name matches exactly (case-sensitive)
- Check `./logs/rag_YYYYMMDD.log` for errors

### Provider Errors

- Verify API keys in `.env`
- Check provider-specific rate limits
- Use `/api/compare` to test all providers (gracefully handles missing keys)

### Vector Store Issues

- Check `./data/vector_store/` exists and has write permissions
- FAISS index is created on first ingestion
- To reset: delete `./data/vector_store/` and re-run ingestion

## API Endpoint Summary

All endpoints except the health check require the `X-API-Key` header.

- `GET /api/` - Health check (no auth required)
- `GET /api/stats` - Vector store statistics (total chunks, devices, embedding dimension)
- `POST /api/query` - Query with single provider (body: `{query, device?, provider?}`)
- `POST /api/compare` - Compare all available providers (body: `{query, device?}`)
- `POST /api/ingest` - Ingest PDF from `data/` directory (body: `{device_name, filename}`)
- `POST /api/export` - Export answer as JSON for ServiceNow/Jira integration

See http://localhost:8000/docs for interactive API documentation.
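For scripted access, a minimal Python client sketch for `POST /api/query` using only the standard library might look like the following. The base URL and dev key are the defaults mentioned in this file; the helper name `build_query_request`, the example query text, and the provider value `"claude"` are illustrative assumptions, and the response shape is deliberately not assumed (check `/docs` for the actual schema).

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # local dev server
API_KEY = "dev-key-123"  # dev default; use the API_KEY value from your .env


def build_query_request(
    query: str,
    device: Optional[str] = None,
    provider: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /api/query request."""
    body = {"query": query}
    # device and provider are optional in the request body
    if device is not None:
        body["device"] = device
    if provider is not None:
        body["provider"] = provider
    return urllib.request.Request(
        url=f"{BASE_URL}/api/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


request = build_query_request(
    "Unit shows an over-temperature alarm",  # made-up example query
    device="Bair Hugger Model 775",
    provider="claude",  # assumed provider identifier
)
print(request.get_method(), request.full_url)

# Sending the request requires a running server:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

Building the request separately from sending it keeps the header and body logic easy to check without a live server. Remember that device names are matched exactly and are case-sensitive.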
