# WARP.md
This file provides guidance to WARP (warp.dev) when working with code in this repository.
## Project Overview
RAG Troubleshooter is a production-ready RAG service for medical device troubleshooting with vendor-neutral LLM support. It provides hallucination-safe, explainable troubleshooting guidance by ingesting service manual PDFs into a vector database and generating grounded answers via Claude, OpenAI, or Gemini.
## Development Commands
### Environment Setup
```powershell
# Install dependencies
poetry install
# Or with pip
pip install -e .
# Configure environment variables
Copy-Item .env.example .env
# Then edit .env with your API keys
```
### Running the Server
```powershell
# Run with Poetry (recommended for development with hot reload)
poetry run python app/main.py
# Or with uvicorn directly
poetry run uvicorn app.main:app --reload
# Production mode (no reload)
poetry run uvicorn app.main:app --host 0.0.0.0 --port 8000
```
Server endpoints:
- Demo UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Health Check: http://localhost:8000/api/
### Document Ingestion
```powershell
# Run the ingestion script to load device manuals
poetry run python ingest_devices.py
# Or via API (requires running server)
Invoke-RestMethod -Method Post -Uri http://localhost:8000/api/ingest `
    -Headers @{ "X-API-Key" = "dev-key-123" } `
    -ContentType "application/json" `
    -Body '{"device_name": "Bair Hugger Model 775", "filename": "bair_hugger_manual.pdf"}'
```
### Testing
```powershell
# Run tests (when implemented)
poetry run pytest
# Run tests with coverage
poetry run pytest --cov=app --cov-report=html
```
### Code Quality
```powershell
# Format code (line length: 100)
poetry run black app/
# Lint code
poetry run ruff check app/
# Type check
poetry run mypy app/
```
## Architecture Overview
### Core Design Principles
1. **Pluggable Components**: Abstract base classes enable easy swapping of vector stores and LLM providers
2. **Hallucination Prevention**: Strict prompts, low temperature (0.1), required citations, "No grounded steps found" fallback
3. **Async Throughout**: All I/O operations are async for scalability
4. **Stateless API**: Easy horizontal scaling
### Key Architectural Patterns
#### 1. Provider Abstraction Pattern
Both vector stores and LLM providers use abstract base classes to enable pluggability:
- **Vector Store**: `VectorStore` (base) → `LocalVectorStore` (FAISS implementation)
  - Easy to add `PineconeVectorStore`, `WeaviateVectorStore`, etc.
  - All implement: `add_documents()`, `search()`, `delete_by_metadata()`, `get_stats()`
- **LLM Providers**: `LLMProvider` (base) → `ClaudeProvider`, `OpenAIProvider`, `GeminiProvider`
  - Unified interface: `generate()` method returns standardized `LLMResponse`
  - Configuration via `LLMConfig` dataclass
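A minimal, self-contained sketch of the provider abstraction. The `LLMResponse` fields are taken from the provider template later in this file; the `LLMConfig` fields (`model`, `temperature`, `max_tokens`) are assumptions that mirror the documented defaults. `EchoProvider` is hypothetical: an offline stub that is handy for tests that should not call a paid API.

```python
import asyncio
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class LLMConfig:
    # Hypothetical fields; defaults mirror the documented temperature and token limit
    model: str
    temperature: float = 0.1
    max_tokens: int = 500


@dataclass
class LLMResponse:
    # Fields match the LLMResponse(...) call in the provider template
    text: str
    provider: str
    model: str
    tokens_used: int
    finish_reason: str


class LLMProvider(ABC):
    def __init__(self, config: LLMConfig) -> None:
        self.config = config

    @abstractmethod
    async def generate(self, prompt: str, system_prompt: str) -> LLMResponse:
        """Send the prompt to the backing model and return a standardized response."""

    @abstractmethod
    def get_provider_name(self) -> str:
        """Short identifier for the provider."""


class EchoProvider(LLMProvider):
    """Hypothetical offline stub: echoes the prompt instead of calling an API."""

    async def generate(self, prompt: str, system_prompt: str) -> LLMResponse:
        return LLMResponse(
            text=prompt,
            provider=self.get_provider_name(),
            model=self.config.model,
            tokens_used=len(prompt.split()),  # crude word count, not real tokens
            finish_reason="stop",
        )

    def get_provider_name(self) -> str:
        return "echo"
```

Because callers only depend on `generate()` and `LLMResponse`, a stub like this can stand in for Claude, OpenAI, or Gemini wherever a provider is expected.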
#### 2. Service Layer Architecture
Three main services coordinate the RAG pipeline:
- **IngestionService**: PDF → chunks → embeddings → vector store
  - Handles PDF parsing, text chunking with overlap, batch embedding generation
  - Chunking is sentence-boundary aware (breaks at `.` or `\n` when possible)
- **RetrievalService**: Query → embeddings → semantic search → re-ranking → context
  - Two-stage retrieval: semantic search (FAISS) + lightweight keyword re-ranking
  - Re-ranking formula: `0.7 * semantic_score + 0.2 * keyword_overlap + 0.1 * term_density`
- **RAGService**: Query + context → LLM → grounded answer
  - Enforces hallucination-averse system prompt
  - Supports multi-provider comparison (same context, different LLMs)
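The sentence-boundary-aware chunking can be sketched as follows. This is an illustration, not the code in `app/services/ingestion.py`: the rule "only break at a boundary if the chunk stays at least half full" is a hypothetical heuristic, while the 512/50 defaults come from the documented settings.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring to end at '.' or a newline."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Last sentence boundary inside the window; accept it only if the
            # chunk stays at least half full (hypothetical heuristic)
            boundary = max(text.rfind(".", start, end), text.rfind("\n", start, end))
            if boundary >= start + chunk_size // 2:
                end = boundary + 1  # keep the '.' or '\n' in this chunk
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Step back by `overlap` characters so neighbouring chunks share context,
        # but always move forward by at least one character
        start = max(end - overlap, start + 1)
    return chunks
```

Each chunk would then be embedded in batches and stored alongside its metadata (device, page, chunk_index, source_file).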
#### 3. Hallucination Prevention Strategy
The system is designed to prevent hallucinations through multiple mechanisms:
- **System Prompt**: Explicitly instructs models to use ONLY provided context or respond with "No grounded steps found"
- **Low Temperature**: Set to 0.1 (configurable) to minimize creativity
- **Token Limit**: 500 tokens max (configurable) forces concise answers
- **Citation Requirement**: Every answer must cite sources with `[Source N]` notation
- **Context Validation**: Returns early with "No grounded steps found" if retrieval yields no results
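The prompt-level and context-level safeguards above can be sketched like this. The system prompt wording is hypothetical (the real one lives in the RAG service), and `generate` is an assumed async callable standing in for any `LLMProvider.generate()`.

```python
import asyncio
from typing import Awaitable, Callable

NO_ANSWER = "No grounded steps found"

# Hypothetical wording; the actual system prompt is defined in the RAG service
SYSTEM_PROMPT = (
    "You are a medical device troubleshooting assistant. Answer ONLY from the "
    "provided context and cite every step with [Source N]. If the context does "
    f"not contain the answer, reply exactly: {NO_ANSWER}"
)


async def grounded_answer(
    query: str,
    context_chunks: list[str],
    generate: Callable[[str, str], Awaitable[str]],
) -> str:
    # Context validation: never call the LLM when retrieval returned nothing
    if not context_chunks:
        return NO_ANSWER
    # Number the sources so the model can cite them as [Source N]
    context = "\n\n".join(
        f"[Source {i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return await generate(prompt, SYSTEM_PROMPT)
```

Returning early on empty context means an unanswerable query costs no LLM call and cannot produce an ungrounded answer.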
#### 4. Retrieval Pipeline
The retrieval process follows this flow:
1. **Chunking** (Ingestion):
   - 512-character chunks with 50-character overlap (configurable)
   - Sentence-boundary aware splitting to avoid mid-sentence breaks
   - Metadata attached: device, page, chunk_index, source_file
2. **Semantic Search**:
   - Query embedded using `all-MiniLM-L6-v2` (configurable)
   - FAISS cosine similarity search
   - Optional device-based metadata filtering
   - Retrieves 2x the requested number of chunks for re-ranking
3. **Re-ranking**:
   - Combines semantic score with keyword overlap and term density
   - Returns top-k chunks with highest combined scores
4. **Context Formatting**:
   - Formats retrieved chunks as: `[Source N] (Page X, Device):\n{text}`
   - Builds citation list with chunk IDs, pages, relevance scores
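Steps 3 and 4 can be sketched as below. The weights and the context layout come from this file; the exact definitions of `keyword_overlap` (share of distinct query terms found in the chunk) and `term_density` (query-term hits per chunk word) are assumptions, as is the hypothetical `Chunk` dataclass.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int
    device: str
    semantic_score: float  # cosine similarity from the vector search


def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def rerank(query: str, candidates: list[Chunk], top_k: int) -> list[tuple[Chunk, float]]:
    query_terms = set(_tokens(query))
    scored = []
    for chunk in candidates:
        words = _tokens(chunk.text)
        # Assumed definition: share of distinct query terms present in the chunk
        overlap = len(query_terms & set(words)) / len(query_terms) if query_terms else 0.0
        # Assumed definition: query-term hits per chunk word (always 0..1)
        hits = sum(1 for w in words if w in query_terms)
        density = hits / len(words) if words else 0.0
        score = 0.7 * chunk.semantic_score + 0.2 * overlap + 0.1 * density
        scored.append((chunk, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def format_context(ranked: list[tuple[Chunk, float]]) -> str:
    # Mirrors the documented layout: [Source N] (Page X, Device):\n{text}
    return "\n\n".join(
        f"[Source {n}] (Page {c.page}, {c.device}):\n{c.text}"
        for n, (c, _score) in enumerate(ranked, start=1)
    )
```

With the query "air hose kinks", a chunk reading "Check the air hose for kinks" at semantic score 0.50 scores 0.35 + 0.2 + 0.05 = 0.60, so it outranks an unrelated chunk at 0.55 (0.385 combined). This is the intended effect of the keyword terms.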
### Data Flow
```
Query → RetrievalService → VectorStore → [chunks]
      → RAGService → LLMProvider → Answer + Citations
```
For multi-provider comparison:
```
Query → RetrievalService (once) → [shared context]
      → RAGService → [Claude, OpenAI, Gemini] → [Answer1, Answer2, Answer3]
```
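The comparison flow can be sketched with `asyncio.gather`: the prompt is built once from the shared context and sent to every provider concurrently. The use of `gather(..., return_exceptions=True)` and the `"error: ..."` result format are assumptions, chosen so that one failing provider (e.g. a missing API key) does not sink the others.

```python
import asyncio
from typing import Awaitable, Callable

Generate = Callable[[str, str], Awaitable[str]]


async def compare_providers(
    prompt: str, system_prompt: str, providers: dict[str, Generate]
) -> dict[str, str]:
    """Run one shared prompt against several providers concurrently."""
    names = list(providers)
    results = await asyncio.gather(
        *(providers[name](prompt, system_prompt) for name in names),
        return_exceptions=True,  # collect failures instead of raising the first one
    )
    return {
        name: (f"error: {result}" if isinstance(result, Exception) else result)
        for name, result in zip(names, results)
    }
```

Since retrieval runs once, any difference between the answers comes from the providers rather than from different context.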
### Configuration System
Configuration uses Pydantic Settings with `.env` file support:
- **API Keys**: `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, `GOOGLE_API_KEY`
- **Vector Store**: `VECTOR_STORE_TYPE`, `VECTOR_STORE_PATH`
- **Embeddings**: `EMBEDDING_MODEL`
- **LLM**: `DEFAULT_LLM_PROVIDER`, `LLM_TEMPERATURE`, `LLM_MAX_TOKENS`
- **Retrieval**: `TOP_K_CHUNKS`, `CHUNK_SIZE`, `CHUNK_OVERLAP`
All settings are centralized in `app/config.py` and loaded via `Settings` class.
### Telemetry & Logging
Structured JSON logging to `./logs/rag_YYYYMMDD.log` tracks:
- **Query Events**: Query, device, provider, model, latency, tokens, citations, has_context
- **Comparison Events**: Query, device, providers, latencies, tokens
- **Ingestion Events**: Device, source_file, pages, chunks, duration
- **Error Events**: Operation, error, details
Access via global `telemetry` instance from `app.telemetry`.
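A standard-library sketch of one-JSON-object-per-line logging with the documented `rag_YYYYMMDD.log` naming. It is not the API of `app.telemetry`; the helper names `log_file_for`, `event_line`, and `get_logger` are hypothetical.

```python
import json
import logging
from datetime import datetime, timezone
from pathlib import Path


def log_file_for(day: datetime, log_dir: str = "./logs") -> Path:
    # Documented naming scheme: rag_YYYYMMDD.log
    return Path(log_dir) / f"rag_{day:%Y%m%d}.log"


def event_line(event_type: str, **fields) -> str:
    """Serialize one telemetry event as a single JSON line."""
    record = {
        "event": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **fields,
    }
    return json.dumps(record, default=str)  # default=str handles non-JSON values


def get_logger(log_dir: str = "./logs") -> logging.Logger:
    path = log_file_for(datetime.now(), log_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("rag_telemetry")
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.FileHandler(path, encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(message)s"))  # raw JSON only
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Usage, with illustrative field names: `get_logger().info(event_line("query", provider="claude", latency_ms=812, has_context=True))`. One JSON object per line keeps the log greppable and easy to load with `json.loads` line by line.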
## File Organization
```
app/
├── api/
│   └── routes.py            # FastAPI endpoints, request/response models
├── llm_providers/
│   ├── base.py              # LLMProvider abstract base class
│   ├── claude.py            # Anthropic Claude implementation
│   ├── openai_provider.py   # OpenAI implementation
│   └── gemini.py            # Google Gemini implementation
├── services/
│   ├── ingestion.py         # PDF → chunks → embeddings → vector store
│   ├── retrieval.py         # Query → semantic search → re-ranking
│   └── rag.py               # Retrieval + LLM generation orchestration
├── vector_store/
│   ├── base.py              # VectorStore abstract base class
│   └── local_store.py       # FAISS-based local implementation
├── ui/                      # Static HTML demo UI
├── config.py                # Pydantic Settings for env-based config
├── telemetry.py             # Structured JSON logging
└── main.py                  # FastAPI app initialization
```
## Adding New Components
### Adding a New Vector Store
1. Create `app/vector_store/your_store.py`:
```python
from app.vector_store.base import VectorStore, VectorStoreConfig


class YourVectorStore(VectorStore):
    async def add_documents(self, texts, embeddings, metadata):
        # Store the chunk texts, their embeddings, and per-chunk metadata
        pass

    async def search(self, query_embedding, top_k, filter_metadata):
        # Return the top_k nearest chunks, applying filter_metadata if given
        pass

    # Also implement the remaining abstract methods: delete_by_metadata(), get_stats()
```
2. Update `app/api/routes.py` → `get_vector_store()` to instantiate your store based on config
3. Update `.env`: `VECTOR_STORE_TYPE=your_store`
### Adding a New LLM Provider
1. Create `app/llm_providers/your_provider.py`:
```python
from app.llm_providers.base import LLMProvider, LLMResponse


class YourProvider(LLMProvider):
    async def generate(self, prompt, system_prompt):
        # Call your LLM API, then extract the reply text and token usage
        response_text = ...  # placeholder: text returned by your API
        token_count = ...  # placeholder: tokens reported by your API
        return LLMResponse(
            text=response_text,
            provider="your_provider",
            model="model_name",
            tokens_used=token_count,
            finish_reason="stop",
        )

    def get_provider_name(self):
        return "your_provider"
```
2. Update `app/api/routes.py` → `get_llm_provider()` to handle your provider name
3. Add API key to `.env`: `YOUR_PROVIDER_API_KEY=...`
## Important Implementation Details
### Embedding Model
The system uses `all-MiniLM-L6-v2` by default (384 dimensions). If changing models:
- Update `EMBEDDING_MODEL` in `.env`
- Update `embedding_dimension` in `VectorStoreConfig` (if dimensions differ)
- Re-ingest all documents (embeddings are not compatible across models)
### Authentication
API endpoints (except health check) require `X-API-Key` header matching `API_KEY` from `.env`. Default for dev: `dev-key-123`.
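This file does not show how the header is checked. The following is a framework-agnostic sketch, assuming the unauthenticated health check is exactly the path `/api/`. It uses `hmac.compare_digest` for a constant-time comparison, which is a suggested hardening choice, not something this repository is known to do. In FastAPI, `fastapi.security.APIKeyHeader(name="X-API-Key")` can supply `provided_key` to a dependency that calls a helper like this.

```python
import hmac
from typing import Optional

HEALTH_CHECK_PATH = "/api/"  # the one endpoint that needs no key


def is_authorized(path: str, provided_key: Optional[str], expected_key: str) -> bool:
    """Return True if a request to `path` may proceed."""
    if path == HEALTH_CHECK_PATH:
        return True
    if not provided_key:
        return False
    # Constant-time comparison avoids leaking key prefixes through timing
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())
```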
### Device Management
Devices are identified by name strings (e.g., "Bair Hugger Model 775"). Device filtering in queries:
- Uses metadata filtering in vector store search
- Exact match on `device` metadata field
- Device names are case-sensitive
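The exact-match semantics can be illustrated with a hypothetical helper (not the vector store's actual code). It shows why `"bair hugger model 775"` finds nothing when the manual was ingested as `"Bair Hugger Model 775"`.

```python
from typing import Optional


def matches_filter(metadata: dict, filter_metadata: Optional[dict]) -> bool:
    """Exact, case-sensitive match on every key in the filter; no filter matches all."""
    if not filter_metadata:
        return True
    return all(metadata.get(key) == value for key, value in filter_metadata.items())


def filter_chunks(chunks: list[dict], device: Optional[str]) -> list[dict]:
    # Each chunk is assumed to carry a "metadata" dict with a "device" key
    wanted = {"device": device} if device is not None else None
    return [c for c in chunks if matches_filter(c["metadata"], wanted)]
```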
### Context Window
The system retrieves `TOP_K_CHUNKS` (default: 5) chunks. Each chunk is ~512 characters. This provides ~2,500 characters of context to the LLM, well within all modern LLM context limits.
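The arithmetic behind this estimate, using the default settings. The 4-characters-per-token ratio is only a rough rule of thumb for English text, not a property of any specific tokenizer.

```python
TOP_K_CHUNKS = 5
CHUNK_SIZE = 512  # characters per chunk (upper bound)

context_chars = TOP_K_CHUNKS * CHUNK_SIZE
# Rough rule of thumb for English text: about 4 characters per token (assumption)
approx_tokens = context_chars // 4

print(context_chars, approx_tokens)  # 2560 640
```

The `[Source N]` headers and the system prompt add a little on top of this, but the total stays far below the context limits of the supported models.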
## Troubleshooting
### No Results Returned
- Check if PDFs have been ingested: `GET /api/stats` (requires API key)
- Verify device name matches exactly (case-sensitive)
- Check `./logs/rag_YYYYMMDD.log` for errors
### Provider Errors
- Verify API keys in `.env`
- Check provider-specific rate limits
- Use `/api/compare` to test all providers (gracefully handles missing keys)
### Vector Store Issues
- Check `./data/vector_store/` exists and has write permissions
- FAISS index is created on first ingestion
- To reset: delete `./data/vector_store/` and re-run ingestion
## API Endpoint Summary
All endpoints except health check require `X-API-Key` header.
- `GET /api/` - Health check (no auth required)
- `GET /api/stats` - Vector store statistics (total chunks, devices, embedding dimension)
- `POST /api/query` - Query with single provider (body: `{query, device?, provider?}`)
- `POST /api/compare` - Compare all available providers (body: `{query, device?}`)
- `POST /api/ingest` - Ingest PDF from `data/` directory (body: `{device_name, filename}`)
- `POST /api/export` - Export answer as JSON for ServiceNow/Jira integration
See http://localhost:8000/docs for interactive API documentation.
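A standard-library client sketch for `POST /api/query`, using the body fields listed above. The provider name `"claude"` in the usage note is an assumption (check `get_llm_provider()` for the accepted names), and the response schema is not documented here, so the snippet does not assume any response fields.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000/api"


def build_query_request(
    query: str,
    api_key: str,
    device: Optional[str] = None,
    provider: Optional[str] = None,
) -> urllib.request.Request:
    """Build a POST /api/query request; optional fields are omitted when unset."""
    body = {"query": query}
    if device is not None:
        body["device"] = device  # exact, case-sensitive device name
    if provider is not None:
        body["provider"] = provider
    return urllib.request.Request(
        f"{BASE_URL}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

With the server running: `req = build_query_request("E1 alarm", "dev-key-123", device="Bair Hugger Model 775", provider="claude")`, then `with urllib.request.urlopen(req) as resp: result = json.load(resp)`.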