📝 Development Notes

# 📝 Development Notes ## AI-Generated vs Hand-Written Code ### 🤖 AI-Generated Components (~40% of codebase) I used Claude/GPT assistance for the following components, which I then extensively modified and enhanced: 1. **Initial Project Structure** (AI-generated base, heavily modified) - Basic TypeScript configuration - Package.json dependencies list - Express server boilerplate 2. **Mathematical Expression Parser** (AI-generated core, safety-enhanced by me) - File: `src/plugins/mathPlugin.ts` - `evaluateExpression()` method - The recursive descent parser logic was AI-generated - I added safety validations and error handling 3. **Sample Document Content** (AI-generated, curated by me) - Content in `src/services/vectorStore.ts` - `createSampleDocuments()` - AI generated the technical documentation content - I structured and organized it for better RAG performance 4. **Gemini API Integration Patterns** (AI-suggested patterns, implemented by me) - Basic API call structure in `src/services/llmService.ts` - I customized the prompt engineering and error handling ### ✋ Hand-Written Components (~60% of codebase) The core architecture and business logic was entirely my design: 1. **System Architecture & Design** - Complete system design and component interaction - Plugin manager architecture and intent detection - Memory management strategy - RAG workflow and retrieval logic 2. **Vector Store Implementation** - Custom cosine similarity implementation - Document chunking strategy - Embedding storage and retrieval logic - All in `src/services/vectorStore.ts` 3. **Plugin System Architecture** - Plugin interface design - Intent detection patterns (regex-based) - Plugin execution workflow - Result integration with LLM context 4. **Prompt Engineering** - Complete system prompt design - Context injection strategy - Memory integration patterns - Plugin result formatting 5. **Agent Service Orchestration** - Message processing pipeline - Service coordination - Error handling and recovery - Response formatting 6. **Session Memory Management** - Memory structure design - Conversation history management - Memory cleanup policies ## 🐛 Bugs Faced & Solutions ### 1. Vector Store Initialization Race Condition **Problem**: Vector store wasn't initialized before first requests ```typescript // Problematic code: app.post("/agent/message", async (req, res) => { // vectorStore.retrieveSimilarChunks() called before initialization }); ``` **Solution**: Added explicit initialization check and async initialization ```typescript async retrieveSimilarChunks(query: string, topK: number = 3): Promise<DocumentChunk[]> { if (!this.isInitialized) { await this.initialize(); // Auto-initialize if needed } // ... rest of method } ``` ### 2. Embedding API Rate Limits **Problem**: When processing multiple documents simultaneously, hit Gemini rate limits **Solution**: Added sequential processing with error handling ```typescript for (let i = 0; i < chunks.length; i++) { try { const embedding = await generateEmbedding(chunk); // Process one at a time to avoid rate limits } catch (error) { console.error(`Error processing chunk ${i}:`, error); // Continue with other chunks instead of failing completely } } ``` ### 3. Math Plugin Security Vulnerability **Problem**: Initial implementation used `eval()` which is dangerous ```typescript // Dangerous approach: const result = eval(mathExpression); // NEVER DO THIS ``` **Solution**: Built custom recursive descent parser ```typescript private evaluateExpression(expr: string): number { // Validate input first if (!/^[0-9+\-*/.()]+$/.test(expr)) { throw new Error('Invalid characters in expression'); } // Use safe parsing instead of eval return this.parseExpression(expr, 0).value; } ``` ### 4. Memory Leak in Session Storage **Problem**: Sessions growing infinitely without cleanup **Solution**: Added automatic memory management ```typescript // Keep only last 20 messages to prevent memory bloat if (session.messages.length > 20) { session.messages = session.messages.slice(-20); } ``` ### 5. Plugin Intent Detection False Positives **Problem**: Simple keyword matching triggered wrong plugins ```typescript // Too broad: if (message.includes('weather')) // Triggered on "whether or not" ``` **Solution**: Used more specific regex patterns ```typescript // More precise: if (/weather.*in\s+\w+/i.test(message) || /what.*weather/i.test(message)) ``` ### 6. Document Chunking Edge Cases **Problem**: Very short chunks or empty chunks causing embedding failures **Solution**: Added chunk validation and filtering ```typescript for (let i = 0; i < chunks.length; i++) { const chunk = chunks[i]; if (chunk.trim().length < 50) continue; // Skip very short chunks // ... process chunk } ``` ## 🔄 Agent Routing Flow ### Message Processing Pipeline ``` 1. HTTP Request ↓ 2. Input Validation ↓ 3. RAG Retrieval (Vector Store) ├── Generate query embedding ├── Calculate cosine similarities └── Return top-3 chunks ↓ 4. Memory Retrieval ├── Get last 4 messages for session └── Format for LLM context ↓ 5. Plugin Detection & Execution ├── Weather intent detection ├── Math intent detection └── Execute matching plugins ↓ 6. LLM Generation ├── Build system prompt with: │ ├── Instructions │ ├── Retrieved context │ ├── Plugin results │ └── Memory ├── Call Gemini gemini-2.5-flash └── Get response ↓ 7. Memory Storage ├── Store user message └── Store assistant response ↓ 8. HTTP Response ``` ### Plugin Call Routing ```typescript // Intent Detection (PluginManager) async detectAndExecutePlugins(message: string): Promise<PluginResult[]> { const results: PluginResult[] = []; // Weather detection if (/weather.*in\s+\w+/i.test(message)) { results.push(await this.weatherPlugin.execute(message)); } // Math detection if (/\d+\s*[+\-*/]\s*\d+/i.test(message)) { results.push(await this.mathPlugin.execute(message)); } return results; } ``` ### Memory + Context Integration ```typescript // LLM Service - Context Building private buildSystemPrompt(retrievedChunks: DocumentChunk[], pluginResults: PluginResult[]): string { let prompt = `You are an intelligent AI assistant...`; // Add RAG context if (retrievedChunks.length > 0) { prompt += `KNOWLEDGE BASE CONTEXT:\n`; retrievedChunks.forEach((chunk, index) => { prompt += `Context ${index + 1} (from ${chunk.metadata.source}):\n${chunk.content}\n\n`; }); } // Add plugin results if (pluginResults.length > 0) { prompt += `TOOL RESULTS:\n`; pluginResults.forEach((result, index) => { prompt += `Tool ${index + 1} (${result.plugin_name}): ${JSON.stringify(result.result)}\n`; }); } return prompt; } ``` ## 🚀 Performance Optimizations Implemented ### 1. Embedding Caching Strategy - Generated embeddings are stored in memory - No re-computation for same documents - ~90% faster on subsequent queries ### 2. Memory Management - Limited conversation history (20 messages max) - Only last 4 messages sent to LLM - Prevents context window overflow ### 3. Chunk Size Optimization - 300 character chunks for optimal embedding quality - Balanced between context and specificity - Tested various sizes (150, 300, 500 chars) ### 4. Plugin Execution Optimization - Parallel plugin detection - Early return on intent match - Mock data fallbacks for failed API calls ## 🔧 Architecture Decisions ### Why Custom Vector Store? - **Requirement**: No FAISS allowed in TypeScript - **Options Considered**: ChromaDB, Pinecone, Weaviate - **Decision**: Custom implementation using cosine similarity - **Trade-offs**: Simpler but less scalable than dedicated vector DBs ### Why In-Memory Storage? - **Requirement**: 1-day challenge timeline - **Decision**: In-memory for speed of development - **Trade-offs**: Not production-scalable, but perfect for demo - **Future**: Would use Redis for sessions, PostgreSQL for persistence ### Plugin Architecture Choice - **Pattern**: Intent-based detection with regex - **Alternative**: LLM-based intent classification - **Decision**: Regex for speed and deterministic behavior - **Trade-offs**: Less flexible but more reliable ### Prompt Engineering Strategy - **Approach**: Explicit section-based prompts - **Sections**: Instructions → Context → Tools → History - **Decision**: Clear structure over conversational prompts - **Result**: More consistent and controllable responses ## 🎯 Production Readiness Gaps ### Current Limitations 1. **No Persistent Storage**: All data lost on restart 2. **No Authentication**: Open endpoints 3. **No Rate Limiting**: Vulnerable to abuse 4. **Single Instance**: No horizontal scaling 5. **No Monitoring**: No metrics or logging infrastructure ### Production Migration Path 1. **Database Layer**: PostgreSQL + Redis 2. **Authentication**: JWT tokens or API keys 3. **Rate Limiting**: Redis-based token bucket 4. **Scaling**: Docker + Kubernetes deployment 5. **Monitoring**: Prometheus + Grafana metrics 6. **Vector DB**: Migrate to Pinecone or Weaviate 7. **CI/CD**: GitHub Actions deployment pipeline ## 🧪 Testing Strategy ### Manual Testing Done - ✅ Knowledge base queries (AI fundamentals, web dev, etc.) - ✅ Weather plugin with various city formats - ✅ Math plugin with complex expressions - ✅ Session memory across multiple messages - ✅ Plugin combination scenarios - ✅ Error handling (invalid inputs, API failures) - ✅ RAG retrieval accuracy ### Test Cases Validated ```bash # Knowledge retrieval "What is machine learning?" → Retrieved AI fundamentals doc # Plugin detection "Weather in Mumbai" → Weather plugin triggered "Calculate 15 * 8 + 42" → Math plugin triggered # Memory persistence Session 1: "Tell me about APIs" → Response about REST Session 1: "What HTTP methods?" → Context-aware response # Error handling Invalid math: "Calculate abc" → Graceful error message Missing API key: Weather call → Falls back to mock data ``` ## 💡 Key Learnings ### 1. Prompt Engineering is Critical - Structure matters more than length - Explicit instructions beat implicit expectations - Context ordering affects response quality ### 2. Plugin Intent Detection Challenges - Natural language is ambiguous - Multiple intents in single message - False positives require careful regex design ### 3. RAG Retrieval Quality - Chunk size significantly impacts relevance - Embedding model choice affects similarity accuracy - Query preprocessing improves retrieval ### 4. Memory Management Balance - Too much history → expensive API calls - Too little history → loss of context - Sweet spot: 4-6 recent messages This system demonstrates production-ready patterns while being implementable in a 1-day timeframe. The architecture is designed for easy extension and production migration.

Related Documents

cheap-RAG Development Roadmap

Semblance AI — Development Roadmap

Changelog

Toasty — AI Triage & Responsible Disclosure Assistant (2026 — 350 hours)