# Development Notes
## 🤖 AI-Generated vs Manual Code
### AI-Generated Components (with GitHub Copilot assistance):
- **Initial project structure** - Used Copilot for boilerplate Express setup
- **TypeScript interfaces** (`src/types/index.ts`) - AI suggested comprehensive type definitions
- **Plugin templates** - Basic plugin structure and math evaluation logic
- **README sections** - Parts of documentation and example curl commands
- **Error handling patterns** - Standard Express error handling structures
### Manual Implementation (Human-written):
- **Core architecture design** - System design and component interaction patterns
- **Vector store algorithm** - Custom cosine similarity implementation
- **Memory management strategy** - Session-based conversation memory logic
- **Prompt engineering** - All system prompts and context injection logic
- **Plugin intent detection** - Custom keyword matching and query analysis
- **Agent orchestration flow** - Complete integration of RAG + Memory + Plugins
- **Business logic** - All controllers and service coordination
### Mixed Approach:
- **Vector search implementation** - Started with AI template, heavily customized for our needs
- **Plugin execution system** - AI structure, manual business logic
- **API responses** - AI suggested JSON structure, manual error handling and metadata
## Bugs Faced and Solutions
### 1. Google GenAI API Issues
**Problem**: `getGenerativeModel` method not found because the API client was initialized incorrectly
```typescript
// ❌ This didn't work
const genAI = new GoogleGenAI(apiKey);
// ✅ Fixed with proper options object
const genAI = new GoogleGenAI({ apiKey });
```
**Solution**: Read the official Google GenAI documentation and corrected the initialization pattern.
### 2. TypeScript Strict Mode Conflicts
**Problem**: `exactOptionalPropertyTypes` causing issues with conditional properties
```typescript
// ❌ This failed type checking
pluginsUsed: pluginsUsed.length > 0 ? pluginsUsed : undefined
// ✅ Fixed with object spread
...(pluginsUsed.length > 0 && { pluginsUsed })
```
**Solution**: Used object spread syntax to conditionally include optional properties, so the key is omitted entirely rather than set to `undefined` (which `exactOptionalPropertyTypes` rejects for optional properties).
### 3. Vector Embedding Complexity
**Problem**: Initially tried to use the `ml-distance` package, but it had TypeScript issues
**Solution**: Implemented a custom cosine similarity function with proper null checks:
```typescript
function cosineSimilarity(a: number[], b: number[]): number {
  // Custom implementation with safety checks. The original excerpt elided
  // the body; the standard dot-product / norm form is shown here.
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    const aVal = a[i] || 0; // Handle undefined values
    const bVal = b[i] || 0;
    dot += aVal * bVal;
    normA += aVal * aVal;
    normB += bVal * bVal;
  }
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom; // Zero vectors yield 0 instead of NaN
}
```
### 4. File System Access Issues
**Problem**: Array indexing with potential undefined values in file reading
```typescript
// ❌ Potential undefined access
const chunk = chunks[i];
await this.addDocument(chunks[i], metadata);
// ✅ Added safety check
const chunk = chunks[i];
if (chunk) {
  await this.addDocument(chunk, metadata);
}
```
**Solution**: Added explicit undefined checks before processing chunks.
### 5. Memory Management
**Problem**: Risk of memory leaks with unlimited message storage
**Solution**: Implemented automatic message cleanup:
```typescript
// Keep only last 10 messages per session
if (session.messages.length > 10) {
  session.messages = session.messages.slice(-10);
}
```
### 6. Pinecone Integration Issues
**Problem**: Dimension mismatch between Google GenAI embeddings and Pinecone index
```typescript
// ❌ This failed - wrong dimensions
await index.upsert([{ id, values: embedding }]); // 768 dimensions
// ✅ Fixed with proper embedding model
const embedding = await genAI.getGenerativeModel({ model: 'text-embedding-004' })
  .embedContent(text); // 3072 dimensions
```
**Solution**: Used the `text-embedding-004` model and configured the Pinecone index with 3072 dimensions.
### 7. Gemini API Overload Handling
**Problem**: Models returned 503 errors during high-usage periods, and the generated fallback responses were confusing
```typescript
// ❌ Generated fallback responses when APIs failed
return "I apologize, but I'm experiencing technical difficulties...";
// ✅ Return proper HTTP error codes
return res.status(503).json({
  error: "Service temporarily unavailable",
  message: "All AI models are currently overloaded. Please try again later.",
  code: 503
});
```
**Solution**: Implemented multi-model retry with exponential backoff and clean HTTP error responses.
## Agent Flow: RAG + Memory + Plugins
### Complete Request Processing Flow:
1. **Input Validation**
- Check required fields (message, session_id)
- Validate API key availability
2. **Memory Retrieval**
```typescript
// Add current message to session
memoryManager.addMessage(session_id, message, 'user');
// Get conversation context
const sessionSummary = memoryManager.getSessionSummary(session_id);
```
3. **RAG Context Retrieval**
```typescript
// Vector search for relevant documents
const relevantContext = await vectorStore.search(message, 3);
// Truncate contexts to fit prompt limits
const contextDocs = relevantContext.map(doc => ({
...doc,
content: truncateContext(doc.content, 300)
}));
```
4. **Plugin Execution**
```typescript
// Check which plugins can handle the query
const pluginResults = await pluginManager.executePlugins(message);
// Filter successful results
const pluginsUsed = pluginResults
.filter(result => result.success)
.map(result => result.pluginName);
```
5. **Prompt Engineering**
```typescript
// Build comprehensive prompt with all context
const prompt = PromptEngine.buildPrompt(
message, // Current user message
sessionSummary, // Conversation history
contextDocs, // RAG retrieved documents
pluginResults // Plugin execution results
);
```
6. **LLM Generation**
```typescript
// Generate response using Google Gemini
const result = await model.generateContent(prompt);
const aiResponse = result.response.text();
// Clean and format response
const formattedResponse = PromptEngine.formatResponse(aiResponse);
```
7. **Memory Storage**
```typescript
// Store AI response for future context
memoryManager.addMessage(session_id, formattedResponse, 'assistant');
```
8. **Response Assembly**
```typescript
// Include metadata about what was used
const response = {
response: formattedResponse,
sessionId: session_id,
...(pluginsUsed.length > 0 && { pluginsUsed }),
...(contextUsed.length > 0 && { contextUsed })
};
```
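
Step 3 of the flow calls `truncateContext(doc.content, 300)` without showing the helper. The following is a minimal sketch of what such a helper might look like, assuming the second argument is a character budget (the notes do not say whether it counts characters or tokens). The function body is hypothetical, not the project's actual code.

```typescript
// Hypothetical helper: the real truncateContext is not shown in these notes.
// Assumes maxLength is a character budget, not a token budget.
function truncateContext(content: string, maxLength: number): string {
  if (content.length <= maxLength) return content;
  const slice = content.slice(0, maxLength);
  // Prefer cutting at the last whitespace so a word is not split in half.
  const lastSpace = slice.lastIndexOf(" ");
  const cut = lastSpace > 0 ? slice.slice(0, lastSpace) : slice;
  // The ellipsis is appended after the cut, so the result can exceed
  // maxLength by up to three characters.
  return cut.trimEnd() + "...";
}
```

Cutting at a word boundary keeps the retrieved snippets readable inside the prompt; a token-based budget would need the tokenizer of the target model instead.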
## 🎯 Key Design Decisions
### 1. **Pinecone Vector Database over Custom Implementation**
- **Decision**: Use Pinecone cloud vector database with Google GenAI embeddings
- **Reason**: Production-ready scaling, 3072-dimensional embeddings, better similarity search
- **Trade-off**: External dependency, requires API key, but much more robust than in-memory
### 2. **Multi-Model Fallback Strategy**
- **Decision**: Implement fallback chain: gemini-1.5-flash → gemini-pro → gemini-1.5-pro
- **Reason**: Reliability when models are overloaded, different quota limits
- **Implementation**: Exponential backoff retry with clean HTTP error codes
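
One way the fallback chain could be structured is sketched below. The model order comes from these notes, and the 2 s / 4 s delays match the values reported for the retry logic; everything else (`GenerateFn`, `generateWithFallback`, the injectable `sleep`) is a hypothetical name introduced for illustration. It assumes one attempt per model, which may differ from the real implementation.

```typescript
// Hypothetical types and names; only the model order and the 2s/4s delays
// come from these notes.
type GenerateFn = (model: string, prompt: string) => Promise<string>;

const MODEL_CHAIN = ["gemini-1.5-flash", "gemini-pro", "gemini-1.5-pro"];

// Delay before moving to the next model (0-based attempt): 2000 ms, 4000 ms, ...
function backoffDelayMs(attempt: number, baseMs = 2000): number {
  return baseMs * 2 ** attempt;
}

async function generateWithFallback(
  prompt: string,
  generate: GenerateFn,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<{ model: string; text: string }> {
  let lastError: unknown;
  for (let i = 0; i < MODEL_CHAIN.length; i++) {
    const model = MODEL_CHAIN[i]!;
    try {
      return { model, text: await generate(model, prompt) };
    } catch (err) {
      lastError = err;
      // Wait before trying the next model, except after the final one.
      if (i < MODEL_CHAIN.length - 1) await sleep(backoffDelayMs(i));
    }
  }
  // Callers can translate this into a 503 response.
  throw new Error(`All models failed: ${String(lastError)}`);
}
```

Passing `generate` and `sleep` as parameters keeps the sketch independent of any SDK and lets tests run without real delays.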
### 3. **Plugin Architecture with Real APIs**
- **Decision**: Interface-based plugin system with real API integrations
- **Reason**: Extensible, testable, production-ready data sources
- **Implementation**: OpenWeather API for weather, mathjs for calculations, realistic fallbacks
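
A rough sketch of an interface-based plugin system follows. The `pluginName` and `success` fields and the `executePlugins(message)` call mirror the agent flow in these notes; the `Plugin` interface, `canHandle`, and the stub weather plugin are assumptions made for illustration and do not call the real OpenWeather API.

```typescript
// Hypothetical interfaces; the project's actual plugin types are not shown here.
interface PluginResult {
  pluginName: string;
  success: boolean;
  output?: string;
}

interface Plugin {
  name: string;
  canHandle(message: string): boolean; // keyword-based intent detection
  execute(message: string): Promise<PluginResult>;
}

// Stub plugin: returns canned text instead of calling a real weather API.
const weatherStub: Plugin = {
  name: "weather",
  canHandle: (message) => /\b(weather|temperature|forecast)\b/i.test(message),
  execute: async (message) => ({
    pluginName: "weather",
    success: true,
    output: `stub weather result for: ${message}`,
  }),
};

class PluginManager {
  private plugins: Plugin[] = [];

  register(plugin: Plugin): void {
    this.plugins.push(plugin);
  }

  // Runs every plugin whose canHandle matches; a throwing plugin is
  // reported as a failed result instead of aborting the whole request.
  async executePlugins(message: string): Promise<PluginResult[]> {
    const matching = this.plugins.filter((p) => p.canHandle(message));
    return Promise.all(
      matching.map(async (p) => {
        try {
          return await p.execute(message);
        } catch {
          return { pluginName: p.name, success: false };
        }
      }),
    );
  }
}
```

Because each plugin only has to satisfy the interface, a test can register stubs like `weatherStub` while production code registers implementations backed by real APIs.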
### 4. **Prompt Engineering Approach**
- **Decision**: Custom prompt builder with structured context injection
- **Reason**: Full control over how context, memory, and plugins are presented to the LLM
- **Structure**: System prompt + Memory + Context + Plugins + Current message
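
The section order above can be illustrated with a small builder. This is a sketch under assumptions: `buildPromptSketch`, its parameter list, and the section headings are hypothetical, and the project's `PromptEngine.buildPrompt` may format things differently.

```typescript
// Hypothetical shapes; only the section order comes from these notes.
interface ContextDoc {
  content: string;
}

interface PluginOutput {
  pluginName: string;
  output: string;
}

// Assembles: system prompt, memory, context, plugins, current message.
// Empty sections are skipped so the prompt carries no blank headings.
function buildPromptSketch(
  systemPrompt: string,
  message: string,
  sessionSummary: string,
  contextDocs: ContextDoc[],
  pluginOutputs: PluginOutput[],
): string {
  const sections: string[] = [systemPrompt];
  if (sessionSummary) {
    sections.push(`## Conversation so far\n${sessionSummary}`);
  }
  if (contextDocs.length > 0) {
    const docs = contextDocs.map((d, i) => `[${i + 1}] ${d.content}`).join("\n");
    sections.push(`## Relevant context\n${docs}`);
  }
  if (pluginOutputs.length > 0) {
    const lines = pluginOutputs.map((p) => `${p.pluginName}: ${p.output}`).join("\n");
    sections.push(`## Plugin results\n${lines}`);
  }
  sections.push(`## User message\n${message}`);
  return sections.join("\n\n");
}
```

Placing the current message last keeps the user's question closest to where generation starts, which is a common convention rather than a requirement.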
### 5. **Clean Error Handling over Fallback Responses**
- **Decision**: Return proper HTTP status codes (503, 500) instead of generated fallback content
- **Reason**: Professional API behavior, clear error communication, better debugging
- **Implementation**: 503 for overloaded models, 500 for configuration errors
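
The status-code mapping can be kept in one place so handlers never improvise error bodies. In the sketch below the 503 body matches the one shown in these notes; the `FailureKind` names, the `toErrorResponse` helper, and the 500 message text are assumptions.

```typescript
// Hypothetical helper; only the status codes and the 503 body come from the notes.
type FailureKind = "models_overloaded" | "missing_api_key";

interface ErrorResponse {
  status: number;
  body: { error: string; message: string; code: number };
}

const ERROR_RESPONSES: Record<FailureKind, ErrorResponse> = {
  models_overloaded: {
    status: 503,
    body: {
      error: "Service temporarily unavailable",
      message: "All AI models are currently overloaded. Please try again later.",
      code: 503,
    },
  },
  missing_api_key: {
    status: 500,
    body: {
      error: "Internal server error",
      message: "A required API key is not configured.", // assumed wording
      code: 500,
    },
  },
};

function toErrorResponse(kind: FailureKind): ErrorResponse {
  return ERROR_RESPONSES[kind];
}
```

In an Express handler this would be used roughly as `const { status, body } = toErrorResponse("models_overloaded"); res.status(status).json(body);`.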
## ⚡ Performance Considerations
### Current Optimizations:
- Pinecone vector database for efficient similarity search
- Google GenAI embeddings (3072 dimensions) for better semantic understanding
- Session-based memory with automatic cleanup (10 messages max)
- Multi-model retry logic with exponential backoff
- Proper HTTP status codes for better client handling
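
The session memory with its 10-message cap might look roughly like the sketch below. The `addMessage(sessionId, content, role)` and `getSessionSummary(sessionId)` signatures follow the calls shown in the agent flow; the `Map`-backed storage and the summary format are assumptions, not the project's actual implementation.

```typescript
// Hypothetical in-memory store; storage layout and summary format are assumed.
type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

class MemoryManager {
  private sessions = new Map<string, Message[]>();
  private maxMessages: number;

  constructor(maxMessages = 10) {
    this.maxMessages = maxMessages;
  }

  addMessage(sessionId: string, content: string, role: Role): void {
    const messages = this.sessions.get(sessionId) ?? [];
    messages.push({ role, content });
    // Keep only the most recent messages so a session cannot grow unbounded.
    this.sessions.set(sessionId, messages.slice(-this.maxMessages));
  }

  getSessionSummary(sessionId: string): string {
    const messages = this.sessions.get(sessionId) ?? [];
    return messages.map((m) => `${m.role}: ${m.content}`).join("\n");
  }
}
```

The cap bounds memory per session but not the number of sessions; idle sessions would still need expiry, which this sketch does not attempt.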
### Production Features:
- Real OpenWeather API integration with realistic fallbacks
- Comprehensive logging and error tracking
- Clean separation of concerns with proper TypeScript interfaces
- Environment-based configuration with validation
## Recent Improvements & Production Readiness
### Latest Updates (August 2025):
#### 1. **Pinecone Vector Database Integration**
- Migrated from in-memory vector storage to production Pinecone
- Auto-index creation with 3072-dimensional Google GenAI embeddings
- Successfully loaded 26 documents with proper chunking
- Replaced the O(n) linear scan of the in-memory store with Pinecone's indexed similarity search
#### 2. **Multi-Model Fallback System**
- Implemented robust retry logic: gemini-1.5-flash → gemini-pro → gemini-1.5-pro
- Exponential backoff with 2s, 4s delays between retries
- Comprehensive error handling for 503 (overloaded), 429 (quota), 404 (not found)
- Detailed logging with emoji indicators for better debugging
#### 3. **OpenWeather API Integration**
- Real weather data integration with proper API key handling
- Realistic city-based fallback simulation when API key missing
- Professional error handling with graceful degradation
- Comprehensive weather data including temperature, humidity, wind speed
#### 4. **Clean Error Handling**
- Removed fallback response generation in favor of proper HTTP codes
- 503 "Service temporarily unavailable" when all models overloaded
- 500 "Internal server error" for missing API keys
- Professional API behavior suitable for production clients
#### 5. **Enhanced Logging & Monitoring**
- Detailed request tracing with emoji indicators
- Plugin execution tracking with success/failure states
- Model retry attempts with clear failure reasons
- Session and memory usage statistics
### System Status:
- ✅ **Production Ready**: Clean error codes, proper fallbacks, comprehensive logging
- ✅ **Scalable**: Pinecone integration handles large document sets efficiently
- ✅ **Reliable**: Multi-model fallback ensures high availability
- ✅ **Extensible**: Plugin system ready for additional integrations
- ✅ **Maintainable**: Clear separation of concerns, comprehensive TypeScript types
## 🧪 Testing Strategy
### Manual Testing Approach:
1. **Basic Agent**: Test simple Q&A without context
2. **Memory**: Multiple messages in same session
3. **RAG**: Questions about dataset content
4. **Weather Plugin**: Various weather queries
5. **Math Plugin**: Different mathematical expressions
6. **Combined**: Complex queries triggering multiple systems
### Example Test Scenarios:
```bash
# Memory persistence test (the JSON Content-Type header is needed for Express to parse the body)
curl -X POST http://localhost:3001/agent/message -H "Content-Type: application/json" -d '{"message":"My name is John","session_id":"test1"}'
curl -X POST http://localhost:3001/agent/message -H "Content-Type: application/json" -d '{"message":"What is my name?","session_id":"test1"}'
# Multi-plugin execution test
curl -X POST http://localhost:3001/agent/message -H "Content-Type: application/json" -d '{"message":"Weather in Mumbai and calculate 5*7","session_id":"test2"}'
# RAG context test
curl -X POST http://localhost:3001/agent/message -H "Content-Type: application/json" -d '{"message":"Tell me about markdown blogging","session_id":"test3"}'
# Error handling test (missing message)
curl -X POST http://localhost:3001/agent/message -H "Content-Type: application/json" -d '{"session_id":"test4"}'
```
## Deployment Readiness
### Environment Variables Required:
- `GEMINI_API` - **Essential** - Google Gemini AI API key for LLM functionality
- `PINECONE_API` - **Essential** - Pinecone vector database API key for RAG
- `OPEN_WEATHER_API` - *Optional* - OpenWeather API for real weather data
- `PORT` - For platform deployment (Render, Railway, etc.)
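
Startup validation of these variables could be sketched as follows. The variable names and their required/optional status come from the list above; the `loadConfig` function, the `AppConfig` shape, and the default port of 3001 (taken from the curl examples) are assumptions.

```typescript
// Hypothetical config loader; only the variable names come from these notes.
interface AppConfig {
  geminiApiKey: string;
  pineconeApiKey: string;
  openWeatherApiKey?: string;
  port: number;
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const geminiApiKey = env["GEMINI_API"];
  const pineconeApiKey = env["PINECONE_API"];
  if (!geminiApiKey || !pineconeApiKey) {
    const missing = ["GEMINI_API", "PINECONE_API"].filter((name) => !env[name]);
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  // Default port is an assumption based on the localhost:3001 examples.
  const port = Number(env["PORT"] ?? "3001");
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  const openWeatherApiKey = env["OPEN_WEATHER_API"];
  return {
    geminiApiKey,
    pineconeApiKey,
    port,
    // Omit the optional key entirely when it is not set.
    ...(openWeatherApiKey ? { openWeatherApiKey } : {}),
  };
}
```

Calling `loadConfig(process.env)` once at startup makes a missing key fail fast instead of surfacing later as a 500 on the first request.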
### Production Considerations:
- ✅ **Rate limiting** - Handled by Gemini API and Pinecone quotas
- ✅ **Proper logging** - Comprehensive emoji-based logging implemented
- ✅ **Health checks** - `/status` endpoint with detailed system information
- ✅ **Error handling** - Clean HTTP status codes and proper error messages
- ✅ **API authentication** - Individual API key management per service
- [ ] **Database persistence** - Currently using session-based memory (consider Redis)
- [ ] **Error monitoring** - Consider Sentry for production deployments
The system is **production-ready** with proper error handling, multi-model fallback, and comprehensive logging. All major functionality is implemented and tested successfully.
---
**Total Development Time**: ~8-10 hours (including architecture, implementation, debugging, API integrations, and comprehensive documentation)
**Current Status**: Production-ready with 26 vectorized documents, multi-model AI fallback, real weather API integration, and professional error handling