# ReachInbox Email Aggregator - Development Plan

## Project Overview

Build an AI-driven email onebox aggregator with real-time IMAP synchronization, intelligent categorization, and RAG-powered reply suggestions using TypeScript and Node.js.

---

## Phase 0: Project Setup

### 1. Initialize Project

```bash
npm init -y
npm install typescript @types/node ts-node --save-dev
npx tsc --init
```

### 2. Install Core Dependencies

```bash
# IMAP Client (the node-imap library is published on npm as "imap")
npm install imap mailparser

# Database & Search
npm install @elastic/elasticsearch @qdrant/js-client-rest

# Server & Utilities
npm install express dotenv cors
npm install @types/express @types/cors @types/imap @types/mailparser --save-dev

# AI Integration
# Use native fetch for Gemini API (built into Node.js 18+)
```

### 3. Docker Services Setup

Create `docker-compose.yml` with two services:

- **Elasticsearch** (port 9200) - Email search and storage
- **Qdrant** (ports 6333, 6334) - Vector database for RAG

### 4. Project Structure

```
src/
├── services/
│   ├── imap/            # Email synchronization
│   ├── elasticsearch/   # Search & indexing
│   ├── gemini/          # AI categorization & embeddings
│   ├── vector-db/       # RAG implementation
│   └── webhooks/        # Slack & generic webhook notifications
├── controllers/         # API handlers
├── routes/              # Express routes
├── types/               # TypeScript interfaces
└── utils/               # Helper functions
```

### 5. Environment Configuration

Create `.env.example`:

```
# IMAP Accounts (minimum 2)
IMAP_EMAIL_1=
IMAP_PASSWORD_1=
IMAP_EMAIL_2=
IMAP_PASSWORD_2=

# Elasticsearch
ELASTICSEARCH_HOST=http://localhost:9200

# AI
GEMINI_API_KEY=

# Integrations
SLACK_WEBHOOK_URL=
WEBHOOK_SITE_URL=

# Vector DB
QDRANT_HOST=localhost
QDRANT_PORT=6333
```

---

## Phase 1: Real-Time Email Synchronization ⭐ MANDATORY

### Core Requirements

- Persistent IMAP connections using IDLE mode
- No polling or cron jobs
- Sync last 30 days of emails
- Support minimum 2 email accounts

### Implementation Steps

#### 1.1 IMAP Service Setup

Create `src/services/imap/ImapClient.ts`:

- Initialize one connection per email account
- Authenticate using credentials from environment

#### 1.2 Initial History Fetch

- Fetch all major folders (INBOX, Sent, Drafts)
- Use IMAP SEARCH: `SINCE <30 days ago>`
- Fetch ENVELOPE and BODYSTRUCTURE
- Parse email metadata (from, to, subject, date)
- Extract plain text body

#### 1.3 Real-Time IDLE Listener

- Transition to IDLE mode after initial sync
- Listen for events:
  - `mail` - New email arrival
  - `expunge` - Email deletion
- Fetch new emails immediately on event
- Pass to indexing pipeline

#### 1.4 Connection Watchdog

- Ping connection every 29 minutes
- Re-send IDLE command to maintain session
- Handle disconnections with auto-reconnect
- Implement exponential backoff on failures

#### 1.5 Data Structure

```typescript
interface EmailDocument {
  id: string;
  accountId: string;
  folder: string;
  subject: string;
  body: string;
  from: string;
  to: string[];
  date: Date;
  aiCategory: string;
  indexedAt: Date;
}
```

---

## Phase 2: Elasticsearch Storage & Search ⭐ CORE

### Implementation Steps

#### 2.1 Elasticsearch Client Setup

Create `src/services/elasticsearch/ElasticsearchClient.ts`:

- Connect to `http://localhost:9200`
- Create `emails` index with proper mapping

#### 2.2 Index Mapping

```typescript
{
  subject: { type: 'text' },
  body: { type: 'text' },
  accountId: { type: 'keyword' },
  folder: { type: 'keyword' },
  date: { type: 'date' },
  aiCategory: { type: 'keyword' },
  from: { type: 'keyword' },
  to: { type: 'keyword' }
}
```

#### 2.3 Indexing Pipeline

- Receive email from IMAP service
- Strip HTML, convert to plain text
- Index immediately to Elasticsearch
- Return indexed document ID

#### 2.4 Search Implementation

Create search endpoint: `GET /api/emails/search`

Query parameters:

- `q` - Search term (subject/body)
- `account` - Filter by accountId
- `folder` - Filter by folder
- `category` - Filter by AI category

Query structure:

```typescript
{
  query: {
    bool: {
      must: [
        { multi_match: { query: searchTerm, fields: ["subject", "body"] } }
      ],
      // Include each term filter only when its query parameter was supplied
      filter: [
        { term: { accountId: account } },
        { term: { folder: folder } },
        { term: { aiCategory: category } }
      ]
    }
  }
}
```

---

## Phase 3: AI Email Categorization ⭐ CORE

### Categories

1. Interested
2. Meeting Booked
3. Not Interested
4. Spam
5. Out of Office

### Implementation Steps

#### 3.1 Gemini Integration

Create `src/services/gemini/GeminiClient.ts`:

- Use Gemini API endpoint
- Configure API key from environment

#### 3.2 System Prompt Definition

```
Analyze the email and categorize it into ONE of these categories:
- Interested: Lead shows interest
- Meeting Booked: Meeting scheduled
- Not Interested: Negative response
- Spam: Unwanted promotional emails
- Out of Office: Auto-reply responses

Return JSON with category field only.
```

#### 3.3 JSON Schema Configuration

```typescript
{
  type: "OBJECT",
  properties: {
    category: {
      type: "STRING",
      enum: ["Interested", "Meeting Booked", "Not Interested", "Spam", "Out of Office"]
    }
  },
  required: ["category"]
}
```

#### 3.4 Categorization Flow

1. Email indexed in Elasticsearch
2. Send subject + body to Gemini API
3. Receive category response
4. Update Elasticsearch document with `aiCategory`
5. Trigger webhook if category is "Interested"

#### 3.5 Error Handling

- Implement retry logic with exponential backoff
- Handle rate limits gracefully
- Default to "Uncategorized" on failure
- Log all categorization attempts

---

## Phase 4: Slack & Webhook Integration ⭐ CORE

### Implementation Steps

#### 4.1 Webhook Service

Create `src/services/webhooks/WebhookClient.ts`

#### 4.2 Trigger Condition

Activate when `aiCategory === "Interested"`

#### 4.3 Slack Notification

```typescript
async function sendSlackNotification(email: EmailDocument) {
  // Read the webhook URL from the environment (see .env.example)
  const url = process.env.SLACK_WEBHOOK_URL;
  if (!url) throw new Error('SLACK_WEBHOOK_URL is not set');

  const payload = {
    text: `🎯 Interested Lead Detected!`,
    blocks: [
      {
        type: "section",
        fields: [
          { type: "mrkdwn", text: `*From:* ${email.from}` },
          { type: "mrkdwn", text: `*Subject:* ${email.subject}` },
          { type: "mrkdwn", text: `*Account:* ${email.accountId}` }
        ]
      }
    ]
  };

  await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
}
```

#### 4.4 Generic Webhook

```typescript
async function triggerGenericWebhook(email: EmailDocument) {
  // Read the webhook URL from the environment (see .env.example)
  const url = process.env.WEBHOOK_SITE_URL;
  if (!url) throw new Error('WEBHOOK_SITE_URL is not set');

  await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      event: 'InterestedLead',
      timestamp: new Date().toISOString(),
      email: {
        id: email.id,
        from: email.from,
        subject: email.subject,
        accountId: email.accountId,
        category: email.aiCategory
      }
    })
  });
}
```

---

## Phase 5: Frontend Interface 🌟 UNLOCKS FRONTEND

### Prerequisites

Complete Phases 1-4 with Postman validation

### Backend API Endpoints

#### 5.1 Required Endpoints

```
GET /api/accounts          // List configured email accounts
GET /api/emails            // Paginated email list
GET /api/emails/search     // Search with filters
GET /api/emails/:id        // Single email details
```

### Frontend Implementation

#### 5.2 Technology Choice

- Simple HTML/CSS/JS, or
- React with TypeScript

#### 5.3 Core Components

**Email List**

- Display emails in table/card format
- Show: subject, from, date, AI category
- Pagination support

**Filter Panel**

- Account dropdown (from `/api/accounts`)
- Folder dropdown (INBOX, Sent, etc.)
- Category filter (AI categories)

**Search Bar**

- Full-text search input
- Calls `/api/emails/search` endpoint

**AI Category Tags**

- Visual badges for each category
- Color-coded (e.g., green for Interested)

#### 5.4 Integration Flow

1. Fetch accounts on page load
2. Display email list with filters
3. Update list on filter/search change
4. Show AI category prominently
5. Click email to view details

---

## Phase 6: RAG Suggested Replies 🏆 INTERVIEW FAST-TRACK

### Prerequisites

Complete Phase 5 with working UI

### Implementation Steps

#### 6.1 Vector Database Setup

Initialize Qdrant collection:

```typescript
{
  collection_name: "product_knowledge",
  vectors: {
    size: 768,  // Gemini embedding dimension
    distance: "Cosine"
  }
}
```

#### 6.2 Knowledge Base Preparation

Create training data file with:

- Product information
- Outreach agenda
- Meeting booking links
- Common responses

Split into chunks (200-300 words each)

#### 6.3 Embedding Generation

```typescript
async function generateEmbedding(text: string): Promise<number[]> {
  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1/models/embedding-001:embedContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Authenticate with the key configured in the environment
        'x-goog-api-key': process.env.GEMINI_API_KEY ?? ''
      },
      body: JSON.stringify({
        model: "models/embedding-001",
        content: { parts: [{ text }] }
      })
    }
  );
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`);
  }
  // The response body has the shape { embedding: { values: number[] } }
  const data = await response.json();
  return data.embedding.values;
}
```

#### 6.4 Index Knowledge Base

For each chunk:

1. Generate embedding
2. Store in Qdrant with metadata
3. Include original text as payload

#### 6.5 RAG Pipeline

**Endpoint**: `POST /api/emails/:id/suggest-reply`

**Flow**:

1. Receive email ID
2. Fetch email content from Elasticsearch
3. Generate embedding for email text
4. Query Qdrant for top 3 similar chunks
5. Assemble prompt:

   ```
   System: Act as professional assistant
   Context: [Retrieved chunks]
   Email: [Original email]
   Instruction: Draft reply based ONLY on context
   ```

6. Send to Gemini API
7. Return suggested reply

#### 6.6 Frontend Integration

- Add "Suggest Reply" button on email detail view
- Display loading state during generation
- Show suggested reply in text area
- Allow editing before sending

---

## Testing Strategy

### Phase 1-4: Postman Testing

- Test IMAP sync with real accounts
- Verify Elasticsearch indexing
- Check AI categorization accuracy
- Validate webhook triggers

### Phase 5: UI Testing

- Test all filters and search
- Verify email display
- Check responsive design
- Validate API integration

### Phase 6: RAG Testing

- Test reply relevance
- Verify context grounding
- Check response quality
- Validate edge cases

---

## Deployment Checklist

### Code Quality

- [ ] TypeScript strict mode enabled
- [ ] Proper error handling throughout
- [ ] Environment variables for all configs
- [ ] No hardcoded credentials
- [ ] Logging implemented (winston/pino)
- [ ] Code modular and organized

### Documentation

- [ ] README with setup instructions
- [ ] Architecture diagram included
- [ ] API documentation
- [ ] Environment variable examples
- [ ] Known limitations documented

### Repository

- [ ] Private GitHub repository created
- [ ] Access granted to reviewers
- [ ] .gitignore configured
- [ ] Clean commit history
- [ ] No sensitive data committed

### Demo Video

- [ ] Screen recording prepared
- [ ] Real-time sync demonstrated
- [ ] All features showcased
- [ ] Clear audio narration
- [ ] Under 5 minutes duration

---

## Success Criteria

### Minimum Viable Implementation

- ✅ Phase 1: IMAP IDLE working (no polling)
- ✅ Phase 2: Elasticsearch search functional
- ✅ Phase 3: AI categorization accurate
- ✅ Phase 4: Webhook notifications working

### Strong Candidate

- ✅ All above phases complete
- ✅ Phase 5: Functional UI integrated

### Exceptional Candidate

- ✅ All above phases complete
- ✅ Phase 6: RAG replies implemented
- 🏆 Direct final interview invitation

---

## Key Technical Decisions

### IMAP Implementation

- Use `node-imap` library
- Maintain persistent connections
- Implement connection watchdog
- Handle network interruptions gracefully

### Search Strategy

- Elasticsearch for full-text search
- Keyword fields for exact filtering
- Combine `must` and `filter` clauses
- Optimize query performance

### AI Integration

- Gemini API for categorization
- JSON mode for structured output
- System prompts for consistency
- Error handling with retries

### RAG Architecture

- Qdrant for vector storage
- Chunk size: 200-300 words
- Top K retrieval: 3 chunks
- Context-grounded generation

---

## Notes

- Focus on quality over quantity
- Complete phases sequentially
- Test thoroughly before moving forward
- Document all architectural decisions
- Implement proper error handling from start
- Use TypeScript types throughout
- No plagiarism - original implementation required
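
The Search Strategy decision above (combine `must` and `filter` clauses) could be sketched as a small query builder. This is a minimal illustration under stated assumptions, not the required implementation: `SearchParams`, `TermFilter`, `SearchBody`, and `buildSearchQuery` are hypothetical names introduced here, and falling back to `match_all` when no search term is given is an assumed behaviour the plan does not specify. The point it shows is that a term filter is added only when its query parameter is actually present, so an absent `account`, `folder`, or `category` never produces a `term` clause with an undefined value.

```typescript
// Hypothetical shape of the parsed query-string parameters from section 2.4.
interface SearchParams {
  q?: string;
  account?: string;
  folder?: string;
  category?: string;
}

type TermFilter = { term: Record<string, string> };

interface SearchBody {
  query: {
    bool: {
      must: object[];
      filter: TermFilter[];
    };
  };
}

// Build the Elasticsearch request body for GET /api/emails/search.
function buildSearchQuery(params: SearchParams): SearchBody {
  // Exact-match filters run against the keyword fields of the index mapping.
  const filter: TermFilter[] = [];
  if (params.account) filter.push({ term: { accountId: params.account } });
  if (params.folder) filter.push({ term: { folder: params.folder } });
  if (params.category) filter.push({ term: { aiCategory: params.category } });

  // Full-text clause; assumption: with no search term, match every document.
  const must: object[] = params.q
    ? [{ multi_match: { query: params.q, fields: ['subject', 'body'] } }]
    : [{ match_all: {} }];

  return { query: { bool: { must, filter } } };
}

// Example: search INBOX only, leaving account and category unfiltered.
console.log(JSON.stringify(buildSearchQuery({ q: 'pricing', folder: 'INBOX' }), null, 2));
```

Keeping the exact-match conditions in `filter` rather than `must` means they do not contribute to relevance scoring, which matches the plan's split between full-text search and keyword filtering.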

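
The plan calls for exponential backoff in two places: reconnecting dropped IMAP connections and retrying Gemini requests. A minimal sketch of one shared helper follows. `backoffDelayMs` and `retryWithBackoff` are hypothetical names, and the 1-second base delay, 60-second cap, and 5-attempt limit are illustrative assumptions rather than values taken from the plan.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Run `operation`, retrying on rejection with exponentially growing pauses.
// `sleep` is injectable so the helper can be exercised without real waiting.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  baseMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Pause only if another attempt will follow.
      if (attempt < maxAttempts - 1) {
        await sleep(backoffDelayMs(attempt, baseMs));
      }
    }
  }
  // Every attempt failed; surface the last error to the caller.
  throw lastError;
}

// Delays for the first five retries with the default settings.
console.log([0, 1, 2, 3, 4].map((n) => backoffDelayMs(n)));
```

For categorization, the caller could catch the final error and fall back to "Uncategorized", as section 3.5 describes. For IMAP, the wrapped operation would be the reconnect attempt.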