# ReachInbox Email Aggregator - Development Plan
## Project Overview
Build an AI-driven email onebox aggregator with real-time IMAP synchronization, intelligent categorization, and RAG-powered reply suggestions using TypeScript and Node.js.
---
## Phase 0: Project Setup
### 1. Initialize Project
```bash
npm init -y
npm install typescript @types/node ts-node --save-dev
npx tsc --init
```
### 2. Install Core Dependencies
```bash
# IMAP Client (the node-imap library is published on npm as `imap`)
npm install imap mailparser
# Database & Search
npm install @elastic/elasticsearch @qdrant/js-client-rest
# Server & Utilities
npm install express dotenv cors
npm install @types/express @types/cors @types/imap --save-dev
# AI Integration
# No package needed: use the native fetch built into Node.js 18+ for the Gemini API
```
### 3. Docker Services Setup
Create `docker-compose.yml`:
- **Elasticsearch** (port 9200) - Email search and storage
- **Qdrant** (ports 6333, 6334) - Vector database for RAG
### 4. Project Structure
```
src/
├── services/
│   ├── imap/           # Email synchronization
│   ├── elasticsearch/  # Search & indexing
│   ├── gemini/         # AI categorization & embeddings
│   ├── webhooks/       # Slack & generic webhook notifications
│   └── vector-db/      # RAG implementation
├── controllers/        # API handlers
├── routes/             # Express routes
├── types/              # TypeScript interfaces
└── utils/              # Helper functions
```
### 5. Environment Configuration
Create `.env.example`:
```
# IMAP Accounts (minimum 2)
IMAP_EMAIL_1=
IMAP_PASSWORD_1=
IMAP_EMAIL_2=
IMAP_PASSWORD_2=
# Elasticsearch
ELASTICSEARCH_HOST=http://localhost:9200
# AI
GEMINI_API_KEY=
# Integrations
SLACK_WEBHOOK_URL=
WEBHOOK_SITE_URL=
# Vector DB
QDRANT_HOST=localhost
QDRANT_PORT=6333
```
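The numbered `IMAP_EMAIL_n` / `IMAP_PASSWORD_n` variables above can be collected into a typed list at startup. The following is a minimal sketch; `loadImapAccounts`, `ImapAccountConfig`, and the `account-n` ID format are hypothetical names chosen for illustration, not part of the plan.

```typescript
// Hypothetical shape for one configured mailbox.
interface ImapAccountConfig {
  accountId: string;
  email: string;
  password: string;
}

// Collects IMAP_EMAIL_1, IMAP_EMAIL_2, ... until the first missing index.
function loadImapAccounts(
  env: Record<string, string | undefined>
): ImapAccountConfig[] {
  const accounts: ImapAccountConfig[] = [];
  for (let i = 1; ; i++) {
    const email = env[`IMAP_EMAIL_${i}`];
    const password = env[`IMAP_PASSWORD_${i}`];
    if (!email) break; // no more accounts configured
    if (!password) {
      throw new Error(`IMAP_PASSWORD_${i} is missing for ${email}`);
    }
    accounts.push({ accountId: `account-${i}`, email, password });
  }
  // The plan requires at least two accounts.
  if (accounts.length < 2) {
    throw new Error("At least 2 IMAP accounts must be configured");
  }
  return accounts;
}
```

After `dotenv` has loaded the `.env` file, this would typically be called as `loadImapAccounts(process.env)`. Failing fast here surfaces a misconfigured account before any connection is attempted.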
---
## Phase 1: Real-Time Email Synchronization ⭐ MANDATORY
### Core Requirements
- Persistent IMAP connections using IDLE mode
- No polling or cron jobs
- Sync last 30 days of emails
- Support minimum 2 email accounts
### Implementation Steps
#### 1.1 IMAP Service Setup
Create `src/services/imap/ImapClient.ts`:
- Initialize connection per email account
- Authenticate using credentials from environment
#### 1.2 Initial History Fetch
- Fetch all major folders (INBOX, Sent, Drafts)
- Use IMAP SEARCH: `SINCE <30 days ago>`
- Fetch ENVELOPE and BODYSTRUCTURE
- Parse email metadata (from, to, subject, date)
- Extract plain text body
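The `SINCE <30 days ago>` criterion needs a concrete cutoff date. The sketch below computes it and formats it in the `DD-Mon-YYYY` date syntax that IMAP SEARCH uses (RFC 3501). The helper names are hypothetical, and using UTC fields is an assumption; some client libraries also accept a `Date` object directly.

```typescript
const IMAP_MONTHS = [
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
];

// Formats a date as an IMAP SEARCH date (for example 14-Feb-2024), using UTC fields.
function formatImapDate(date: Date): string {
  const month = IMAP_MONTHS[date.getUTCMonth()];
  return `${date.getUTCDate()}-${month}-${date.getUTCFullYear()}`;
}

// Builds the ["SINCE", "<date>"] pair for the last `days` days.
function sinceCriterion(now: Date, days = 30): [string, string] {
  const cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  return ["SINCE", formatImapDate(cutoff)];
}
```

IMAP `SINCE` compares dates only and ignores the time of day, so the initial sync may include a few extra hours of mail on the cutoff day.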
#### 1.3 Real-Time IDLE Listener
- Transition to IDLE mode after initial sync
- Listen for events:
- `mail` - New email arrival
- `expunge` - Email deletion
- Fetch new emails immediately on event
- Pass to indexing pipeline
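The listener wiring above can be sketched independently of any particular IMAP library. `MailboxEvents` is a hypothetical minimal interface that mirrors the `mail` and `expunge` events listed; the assumption that `mail` reports the number of new messages and `expunge` reports a sequence number should be checked against the client actually used.

```typescript
// Minimal event surface assumed for this sketch (not a real library type).
interface MailboxEvents {
  on(event: "mail" | "expunge", listener: (n: number) => void): unknown;
}

// Sequence-number range covering the `numNew` most recent messages.
function newMailRange(totalMessages: number, numNew: number): string | null {
  if (numNew <= 0 || totalMessages <= 0) return null;
  const first = Math.max(1, totalMessages - numNew + 1);
  return `${first}:${totalMessages}`;
}

function watchMailbox(
  source: MailboxEvents,
  getTotal: () => number, // current message count of the open folder
  onNewRange: (range: string) => void, // fetch the range, then index it
  onDeleted: (seqno: number) => void
): void {
  source.on("mail", (numNew) => {
    const range = newMailRange(getTotal(), numNew);
    if (range) onNewRange(range);
  });
  source.on("expunge", (seqno) => onDeleted(seqno));
}
```

Computing the range from the folder's message count means only the newly arrived messages are fetched, rather than re-scanning the folder on every event.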
#### 1.4 Connection Watchdog
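- Refresh each connection at least every 29 minutes, as RFC 2177 advises, so the server does not log out an idle client
- Do this by ending the current IDLE and re-sending the IDLE command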
- Handle disconnections with auto-reconnect
- Implement exponential backoff on failures
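A minimal sketch of the watchdog timing and reconnect policy described above. The base delay, the cap, the attempt limit, and the injected `connect` and `sleep` functions are all assumptions chosen for illustration.

```typescript
// Interval for re-issuing IDLE: 29 minutes, expressed in milliseconds.
const IDLE_REFRESH_MS = 29 * 60 * 1000;

// Delay doubles with each failed attempt and is capped at `maxMs`.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 60000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries `connect` until it succeeds or `maxAttempts` is reached.
// Resolves with the number of failed attempts that preceded success.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 8,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms))
): Promise<number> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return attempt;
    } catch {
      await sleep(backoffDelayMs(attempt));
    }
  }
  throw new Error(`Reconnect failed after ${maxAttempts} attempts`);
}
```

A timer such as `setInterval(refreshIdle, IDLE_REFRESH_MS)` would drive the refresh, where `refreshIdle` is a hypothetical function that ends and re-issues IDLE. Adding random jitter to the delay is a common refinement when several accounts reconnect at once.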
#### 1.5 Data Structure
```typescript
interface EmailDocument {
  id: string;
  accountId: string;
  folder: string;
  subject: string;
  body: string;
  from: string;
  to: string[];
  date: Date;
  aiCategory: string;
  indexedAt: Date;
}
```
---
## Phase 2: Elasticsearch Storage & Search ⭐ CORE
### Implementation Steps
#### 2.1 Elasticsearch Client Setup
Create `src/services/elasticsearch/ElasticsearchClient.ts`:
- Connect to `http://localhost:9200`
- Create `emails` index with proper mapping
#### 2.2 Index Mapping
```typescript
{
  subject: { type: 'text' },
  body: { type: 'text' },
  accountId: { type: 'keyword' },
  folder: { type: 'keyword' },
  date: { type: 'date' },
  aiCategory: { type: 'keyword' },
  from: { type: 'keyword' },
  to: { type: 'keyword' }
}
```
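The field definitions above are the `properties` section of the index mapping. A sketch of the full index-creation request follows; `buildEmailsIndexRequest` is a hypothetical helper, and the top-level `mappings` key assumes version 8 of `@elastic/elasticsearch` (version 7 nests it under `body`).

```typescript
// Builds the request for creating the `emails` index with explicit mappings.
function buildEmailsIndexRequest(indexName = "emails") {
  return {
    index: indexName,
    mappings: {
      properties: {
        subject: { type: "text" }, // analyzed for full-text search
        body: { type: "text" },
        accountId: { type: "keyword" }, // exact-match filtering
        folder: { type: "keyword" },
        date: { type: "date" },
        aiCategory: { type: "keyword" },
        from: { type: "keyword" },
        to: { type: "keyword" },
      },
    },
  };
}
```

Under that assumption the request would be passed as `await client.indices.create(buildEmailsIndexRequest())`, ideally guarded by an existence check so restarts do not fail on an index that is already present.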
#### 2.3 Indexing Pipeline
- Receive email from IMAP service
- Strip HTML, convert to plain text
- Index immediately to Elasticsearch
- Return indexed document ID
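The HTML-stripping step can be sketched with a few regular expressions. This is a deliberately naive illustration: `htmlToPlainText` is a hypothetical helper that handles only a handful of entities, and a dedicated HTML-to-text library would cover far more edge cases.

```typescript
// Naive HTML-to-text conversion for indexing purposes.
function htmlToPlainText(html: string): string {
  return html
    .replace(/<(script|style)[\s\S]*?<\/\1>/gi, " ") // drop non-visible blocks
    .replace(/<[^>]+>/g, " ") // strip remaining tags
    .replace(/&nbsp;/gi, " ")
    .replace(/&lt;/gi, "<")
    .replace(/&gt;/gi, ">")
    .replace(/&quot;/gi, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/gi, "&") // decoded last so "&amp;lt;" is not double-decoded
    .replace(/\s+/g, " ") // collapse whitespace left behind by removed tags
    .trim();
}
```

For example, `htmlToPlainText("<p>Hello <b>world</b></p>")` returns `"Hello world"`.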
#### 2.4 Search Implementation
Create search endpoint: `GET /api/emails/search`
Query parameters:
- `q` - Search term (subject/body)
- `account` - Filter by accountId
- `folder` - Filter by folder
- `category` - Filter by AI category
Query structure:
```typescript
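{
  query: {
    bool: {
      must: [
        { multi_match: { query: searchTerm, fields: ["subject", "body"] } }
      ],
      // Include a term filter only for parameters the caller actually supplied
      filter: [
        { term: { accountId: account } },
        { term: { folder: folder } },
        { term: { aiCategory: category } }
      ]
    }
  }
}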
```
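Since every query parameter is optional, the query is best assembled dynamically. The sketch below is one way to do that; `SearchParams` and `buildSearchQuery` are hypothetical names, and falling back to `match_all` when `q` is absent is an assumption about the desired behavior.

```typescript
interface SearchParams {
  q?: string;
  account?: string;
  folder?: string;
  category?: string;
}

// Builds a bool query: full-text match in `must`, exact filters in `filter`.
function buildSearchQuery(params: SearchParams) {
  const filter: Array<{ term: Record<string, string> }> = [];
  if (params.account) filter.push({ term: { accountId: params.account } });
  if (params.folder) filter.push({ term: { folder: params.folder } });
  if (params.category) filter.push({ term: { aiCategory: params.category } });

  // Without a search term, match every document and rely on the filters.
  const must = params.q
    ? [{ multi_match: { query: params.q, fields: ["subject", "body"] } }]
    : [{ match_all: {} }];

  return { query: { bool: { must, filter } } };
}
```

Clauses in `filter` do not affect relevance scoring, which suits exact-match constraints such as account and folder.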
---
## Phase 3: AI Email Categorization ⭐ CORE
### Categories
1. Interested
2. Meeting Booked
3. Not Interested
4. Spam
5. Out of Office
### Implementation Steps
#### 3.1 Gemini Integration
Create `src/services/gemini/GeminiClient.ts`:
- Use Gemini API endpoint
- Configure API key from environment
#### 3.2 System Prompt Definition
```
Analyze the email and categorize it into ONE of these categories:
- Interested: Lead shows interest
- Meeting Booked: Meeting scheduled
- Not Interested: Negative response
- Spam: Unwanted promotional emails
- Out of Office: Auto-reply responses
Return JSON with category field only.
```
#### 3.3 JSON Schema Configuration
```typescript
{
  type: "OBJECT",
  properties: {
    category: {
      type: "STRING",
      enum: ["Interested", "Meeting Booked", "Not Interested", "Spam", "Out of Office"]
    }
  },
  required: ["category"]
}
```
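The schema and system prompt come together in the request body sent to Gemini. The sketch below shows one plausible shape; the field names (`systemInstruction`, `generationConfig.responseMimeType`, `responseSchema`) reflect my understanding of the `generateContent` REST request and should be verified against the current API documentation. The prompt text is abbreviated, and `buildCategorizationRequest` is a hypothetical helper.

```typescript
const CATEGORY_ENUM = [
  "Interested",
  "Meeting Booked",
  "Not Interested",
  "Spam",
  "Out of Office",
];

// Abbreviated stand-in for the full system prompt defined in 3.2.
const SYSTEM_PROMPT =
  "Analyze the email and categorize it into ONE of the listed categories. " +
  "Return JSON with category field only.";

// Assembles the request body (assumed shape, see the note above).
function buildCategorizationRequest(subject: string, body: string) {
  return {
    systemInstruction: { parts: [{ text: SYSTEM_PROMPT }] },
    contents: [
      { role: "user", parts: [{ text: `Subject: ${subject}\n\n${body}` }] },
    ],
    generationConfig: {
      responseMimeType: "application/json", // request structured JSON output
      responseSchema: {
        type: "OBJECT",
        properties: {
          category: { type: "STRING", enum: CATEGORY_ENUM.slice() },
        },
        required: ["category"],
      },
    },
  };
}
```

Constraining the output with an enum keeps the model from inventing category names, which simplifies the Elasticsearch `keyword` filter on `aiCategory`.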
#### 3.4 Categorization Flow
1. Email indexed in Elasticsearch
2. Send subject + body to Gemini API
3. Receive category response
4. Update Elasticsearch document with `aiCategory`
5. Trigger webhook if category is "Interested"
#### 3.5 Error Handling
- Implement retry logic with exponential backoff
- Handle rate limits gracefully
- Default to "Uncategorized" on failure
- Log all categorization attempts
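The "Uncategorized" fallback can be enforced at the point where the model's JSON text is parsed, so malformed or unexpected output never reaches the index. A minimal sketch, with `parseCategory` and `AiCategory` as hypothetical names:

```typescript
type AiCategory =
  | "Interested"
  | "Meeting Booked"
  | "Not Interested"
  | "Spam"
  | "Out of Office"
  | "Uncategorized";

const KNOWN_CATEGORIES: string[] = [
  "Interested",
  "Meeting Booked",
  "Not Interested",
  "Spam",
  "Out of Office",
];

// Parses the model's JSON text; anything unexpected becomes "Uncategorized".
function parseCategory(modelText: string): AiCategory {
  try {
    const parsed: unknown = JSON.parse(modelText);
    const category = (parsed as { category?: unknown } | null)?.category;
    if (typeof category === "string" && KNOWN_CATEGORIES.indexOf(category) !== -1) {
      return category as AiCategory;
    }
  } catch {
    // Malformed JSON falls through to the default below.
  }
  return "Uncategorized";
}
```

The same default applies when all retries are exhausted, so the email stays searchable and can be re-categorized later.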
---
## Phase 4: Slack & Webhook Integration ⭐ CORE
### Implementation Steps
#### 4.1 Webhook Service
Create `src/services/webhooks/WebhookClient.ts`
#### 4.2 Trigger Condition
Activate when `aiCategory === "Interested"`
#### 4.3 Slack Notification
```typescript
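async function sendSlackNotification(email: EmailDocument): Promise<void> {
  // Slack incoming-webhook URL, read from the environment (see .env.example)
  const webhookUrl = process.env.SLACK_WEBHOOK_URL;
  if (!webhookUrl) throw new Error('SLACK_WEBHOOK_URL is not set');

  const payload = {
    text: '🎯 Interested Lead Detected!', // fallback text shown in notifications
    blocks: [
      {
        type: "section",
        fields: [
          { type: "mrkdwn", text: `*From:* ${email.from}` },
          { type: "mrkdwn", text: `*Subject:* ${email.subject}` },
          { type: "mrkdwn", text: `*Account:* ${email.accountId}` }
        ]
      }
    ]
  };

  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload)
  });
  // fetch resolves on HTTP errors too, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Slack webhook failed with status ${response.status}`);
  }
}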
```
#### 4.4 Generic Webhook
```typescript
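async function triggerGenericWebhook(email: EmailDocument): Promise<void> {
  // Target URL, read from the environment (see .env.example)
  const webhookUrl = process.env.WEBHOOK_SITE_URL;
  if (!webhookUrl) throw new Error('WEBHOOK_SITE_URL is not set');

  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      event: 'InterestedLead',
      timestamp: new Date().toISOString(),
      email: {
        id: email.id,
        from: email.from,
        subject: email.subject,
        accountId: email.accountId,
        category: email.aiCategory
      }
    })
  });
  // fetch resolves on HTTP errors too, so check the status explicitly
  if (!response.ok) {
    throw new Error(`Webhook failed with status ${response.status}`);
  }
}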
```
---
## Phase 5: Frontend Interface 🌟 UNLOCKS FRONTEND
### Prerequisites
Complete Phases 1-4 with Postman validation
### Backend API Endpoints
#### 5.1 Required Endpoints
```typescript
GET /api/accounts       // List configured email accounts
GET /api/emails         // Paginated email list
GET /api/emails/search  // Search with filters
GET /api/emails/:id     // Single email details
```
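For the paginated list endpoint, query-string values need to be converted into the `from` and `size` parameters that Elasticsearch uses for paging. A minimal sketch; the `page` and `pageSize` parameter names, the default of 20, and the cap of 100 are assumptions.

```typescript
// Converts 1-based page parameters into Elasticsearch `from` / `size`.
function parsePagination(
  query: { page?: string; pageSize?: string },
  maxPageSize = 100
): { from: number; size: number } {
  // Invalid or missing values fall back to page 1 and 20 results per page.
  const page = Math.max(1, Number.parseInt(query.page ?? "1", 10) || 1);
  const requested = Number.parseInt(query.pageSize ?? "20", 10) || 20;
  const size = Math.min(maxPageSize, Math.max(1, requested));
  return { from: (page - 1) * size, size };
}
```

Capping the page size protects the API from requests that would pull an entire mailbox in one response.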
### Frontend Implementation
#### 5.2 Technology Choice
- Simple HTML/CSS/JS, or
- React with TypeScript
#### 5.3 Core Components
**Email List**
- Display emails in table/card format
- Show: subject, from, date, AI category
- Pagination support
**Filter Panel**
- Account dropdown (from `/api/accounts`)
- Folder dropdown (INBOX, Sent, etc.)
- Category filter (AI categories)
**Search Bar**
- Full-text search input
- Calls `/api/emails/search` endpoint
**AI Category Tags**
- Visual badges for each category
- Color-coded (e.g., green for Interested)
#### 5.4 Integration Flow
1. Fetch accounts on page load
2. Display email list with filters
3. Update list on filter/search change
4. Show AI category prominently
5. Click email to view details
---
## Phase 6: RAG Suggested Replies 🏆 INTERVIEW FAST-TRACK
### Prerequisites
Complete Phase 5 with working UI
### Implementation Steps
#### 6.1 Vector Database Setup
Initialize Qdrant collection:
```typescript
{
  collection_name: "product_knowledge",
  vectors: {
    size: 768, // Gemini embedding dimension
    distance: "Cosine"
  }
}
```
#### 6.2 Knowledge Base Preparation
Create training data file with:
- Product information
- Outreach agenda
- Meeting booking links
- Common responses
Split into chunks (200-300 words each)
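A minimal sketch of word-based chunking at the size given above. `chunkByWords` is a hypothetical helper; it splits purely on word count, so a chunk may end mid-sentence, and splitting on paragraph or sentence boundaries would be a reasonable refinement.

```typescript
// Splits text into chunks of at most `maxWords` whitespace-separated words.
function chunkByWords(text: string, maxWords = 250): string[] {
  if (maxWords < 1) throw new Error("maxWords must be at least 1");
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(" "));
  }
  return chunks;
}
```

With the default of 250 words, a 600-word document yields three chunks of 250, 250, and 100 words.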
#### 6.3 Embedding Generation
```typescript
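async function generateEmbedding(text: string): Promise<number[]> {
  const apiKey = process.env.GEMINI_API_KEY;
  if (!apiKey) throw new Error('GEMINI_API_KEY is not set');

  const response = await fetch(
    'https://generativelanguage.googleapis.com/v1/models/embedding-001:embedContent',
    {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'x-goog-api-key': apiKey // the API rejects requests without a key
      },
      body: JSON.stringify({
        model: "models/embedding-001",
        content: { parts: [{ text }] }
      })
    }
  );
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }
  // The response wraps the vector as { embedding: { values: number[] } }
  const data = (await response.json()) as { embedding: { values: number[] } };
  return data.embedding.values;
}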
```
#### 6.4 Index Knowledge Base
For each chunk:
1. Generate embedding
2. Store in Qdrant with metadata
3. Include original text as payload
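The three steps above amount to pairing each chunk with its embedding and wrapping both in a Qdrant point. A sketch of that pairing follows; `KnowledgePoint`, `toQdrantPoints`, the payload field names, and the sequential integer IDs are all assumptions made for illustration.

```typescript
// Hypothetical point shape: id, vector, and the original text as payload.
interface KnowledgePoint {
  id: number;
  vector: number[];
  payload: { text: string; source: string; chunkIndex: number };
}

// Pairs chunks with their embeddings; `startId` avoids ID collisions
// when several source files are indexed into the same collection.
function toQdrantPoints(
  chunks: string[],
  embeddings: number[][],
  source: string,
  startId = 1
): KnowledgePoint[] {
  if (chunks.length !== embeddings.length) {
    throw new Error("chunks and embeddings must have the same length");
  }
  return embeddings.map((vector, i) => ({
    id: startId + i,
    vector,
    payload: { text: chunks[i] ?? "", source, chunkIndex: i },
  }));
}
```

The resulting array would then be sent in a single upsert call. With `@qdrant/js-client-rest` I believe this looks like `client.upsert("product_knowledge", { wait: true, points })`, but that signature should be confirmed against the client documentation.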
#### 6.5 RAG Pipeline
**Endpoint**: `POST /api/emails/:id/suggest-reply`
**Flow**:
1. Receive email ID
2. Fetch email content from Elasticsearch
3. Generate embedding for email text
4. Query Qdrant for top 3 similar chunks
5. Assemble prompt:
```
System: Act as professional assistant
Context: [Retrieved chunks]
Email: [Original email]
Instruction: Draft reply based ONLY on context
```
6. Send to Gemini API
7. Return suggested reply
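Step 5 of the flow, prompt assembly, can be sketched as a pure function. `buildReplyPrompt` is a hypothetical helper, and numbering the retrieved chunks is an assumption that makes the context easier to inspect while debugging.

```typescript
// Assembles the RAG prompt from the retrieved chunks and the original email.
function buildReplyPrompt(emailBody: string, chunks: string[]): string {
  const context = chunks.map((chunk, i) => `[${i + 1}] ${chunk}`).join("\n");
  return [
    "System: Act as a professional assistant.",
    `Context:\n${context}`,
    `Email:\n${emailBody}`,
    "Instruction: Draft a reply based ONLY on the context above.",
  ].join("\n\n");
}
```

Keeping the function free of network calls makes it easy to unit-test the prompt separately from retrieval and generation.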
#### 6.6 Frontend Integration
- Add "Suggest Reply" button on email detail view
- Display loading state during generation
- Show suggested reply in text area
- Allow editing before sending
---
## Testing Strategy
### Phase 1-4: Postman Testing
- Test IMAP sync with real accounts
- Verify Elasticsearch indexing
- Check AI categorization accuracy
- Validate webhook triggers
### Phase 5: UI Testing
- Test all filters and search
- Verify email display
- Check responsive design
- Validate API integration
### Phase 6: RAG Testing
- Test reply relevance
- Verify context grounding
- Check response quality
- Validate edge cases
---
## Deployment Checklist
### Code Quality
- [ ] TypeScript strict mode enabled
- [ ] Proper error handling throughout
- [ ] Environment variables for all configs
- [ ] No hardcoded credentials
- [ ] Logging implemented (winston/pino)
- [ ] Code modular and organized
### Documentation
- [ ] README with setup instructions
- [ ] Architecture diagram included
- [ ] API documentation
- [ ] Environment variable examples
- [ ] Known limitations documented
### Repository
- [ ] Private GitHub repository created
- [ ] Access granted to reviewers
- [ ] .gitignore configured
- [ ] Clean commit history
- [ ] No sensitive data committed
### Demo Video
- [ ] Screen recording prepared
- [ ] Real-time sync demonstrated
- [ ] All features showcased
- [ ] Clear audio narration
- [ ] Under 5 minutes duration
---
## Success Criteria
### Minimum Viable Implementation
✅ Phase 1: IMAP IDLE working (no polling)
✅ Phase 2: Elasticsearch search functional
✅ Phase 3: AI categorization accurate
✅ Phase 4: Webhook notifications working
### Strong Candidate
✅ All above phases complete
✅ Phase 5: Functional UI integrated
### Exceptional Candidate
✅ All above phases complete
✅ Phase 6: RAG replies implemented
🏆 Direct final interview invitation
---
## Key Technical Decisions
### IMAP Implementation
- Use `node-imap` library
- Maintain persistent connections
- Implement connection watchdog
- Handle network interruptions gracefully
### Search Strategy
- Elasticsearch for full-text search
- Keyword fields for exact filtering
- Combine `must` and `filter` clauses
- Optimize query performance
### AI Integration
- Gemini API for categorization
- JSON mode for structured output
- System prompts for consistency
- Error handling with retries
### RAG Architecture
- Qdrant for vector storage
- Chunk size: 200-300 words
- Top K retrieval: 3 chunks
- Context-grounded generation
---
## Notes
- Focus on quality over quantity
- Complete phases sequentially
- Test thoroughly before moving forward
- Document all architectural decisions
- Implement proper error handling from start
- Use TypeScript types throughout
- No plagiarism - original implementation required