**Date:** November 17, 2025
# AI Privacy Layer MVP - Technical Design Document
**Version:** 1.0
**Status:** Draft
---
## 1. Problem Statement
### The Core Problem
Financial institutions, fintech companies, and enterprises want to use state-of-the-art LLMs (GPT-4, Claude, etc.) to improve productivity and build AI-powered features, but they cannot because:
1. **Data Privacy Risk**: Sending customer data (credit cards, SSNs, account numbers, PII) to third-party LLM providers exposes them to:
- Regulatory violations (GDPR, CCPA, HIPAA, SOX)
- Data breach liability
- Customer trust erosion
- Competitive intelligence leakage
2. **Current Solutions Are Inadequate**:
- **Self-hosting open-source models (Llama)**: Requires $200K+ in upfront GPU costs, a dedicated infrastructure team, and 3-6 months of setup time
- **Generic anonymization tools**: One-size-fits-all PII detection that doesn't understand domain-specific sensitive data (API keys, internal account formats, proprietary identifiers)
- **Manual redaction**: Slow, error-prone, doesn't scale
3. **Developer Friction**: Existing privacy solutions require complex integrations, separate workflows, or significant code changes
### Success Criteria
The MVP must enable developers to:
- Use any frontier LLM (GPT-4, Claude, etc.) without exposing sensitive data
- Integrate with **one line of code** (changing the API base URL)
- Define custom sensitive data patterns (not just generic PII)
- Maintain context quality in LLM responses
- Get complete audit trails of what data was anonymized
---
## 2. The Solution We're Building
### Product Overview
A **transparent proxy service** that sits between the user's application and any LLM API provider, automatically detecting and anonymizing sensitive data before it reaches the LLM, then de-anonymizing the response before returning it to the user.
### Core Value Propositions
1. **One-Line Integration**: Users point their API calls at our proxy URL instead of the LLM provider's URL; request and response bodies keep the provider's format
2. **Custom Sensitivity Rules**: Users define what data is sensitive for their use case (credit cards, API keys, internal IDs, etc.)
3. **Context Preservation**: Anonymization maintains semantic meaning so LLM responses remain useful
4. **Universal Compatibility**: Works with any LLM provider (OpenAI, Anthropic, etc.) without vendor lock-in
5. **Complete Transparency**: Full audit logs showing what was detected, anonymized, and when
### MVP Scope (What We're Building)
#### In Scope
- ✅ Proxy service for OpenAI and Anthropic Claude APIs
- ✅ User-defined sensitivity pattern configuration (regex + entity types)
- ✅ Context-preserving tokenization system
- ✅ Request/response anonymization and de-anonymization
- ✅ Support for chat completion endpoints (most common use case)
#### Out of Scope (Post-MVP)
- ❌ Web dashboard UI (configuration via API only)
- ❌ Advanced analytics/reporting
- ❌ Multi-user organizations with role-based access
- ❌ Support for embeddings, fine-tuning, or other LLM endpoints
- ❌ Real-time streaming responses (non-streaming request/response only)
- ❌ Support for 10+ LLM providers (start with 2)
---
## 3. High-Level Architecture
### System Components
#### 3.1 Proxy API Gateway
**Purpose**: Accept incoming requests from users, route to appropriate handlers
**Technology**: Python FastAPI
**Responsibilities**:
- Authenticate API requests (validate API keys)
- Parse incoming LLM API requests
- Route to appropriate LLM provider handler
- Return responses to user
**Endpoints**:
```
POST /v1/proxy/openai/chat/completions
POST /v1/proxy/anthropic/messages
POST /v1/patterns (configure sensitivity patterns)
GET /v1/patterns (retrieve current patterns)
GET /v1/audit-logs (retrieve anonymization logs)
```
---
#### 3.2 Pattern Configuration Service
**Purpose**: Store and manage user-defined sensitivity patterns
**Technology**: PostgreSQL database + Python service layer
**Responsibilities**:
- CRUD operations for sensitivity patterns
- Pattern validation (ensure regex is valid)
- Pattern retrieval for anonymization engine
**Data Model**:
```json
{
"user_id": "user_abc123",
"patterns": [
{
"id": "pattern_1",
"name": "credit_card",
"type": "regex",
"pattern": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b",
"enabled": true
},
{
"id": "pattern_2",
"name": "ssn",
"type": "regex",
"pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
"enabled": true
},
{
"id": "pattern_3",
"name": "api_key",
"type": "regex",
"pattern": "\\b(sk-[A-Za-z0-9]{48}|pk_live_[A-Za-z0-9]{24})\\b",
"enabled": true
},
{
"id": "pattern_4",
"name": "email",
"type": "entity",
"entity_type": "EMAIL",
"enabled": true
}
]
}
```
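
The pattern-validation responsibility above could be sketched roughly as follows. `validate_pattern`, `PatternError`, and the `ALLOWED_ENTITY_TYPES` set are hypothetical names introduced for illustration; the allowed entity types are assumed from the default types listed elsewhere in this document.

```python
import re

# Assumed entity types; the real list would come from the anonymization engine.
ALLOWED_ENTITY_TYPES = {"EMAIL", "CREDIT_CARD", "SSN", "PHONE", "PERSON"}


class PatternError(ValueError):
    """Raised when a user-supplied sensitivity pattern is invalid."""


def validate_pattern(pattern: dict) -> dict:
    """Validate one pattern dict shaped like the data model above."""
    name = pattern.get("name")
    if not isinstance(name, str) or not name:
        raise PatternError("pattern needs a non-empty 'name'")

    kind = pattern.get("type")
    if kind == "regex":
        source = pattern.get("pattern")
        if not isinstance(source, str) or not source:
            raise PatternError(f"{name!r}: regex patterns need a 'pattern' string")
        try:
            re.compile(source)  # rejects syntactically invalid regexes
        except re.error as exc:
            raise PatternError(f"{name!r}: invalid regex ({exc})") from exc
    elif kind == "entity":
        if pattern.get("entity_type") not in ALLOWED_ENTITY_TYPES:
            raise PatternError(f"{name!r}: unknown entity_type")
    else:
        raise PatternError(f"{name!r}: type must be 'regex' or 'entity'")
    return pattern
```

Compiling the regex at write time means a bad pattern is rejected when it is configured rather than when a request is being anonymized.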
---
#### 3.3 Anonymization Engine
**Purpose**: Detect sensitive data and replace with context-preserving tokens
**Technology**: Python + spaCy (NLP) + regex
**Responsibilities**:
- Scan request content for sensitive data
- Generate unique tokens for each detected entity
- Store token mapping for later de-anonymization
- Preserve context in anonymized text
**Detection Methods**:
1. **Regex-based detection**: For structured data (credit cards, SSNs, phone numbers, API keys)
2. **NLP entity recognition**: For unstructured data (names, emails, organizations) using spaCy
3. **Custom patterns**: User-defined regex patterns for domain-specific sensitive data
**Tokenization Strategy**:
- Generate deterministic tokens within a request session: `{{ENTITY_TYPE_INDEX}}`
- Examples: `{{CREDIT_CARD_1}}`, `{{SSN_1}}`, `{{EMAIL_1}}`, `{{API_KEY_1}}`
- Maintain semantic meaning through token naming
**Example**:
```
Input: "Customer [email protected] with card 4532-1234-5678-9010 made a $5,000 transaction"
Detected Entities:
- EMAIL: [email protected]
- CREDIT_CARD: 4532-1234-5678-9010
Anonymized Output: "Customer {{EMAIL_1}} with card {{CREDIT_CARD_1}} made a $5,000 transaction"
```
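
A minimal sketch of the regex path of this tokenization, assuming only two illustrative patterns. The `PATTERNS` table, the `anonymize` function, and the sample address `jane.doe@example.com` are placeholders for illustration; spaCy-based detection is omitted.

```python
import re

# Illustrative patterns only; real ones would come from the Pattern Configuration Service.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
}


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace each detected value with a {{TYPE_N}} token.

    Returns the anonymized text and a token -> original value mapping.
    """
    token_to_value: dict[str, str] = {}
    value_to_token: dict[str, str] = {}
    counters: dict[str, int] = {}

    for entity_type, regex in PATTERNS.items():

        def replace(match: re.Match, entity_type: str = entity_type) -> str:
            value = match.group(0)
            if value not in value_to_token:
                # First sighting of this value: allocate the next index for its type.
                counters[entity_type] = counters.get(entity_type, 0) + 1
                token = "{{" + f"{entity_type}_{counters[entity_type]}" + "}}"
                value_to_token[value] = token
                token_to_value[token] = value
            return value_to_token[value]

        text = regex.sub(replace, text)
    return text, token_to_value


anonymized, mapping = anonymize(
    "Customer jane.doe@example.com with card 4532-1234-5678-9010 made a $5,000 transaction"
)
```

Patterns are applied one after another here, so a later pattern sees the tokens produced by earlier ones; a production engine would probably collect all matches first and resolve overlaps before replacing.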
---
#### 3.4 Token Consistency (Conversation-Scoped)
**Purpose**: Ensure same sensitive value gets same token within a conversation
**Technology**: Stateless, in-memory processing (no external dependencies)
**Responsibilities**:
- Scan entire conversation history for sensitive data
- Assign consistent tokens (same value → same token)
- Maintain token→value mappings for de-anonymization
- Process independently per request (stateless)
**How It Works**:
When a request arrives with conversation history:
1. Scan all messages in chronological order (assistant turns can echo de-anonymized values from earlier responses)
2. Build value→token mapping (e.g., "4532-1234-5678-9010" → "{{CREDIT_CARD_1}}")
3. Reuse existing tokens for repeated values
4. Anonymize all messages using consistent mappings
5. De-anonymize response using same mappings
6. Discard mappings after response sent
**Example**:
```python
# Request with conversation history
messages = [
{"role": "user", "content": "My card is 4532-1234-5678-9010"},
{"role": "assistant", "content": "Got it"},
{"role": "user", "content": "Verify card 4532-1234-5678-9010"}
]
# System builds: {"{{CREDIT_CARD_1}}": "4532-1234-5678-9010"}
# Both instances use same token: {{CREDIT_CARD_1}}
```
**Why Stateless?**
- No infrastructure dependencies (Redis, databases)
- Horizontally scalable (any server can handle any request)
- Simpler architecture and deployment
- User controls conversation history
- Sufficient for MVP use cases
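
The conversation-scoped behavior could look roughly like the sketch below. It is an illustration under stated assumptions: `anonymize_conversation` is a hypothetical name, only a credit-card pattern is shown, message `content` is assumed to be a plain string, and every message is scanned regardless of role.

```python
import re

CARD = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b")


def anonymize_conversation(messages: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Anonymize every message with one shared, request-local mapping."""
    value_to_token: dict[str, str] = {}

    def to_token(match: re.Match) -> str:
        value = match.group(0)
        # Reuse the token if this value appeared earlier in the conversation.
        if value not in value_to_token:
            value_to_token[value] = "{{CREDIT_CARD_%d}}" % (len(value_to_token) + 1)
        return value_to_token[value]

    anonymized = [
        {**msg, "content": CARD.sub(to_token, msg["content"])}
        for msg in messages  # chronological order
    ]
    # Invert for de-anonymization; the caller drops both dicts after responding.
    token_to_value = {token: value for value, token in value_to_token.items()}
    return anonymized, token_to_value
```

Because the mapping is rebuilt from the conversation history on every request, any server instance can handle any request without shared state, and the same history always yields the same tokens.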
---
#### 3.5 LLM Provider Adapter
**Purpose**: Forward anonymized requests to target LLM providers
**Technology**: Python `httpx` HTTP client + provider-specific adapters
**Responsibilities**:
- Format anonymized content for target LLM API
- Handle provider-specific authentication
- Make HTTP requests to LLM provider
- Return LLM response to de-anonymization engine
**Supported Providers (MVP)**:
1. **OpenAI** (GPT-4, GPT-3.5)
- Endpoint: `https://api.openai.com/v1/chat/completions`
- Auth: Bearer token in header
2. **Anthropic Claude** (Claude Sonnet, Opus)
- Endpoint: `https://api.anthropic.com/v1/messages`
- Auth: `x-api-key` header, sent together with the required `anthropic-version` header
**Request Format Handling**:
- Parse user's original request
- Extract and anonymize message content
- Preserve all other parameters (model, temperature, max_tokens, etc.)
- Forward to target provider with user's API key
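
One way to sketch the request-format handling is a pure function that prepares the upstream call without sending it. `PROVIDERS`, `build_upstream_request`, and the `anonymize` callable are hypothetical names; the `anthropic-version` value shown is an assumption and should be checked against Anthropic's documentation.

```python
from copy import deepcopy
from typing import Callable

PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "auth": lambda key: {"Authorization": f"Bearer {key}"},
    },
    "anthropic": {
        "url": "https://api.anthropic.com/v1/messages",
        # Version value is an assumption for this sketch.
        "auth": lambda key: {"x-api-key": key, "anthropic-version": "2023-06-01"},
    },
}


def build_upstream_request(provider: str, body: dict, target_api_key: str,
                           anonymize: Callable[[str], str]) -> dict:
    """Return url/headers/json for the provider call, with message content anonymized."""
    spec = PROVIDERS[provider]
    payload = deepcopy(body)  # never mutate the caller's request
    for message in payload.get("messages", []):
        content = message.get("content")
        if not isinstance(content, str):
            # Fail closed: this sketch cannot anonymize structured content blocks.
            raise TypeError("unsupported message content shape")
        message["content"] = anonymize(content)
    headers = {"Content-Type": "application/json", **spec["auth"](target_api_key)}
    return {"url": spec["url"], "headers": headers, "json": payload}
```

Only `messages[*].content` is rewritten; `model`, `temperature`, `max_tokens`, and any other parameters pass through untouched. Fields such as Anthropic's top-level `system` prompt would need the same treatment and are not covered here.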
---
#### 3.6 De-anonymization Engine
**Purpose**: Replace tokens in LLM responses with original sensitive data
**Technology**: Python string processing
**Responsibilities**:
- Parse LLM response content
- Find all tokens ({{ENTITY_TYPE_INDEX}})
- Replace tokens with original values from the in-memory token mapping built during anonymization
- Return de-anonymized response to user
**Example**:
```
LLM Response: "The customer {{EMAIL_1}} appears to be making a fraudulent transaction with card {{CREDIT_CARD_1}}. Recommend blocking the card."
De-anonymized: "The customer [email protected] appears to be making a fraudulent transaction with card 4532-1234-5678-9010. Recommend blocking the card."
```
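
Token replacement can be sketched as a single regex pass over the response text. The `TOKEN` expression assumes token names consist of uppercase letters, digits, and underscores followed by a numeric index; `deanonymize` and the sample address are hypothetical.

```python
import re

# Matches tokens such as {{EMAIL_1}} or {{CREDIT_CARD_12}}.
TOKEN = re.compile(r"\{\{[A-Z][A-Z0-9_]*_\d+\}\}")


def deanonymize(text: str, token_to_value: dict[str, str]) -> str:
    """Swap known tokens back to their original values in one pass."""
    # Tokens without a mapping entry are left untouched (an assumption of this sketch).
    return TOKEN.sub(lambda m: token_to_value.get(m.group(0), m.group(0)), text)


mapping = {
    "{{EMAIL_1}}": "jane.doe@example.com",
    "{{CREDIT_CARD_1}}": "4532-1234-5678-9010",
}
restored = deanonymize("Recommend blocking {{CREDIT_CARD_1}} for {{EMAIL_1}}.", mapping)
```

A single pass avoids rescanning restored values, so an original value that happens to look like a token is not substituted a second time.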
---
#### 3.7 Audit Logging Service
**Purpose**: Record all anonymization operations for compliance
**Technology**: PostgreSQL
**Responsibilities**:
- Log every request processed
- Record detected entities and tokens used
- Provide audit trail for compliance teams
**Log Schema**:
```json
{
"log_id": "log_001",
"user_id": "user_abc123",
"session_id": "req_xyz789",
"timestamp": "2025-11-17T10:30:00Z",
"llm_provider": "openai",
"model": "gpt-4",
"detected_entities": [
{
"type": "EMAIL",
"token": "{{EMAIL_1}}",
"position": [15, 36]
},
{
"type": "CREDIT_CARD",
"token": "{{CREDIT_CARD_1}}",
"position": [47, 66]
}
],
"request_size": 156,
"response_size": 423,
"latency_ms": 1245
}
```
**Note**: Original sensitive data is NEVER stored in audit logs, only token references.
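
A sketch of how a log entry might be assembled so that original values cannot reach the audit table. `build_audit_entry` and the shape of the `detections` tuples are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def build_audit_entry(user_id: str, session_id: str, provider: str, model: str,
                      detections: list[tuple], latency_ms: int) -> dict:
    """Build a log entry from (entity_type, token, start, end, value) tuples."""
    return {
        "user_id": user_id,
        "session_id": session_id,
        # UTC timestamp; isoformat() renders the offset as +00:00 rather than Z.
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "llm_provider": provider,
        "model": model,
        # The original value (last tuple element) is deliberately dropped here.
        "detected_entities": [
            {"type": entity_type, "token": token, "position": [start, end]}
            for (entity_type, token, start, end, _value) in detections
        ],
        "latency_ms": latency_ms,
    }


entry = build_audit_entry(
    "user_abc123", "req_xyz789", "openai", "gpt-4",
    [("CREDIT_CARD", "{{CREDIT_CARD_1}}", 47, 66, "4532-1234-5678-9010")],
    1245,
)
```

A test that serializes the entry and asserts the raw value is absent, for example `"4532-1234-5678-9010" not in json.dumps(entry)`, is a cheap guard for this requirement.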
---
#### 3.8 Authentication Service
**Purpose**: Validate user API keys and manage access
**Technology**: PostgreSQL + JWT (optional for future)
**Responsibilities**:
- Generate API keys for users
- Validate incoming API key headers
- Rate limiting per API key
- Track usage for billing
**API Key Format**: `priv_live_[32-char-random-string]`
**User Schema**:
```json
{
"user_id": "user_abc123",
"api_key_hash": "sha256_hash_here",
"created_at": "2025-11-17T10:00:00Z",
"rate_limit": 1000,
"requests_used": 0,
"tier": "free"
}
```
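
Key generation and validation might be sketched as below, using only the standard library. The function names and the lowercase-alphanumeric alphabet are assumptions; the sketch uses plain SHA-256 as stated in the security requirements and does not apply the `API_KEY_SALT` setting.

```python
import hashlib
import hmac
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set


def hash_api_key(key: str) -> str:
    """Return the 64-character SHA-256 hex digest stored in place of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex); only the hash should be persisted."""
    suffix = "".join(secrets.choice(ALPHABET) for _ in range(32))
    key = f"priv_live_{suffix}"
    return key, hash_api_key(key)


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Check a presented key against the stored hash."""
    # Constant-time comparison avoids leaking hash prefixes through timing.
    return hmac.compare_digest(hash_api_key(presented), stored_hash)
```

The plaintext key is shown to the user once at creation time; afterwards the service can only verify it, not recover it.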
---
## 4. Request Flow Diagram
### Complete Request Flow
```
┌─────────────────────────────────────────────────────────────────────┐
│ USER'S APPLICATION │
│ │
│ Original Request: │
│ POST https://api.yourservice.com/v1/proxy/openai/chat/completions │
│ Headers: │
│ - X-API-Key: priv_live_xxx (Your Service Key) │
│ - X-Target-API-Key: sk-xxx (User's OpenAI Key) │
│ Body: │
│ { │
│ "model": "gpt-4", │
│ "messages": [ │
│ { │
│ "role": "user", │
│ "content": "Analyze transaction for │
│ [email protected] with card │
│ 4532-1234-5678-9010" │
│ } │
│ ] │
│ } │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 1: API GATEWAY │
│ (FastAPI Service) │
│ │
│ Actions: │
│ 1. Receive request │
│ 2. Validate X-API-Key header │
│ 3. Check rate limits │
│ 4. Generate session_id: req_xyz789 │
│ 5. Route to anonymization engine │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 2: PATTERN CONFIGURATION SERVICE │
│ (PostgreSQL) │
│ │
│ Actions: │
│ 1. Fetch user's sensitivity patterns from database │
│ 2. Return patterns to anonymization engine │
│ │
│ Retrieved Patterns: │
│ - EMAIL: regex for emails │
│ - CREDIT_CARD: regex for cards │
│ - SSN: regex for social security numbers │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 3: ANONYMIZATION ENGINE │
│ (Python + spaCy + Regex) │
│ │
│ Actions: │
│ 1. Scan message content for sensitive data │
│ 2. Apply regex patterns: find "[email protected]" │
│ 3. Apply NLP entity recognition (if applicable) │
│ 4. Generate tokens: │
│ - "[email protected]" → {{EMAIL_1}} │
│ - "4532-1234-5678-9010" → {{CREDIT_CARD_1}} │
│ 5. Replace sensitive data with tokens │
│ │
│ Anonymized Content: │
│ "Analyze transaction for {{EMAIL_1}} with card {{CREDIT_CARD_1}}" │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 4: TOKEN MAPPING (In-Memory) │
│ (Conversation-Scoped) │
│ │
│ Actions: │
│ 1. Hold token mappings in a per-request in-memory dictionary │
│ 2. Discard the mappings once the response has been sent │
│ │
│ In-Memory Mapping (request req_xyz789): │
│ { │
│ "{{EMAIL_1}}": "[email protected]", │
│ "{{CREDIT_CARD_1}}": "4532-1234-5678-9010" │
│ } │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 5: AUDIT LOGGING SERVICE │
│ (PostgreSQL) │
│ │
│ Actions: │
│ 1. Log the anonymization operation │
│ 2. Record detected entities (types + tokens only, NOT values) │
│ 3. Store timestamp, user_id, session_id │
│ │
│ Log Entry: │
│ { │
│ "session_id": "req_xyz789", │
│ "user_id": "user_abc123", │
│ "detected_entities": [ │
│ {"type": "EMAIL", "token": "{{EMAIL_1}}"}, │
│ {"type": "CREDIT_CARD", "token": "{{CREDIT_CARD_1}}"} │
│ ], │
│ "timestamp": "2025-11-17T10:30:00Z" │
│ } │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 6: LLM PROVIDER ADAPTER │
│ (HTTP Client) │
│ │
│ Actions: │
│ 1. Format request for OpenAI API │
│ 2. Add user's OpenAI API key (from X-Target-API-Key header) │
│ 3. Make HTTP POST to https://api.openai.com/v1/chat/completions │
│ │
│ Request Sent to OpenAI: │
│ POST https://api.openai.com/v1/chat/completions │
│ Headers: │
│ Authorization: Bearer sk-xxx (user's key) │
│ Body: │
│ { │
│ "model": "gpt-4", │
│ "messages": [ │
│ { │
│ "role": "user", │
│ "content": "Analyze transaction for {{EMAIL_1}} │
│ with card {{CREDIT_CARD_1}}" │
│ } │
│ ] │
│ } │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ OPENAI API │
│ │
│ OpenAI processes anonymized request and returns: │
│ │
│ Response: │
│ { │
│ "choices": [ │
│ { │
│ "message": { │
│ "role": "assistant", │
│ "content": "The transaction for customer {{EMAIL_1}} │
│ using card {{CREDIT_CARD_1}} appears to │
│ have high fraud risk due to..." │
│ } │
│ } │
│ ] │
│ } │
│ │
│ NOTE: OpenAI never sees the real email or credit card number! │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 7: DE-ANONYMIZATION ENGINE │
│ (Python String Processing) │
│ │
│ Actions: │
│ 1. Receive LLM response │
│ 2. Use token mappings from request processing (in-memory) │
│ 3. Find all tokens in response: {{EMAIL_1}}, {{CREDIT_CARD_1}} │
│ 4. Replace tokens with original values │
│ │
│ Token Replacement: │
│ - {{EMAIL_1}} → [email protected] │
│ - {{CREDIT_CARD_1}} → 4532-1234-5678-9010 │
│ │
│ De-anonymized Response: │
│ "The transaction for customer [email protected] using card │
│ 4532-1234-5678-9010 appears to have high fraud risk due to..." │
└────────────────────────────┬────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────────┐
│ STEP 8: RETURN TO USER │
│ │
│ Final Response (returned to user's application): │
│ { │
│ "choices": [ │
│ { │
│ "message": { │
│ "role": "assistant", │
│ "content": "The transaction for customer │
│ [email protected] using card │
│ 4532-1234-5678-9010 appears to have │
│ high fraud risk due to..." │
│ } │
│ } │
│ ] │
│ } │
│ │
│ User receives response with original sensitive data restored! │
└─────────────────────────────────────────────────────────────────────┘
```
---
## 5. Key Technical Specifications
### 5.1 API Specifications
#### Proxy Endpoint Format
```
POST /v1/proxy/{provider}/{endpoint}
Examples:
- POST /v1/proxy/openai/chat/completions
- POST /v1/proxy/anthropic/messages
```
#### Required Headers
```
X-API-Key: priv_live_xxx # Your service API key
X-Target-API-Key: sk-xxx # User's LLM provider API key
Content-Type: application/json
```
#### Request Body
Forward the exact same request body format as the target LLM provider expects.
#### Response Format
Return the exact same response format as the target LLM provider, with de-anonymized content.
---
### 5.2 Anonymization Rules
#### Default Entity Types (Always Detected)
1. **EMAIL**: Standard email format (regex)
2. **CREDIT_CARD**: 13-19 digit card numbers with optional separators (regex)
3. **SSN**: XXX-XX-XXXX format (regex)
4. **PHONE**: Various phone number formats (regex)
5. **PERSON**: Named entities detected by spaCy NLP
#### Custom Pattern Configuration
Users can add domain-specific patterns via API:
```
POST /v1/patterns
{
"patterns": [
{
"name": "internal_account_id",
"type": "regex",
"pattern": "ACC-[0-9]{8}",
"enabled": true
},
{
"name": "stripe_api_key",
"type": "regex",
"pattern": "sk_live_[A-Za-z0-9]{24}",
"enabled": true
}
]
}
```
---
### 5.3 Performance Requirements
| Metric | Target | Notes |
|--------|--------|-------|
| **Latency Overhead** | < 200ms | Time added by our service |
| **Throughput** | 100 req/sec | Per instance (horizontal scaling) |
| **Availability** | 99.9% | 43 minutes downtime/month max |
| **Token Mapping Lifetime** | Single request | Held in memory only; discarded once the response is sent |
| **Max Request Size** | 128KB | Proxy-enforced cap; provider limits are token-based |
| **Detection Accuracy** | > 95% | For configured patterns |
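
The downtime figure in the availability row follows from simple arithmetic over a 30-day month. The helper below is a hypothetical illustration of that calculation.

```python
def downtime_budget_minutes(availability: float, days: int = 30) -> float:
    """Allowed downtime in minutes for a given availability over `days` days."""
    return (1.0 - availability) * days * 24 * 60


budget = downtime_budget_minutes(0.999)  # roughly 43.2 minutes per 30-day month
```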
---
### 5.4 Security Requirements
1. **Data Encryption**:
- All data in transit: TLS 1.3
- No persistent storage of sensitive data (stateless processing)
- PostgreSQL data encrypted at rest (for audit logs only)
2. **API Key Security**:
- User API keys stored as SHA-256 hashes
- Target LLM API keys NEVER stored (passed through only)
- Rate limiting: 1000 requests/hour per free tier key
3. **Token Mapping Security**:
- Processed in-memory per request only
- Never persisted to disk or external storage
- Discarded immediately after response sent
- Never logged in audit trails
4. **Audit Logging**:
- Log entity TYPES and TOKENS only
- NEVER log original sensitive values
- Immutable audit trail (append-only)
---
### 5.5 Error Handling
#### Anonymization Failures
- If pattern detection fails, DO NOT forward request
- Return 422 Unprocessable Entity with details
#### LLM Provider Errors
- Forward original error from provider to user
- Log error in audit trail
#### De-anonymization Failures
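- Token mappings live in memory for the duration of a single request, so they cannot expire before the response is processed
- If the response contains a token with no entry in the mapping (for example, one the LLM altered or invented), leave it unreplaced and record the event in the audit log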
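
The fail-closed rule for anonymization failures and the pass-through rule for provider errors can be sketched as one small control-flow function. All names here (`AnonymizationError`, `handle_proxy_request`, and the three callables) are hypothetical.

```python
from typing import Callable


class AnonymizationError(Exception):
    """Hypothetical error raised when detection or tokenization fails."""


def handle_proxy_request(
    body: dict,
    anonymize: Callable[[dict], dict],
    forward: Callable[[dict], tuple[int, dict]],
    deanonymize: Callable[[dict], dict],
) -> tuple[int, dict]:
    """Return (status_code, json_body) for one proxied request."""
    try:
        safe_body = anonymize(body)
    except AnonymizationError:
        # Fail closed: the provider is never called with unverified content.
        # The detail text is fixed so request content cannot leak into errors.
        return 422, {"error": "anonymization_failed",
                     "detail": "Request was not forwarded."}

    status, payload = forward(safe_body)
    if status >= 400:
        return status, payload  # provider errors pass through unchanged
    return status, deanonymize(payload)
```

Keeping this logic free of framework types makes the "never forward on failure" property easy to unit-test with stub callables.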
---
## 6. Data Models
### User Table (PostgreSQL)
```sql
CREATE TABLE users (
user_id UUID PRIMARY KEY,
api_key_hash VARCHAR(64) NOT NULL UNIQUE,
created_at TIMESTAMP NOT NULL,
tier VARCHAR(20) DEFAULT 'free',
rate_limit INTEGER DEFAULT 1000,
requests_used INTEGER DEFAULT 0
);
```
### Patterns Table (PostgreSQL)
```sql
CREATE TABLE patterns (
pattern_id UUID PRIMARY KEY,
user_id UUID REFERENCES users(user_id),
name VARCHAR(100) NOT NULL,
type VARCHAR(20) NOT NULL,
pattern TEXT NOT NULL,
entity_type VARCHAR(50),
enabled BOOLEAN DEFAULT true,
created_at TIMESTAMP NOT NULL
);
```
### Audit Logs Table (PostgreSQL)
```sql
CREATE TABLE audit_logs (
log_id UUID PRIMARY KEY,
user_id UUID REFERENCES users(user_id),
session_id VARCHAR(50) NOT NULL,
timestamp TIMESTAMP NOT NULL,
llm_provider VARCHAR(50) NOT NULL,
model VARCHAR(100) NOT NULL,
detected_entities JSONB NOT NULL,
request_size INTEGER,
response_size INTEGER,
latency_ms INTEGER
);
```
### Token Mappings (In-Memory, Per-Request)
```
Scope: Single request/response cycle
Storage: In-memory dictionary during request processing
Lifecycle: Created → Used → Discarded
Example:
{
"{{EMAIL_1}}": "[email protected]",
"{{CREDIT_CARD_1}}": "4532-1234-5678-9010"
}
Note: Mappings are built by scanning conversation history,
ensuring consistency within the conversation context.
```
---
## 7. MVP Implementation Checklist
### Phase 1: Core Infrastructure (Week 1-2)
- [ ] Set up FastAPI project structure
- [ ] Implement API Gateway with routing
- [ ] Set up PostgreSQL database with schema
- [ ] Define the per-request, in-memory token mapping structure (no Redis in the MVP)
- [ ] Implement authentication middleware (API key validation)
- [ ] Implement rate limiting
### Phase 2: Anonymization (Week 2-3)
- [ ] Build regex-based pattern detection
- [ ] Integrate spaCy for NLP entity recognition
- [ ] Implement token generation logic
- [ ] Build anonymization engine (text → anonymized text)
- [ ] Implement in-memory token mapping (built per request, discarded after the response)
- [ ] Add pattern configuration API endpoints
### Phase 3: LLM Integration (Week 3-4)
- [ ] Build OpenAI adapter
- [ ] Build Anthropic adapter
- [ ] Implement request forwarding
- [ ] Implement response handling
- [ ] Add error handling for LLM provider failures
### Phase 4: De-anonymization (Week 4-5)
- [ ] Build token replacement logic
- [ ] Reuse the in-memory token mapping built during anonymization
- [ ] Handle edge cases (tokens missing from the mapping, tokens altered by the LLM)
- [ ] Test with complex multi-entity responses
### Phase 5: Audit & Security (Week 5-6)
- [ ] Implement audit logging service
- [ ] Add TLS/SSL for all endpoints
- [ ] Implement secure API key hashing
- [ ] Add request/response size limits
- [ ] Test security measures (no sensitive data in logs)
### Phase 6: Testing & Deployment (Week 6-8)
- [ ] Unit tests for all components
- [ ] Integration tests for full flow
- [ ] Load testing (100+ req/sec)
- [ ] Deploy to production (Railway/Render/Fly.io)
- [ ] Set up monitoring (Sentry, logs)
- [ ] Create simple onboarding API key generation
---
## 8. Success Metrics
### Technical Metrics
- Latency overhead: < 200ms (95th percentile)
- Uptime: > 99.9%
- Detection accuracy: > 95% for configured patterns
- Zero instances of sensitive data leakage in logs or errors
### Product Metrics (First 3 Months)
- 50+ developer signups
- 20+ companies actively using (>100 requests/week)
- 3-5 companies requesting paid plans
- 1-2 paying customers
---
## 9. Open Questions & Future Considerations
### Questions to Resolve During Build
1. Should we support streaming responses (SSE) in MVP? → NO (post-MVP)
2. How to handle multi-turn conversations with consistent tokenization? → Scan full conversation history
3. Should we cache anonymization patterns for performance? → Load once at startup (singleton pattern)
4. What happens if conversation history is very large? → Reasonable limits, optimize scanning if needed
### Post-MVP Features
1. Web dashboard for pattern configuration
2. Advanced analytics (most common entity types, usage patterns)
3. Support for more LLM providers (Cohere, AI21, etc.)
4. Real-time streaming response support
5. Organization-level accounts with team management
6. Embeddings and fine-tuning endpoint support
7. On-premises deployment option for enterprises
---
## 10. Architecture Diagram (High-Level)
```
┌──────────────────┐
│ User's App │
│ (Python/JS/etc) │
└────────┬─────────┘
│ POST /v1/proxy/openai/chat/completions
│ Headers: X-API-Key, X-Target-API-Key
│
▼
┌─────────────────────────────────────────────────────┐
│ Your Privacy Layer Service │
│ ┌───────────────────────────────────────────────┐ │
│ │ API Gateway (FastAPI) │ │
│ │ - Auth validation │ │
│ │ - Rate limiting │ │
│ │ - Request routing │ │
│ └──────────────────┬────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Pattern Configuration Service │ │
│ │ - Fetch user's sensitivity patterns │ │
│ │ - PostgreSQL storage │ │
│ └──────────────────┬────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ Anonymization Engine │ │
│ │ - Regex pattern matching │ │
│ │ - spaCy NLP entity detection │ │
│ │ - Token generation │ │
│ │ - Text replacement │ │
│ └──────────────────┬────────────────────────────┘ │
│ │ │
│ ┌───────────┴───────────┐ │
│ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────────┐ │
│ │Token Mapping│ │ Audit Logger │ │
│ │(In-Memory) │ │ (PostgreSQL) │ │
│ │- Per request│ │- Entity types │ │
│ │- Discarded │ │- Timestamps │ │
│ └─────────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌───────────────────────────────────────────────┐ │
│ │ LLM Provider Adapter │ │
│ │ - OpenAI client │ │
│ │ - Anthropic client │ │
│ │ - Request forwarding │ │
│ └──────────────────┬────────────────────────────┘ │
└────────────────────┼────────────────────────────────┘
│ Anonymized Request
│
▼
┌──────────────────────┐
│ LLM Provider │
│ (OpenAI/Claude) │
│ │
│ Receives: │
│ "Customer │
│ {{EMAIL_1}} with │
│ card │
│ {{CREDIT_CARD_1}}" │
└──────────┬───────────┘
│ LLM Response
│
▼
┌────────────────────────────────────────────────────┐
│ Your Privacy Layer Service │
│ ┌───────────────────────────────────────────────┐ │
│ │ De-anonymization Engine │ │
│ │ - Use in-memory token mappings │ │
│ │ - Replace tokens with original values │ │
│ │ - Return complete response │ │
│ └──────────────────┬────────────────────────────┘ │
└────────────────────┼────────────────────────────────┘
│ De-anonymized Response
│
▼
┌──────────────────────┐
│ User's App │
│ │
│ Receives: │
│ "Customer │
│ [email protected] │
│ with card │
│ 4532-1234-5678- │
│ 9010..." │
└──────────────────────┘
```
---
## 11. Code Structure
### Recommended Project Structure
```
privacy-llm-proxy/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── config.py # Configuration management
│ ├── dependencies.py # Dependency injection
│ │
│ ├── api/ # API endpoints
│ │ ├── __init__.py
│ │ ├── proxy.py # Proxy endpoints
│ │ ├── patterns.py # Pattern configuration endpoints
│ │ └── audit.py # Audit log endpoints
│ │
│ ├── core/ # Core business logic
│ │ ├── __init__.py
│ │ ├── anonymization.py # Anonymization engine
│ │ ├── deanonymization.py # De-anonymization engine
│ │ ├── token_manager.py # Token generation and mapping
│ │ └── pattern_matcher.py # Pattern detection logic
│ │
│ ├── adapters/ # LLM provider adapters
│ │ ├── __init__.py
│ │ ├── base.py # Abstract base adapter
│ │ ├── openai_adapter.py # OpenAI integration
│ │ └── anthropic_adapter.py # Anthropic integration
│ │
│ ├── models/ # Data models
│ │ ├── __init__.py
│ │ ├── user.py # User model
│ │ ├── pattern.py # Pattern model
│ │ ├── audit_log.py # Audit log model
│ │ └── request.py # Request/response models
│ │
│ ├── db/ # Database
│ │ ├── __init__.py
│ │ └── postgres.py # PostgreSQL connection
│ │
│ ├── services/ # Service layer
│ │ ├── __init__.py
│ │ ├── auth_service.py # Authentication
│ │ ├── pattern_service.py # Pattern CRUD
│ │ └── audit_service.py # Audit logging
│ │
│ └── utils/ # Utilities
│ ├── __init__.py
│ ├── validators.py # Input validation
│ └── security.py # Security utilities
│
├── tests/ # Test suite
│ ├── test_anonymization.py
│ ├── test_proxy.py
│ └── test_patterns.py
│
├── alembic/ # Database migrations
│ └── versions/
│
├── requirements.txt # Python dependencies
├── Dockerfile # Container definition
├── docker-compose.yml # Local development setup
└── README.md # Setup instructions
```
---
## 12. Key Dependencies
### Python Packages
```
fastapi==0.104.1 # Web framework
uvicorn==0.24.0 # ASGI server
pydantic==2.5.0 # Data validation
sqlalchemy==2.0.23 # ORM for PostgreSQL
psycopg2-binary==2.9.9 # PostgreSQL driver
spacy==3.7.2 # NLP for entity recognition
httpx==0.25.2 # HTTP client for LLM APIs
python-jose[cryptography] # JWT handling
passlib[bcrypt] # Password hashing
alembic==1.13.0 # Database migrations
python-dotenv==1.0.0 # Environment variables
sentry-sdk==1.38.0 # Error tracking
prometheus-client==0.19.0 # Metrics
```
### System Requirements
- Python 3.11+
- PostgreSQL 15+ (users, patterns, and audit logs)
- 2GB RAM minimum
- 10GB disk space
---
## 13. Environment Variables
### Required Configuration
```bash
# Service Configuration
SERVICE_NAME=privacy-llm-proxy
ENVIRONMENT=production
PORT=8000
LOG_LEVEL=INFO
# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=privacy_proxy
POSTGRES_USER=proxy_user
POSTGRES_PASSWORD=secure_password
# No Redis needed for MVP - stateless processing
# Security
API_KEY_SALT=random_salt_string
JWT_SECRET=jwt_secret_key
ENCRYPTION_KEY=encryption_key_for_data
# Rate Limiting
FREE_TIER_RATE_LIMIT=1000
PAID_TIER_RATE_LIMIT=10000
# Monitoring
SENTRY_DSN=https://sentry.io/your-project
PROMETHEUS_PORT=9090
# Feature Flags
ENABLE_NLP_DETECTION=true
ENABLE_AUDIT_LOGS=true
```
---
## 14. API Usage Examples
### Example 1: Configure Patterns
```bash
curl -X POST https://api.yourservice.com/v1/patterns \
-H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6" \
-H "Content-Type: application/json" \
-d '{
"patterns": [
{
"name": "credit_card",
"type": "regex",
"pattern": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b",
"enabled": true
},
{
"name": "ssn",
"type": "regex",
"pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
"enabled": true
},
{
"name": "email",
"type": "entity",
"entity_type": "EMAIL",
"enabled": true
}
]
}'
```
### Example 2: Make Proxied Request to OpenAI
```bash
curl -X POST https://api.yourservice.com/v1/proxy/openai/chat/completions \
-H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6" \
-H "X-Target-API-Key: sk-proj-xxx" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Analyze this transaction: Customer [email protected] used card 4532-1234-5678-9010 for $5,000 purchase. Is this fraudulent?"
}
],
"temperature": 0.7
}'
```
**What happens internally:**
1. Your service detects `[email protected]` and `4532-1234-5678-9010`
2. Replaces with `{{EMAIL_1}}` and `{{CREDIT_CARD_1}}`
3. Sends anonymized request to OpenAI
4. OpenAI responds with tokens in the text
5. Your service replaces tokens back to original values
6. Returns de-anonymized response to user
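
The same call can be prepared from Python with the standard library. This sketch builds the request object without sending it; `build_proxy_request` is a hypothetical helper, and the URL and header names are taken from Example 2.

```python
import json
import urllib.request

PROXY_URL = "https://api.yourservice.com/v1/proxy/openai/chat/completions"


def build_proxy_request(service_key: str, provider_key: str,
                        payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request from Example 2."""
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": service_key,          # proxy service key
            "X-Target-API-Key": provider_key,  # user's own provider key
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_proxy_request(
    "priv_live_xxx",
    "sk-xxx",
    {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
)
```

Passing `request` to `urllib.request.urlopen` would send it; the body is exactly what the application would otherwise post to the provider directly.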
### Example 3: Retrieve Audit Logs
```bash
curl -X GET "https://api.yourservice.com/v1/audit-logs?limit=10" \
-H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6"
```
**Response:**
```json
{
"logs": [
{
"log_id": "log_001",
"timestamp": "2025-11-17T10:30:00Z",
"llm_provider": "openai",
"model": "gpt-4",
"detected_entities": [
{"type": "EMAIL", "token": "{{EMAIL_1}}", "count": 1},
{"type": "CREDIT_CARD", "token": "{{CREDIT_CARD_1}}", "count": 1}
],
"latency_ms": 1245
}
],
"total": 147,
"page": 1
}
```
---
## 15. Testing Strategy
### Unit Tests
- Test each component in isolation
- Mock external dependencies (PostgreSQL, LLM APIs)
- Focus on core logic: anonymization, de-anonymization, token generation
### Integration Tests
- Test full request flow end-to-end
- Use test databases (PostgreSQL for audit logs if needed)
- Verify data flows correctly between components
### Test Cases to Cover
**Anonymization Engine:**
- ✅ Detects single credit card number
- ✅ Detects multiple credit card numbers
- ✅ Detects emails in various formats
- ✅ Detects custom patterns (user-defined)
- ✅ Handles edge cases (no PII in text)
- ✅ Preserves non-sensitive context
**Token Management:**
- ✅ Generates unique tokens per entity
- ✅ Maintains consistent tokens within conversations
- ✅ Scans conversation history for token reuse
- ✅ Processes requests independently (stateless)
**De-anonymization:**
- ✅ Replaces single token
- ✅ Replaces multiple tokens
- ✅ Handles tokens missing from the mapping
- ✅ Preserves response structure
**API Gateway:**
- ✅ Validates API keys
- ✅ Rate limits requests
- ✅ Routes to correct provider
- ✅ Returns appropriate error codes
### Load Testing
- Use tools like Locust or k6
- Simulate 100+ concurrent requests
- Verify latency overhead stays under 200ms
- Check for memory leaks
---
## 16. Deployment Checklist
### Pre-Production
- [ ] All unit tests passing
- [ ] Integration tests passing
- [ ] Load tests passing (100 req/sec)
- [ ] Security audit complete
- [ ] Environment variables configured
- [ ] Database migrations tested
- [ ] Stateless processing verified
- [ ] Monitoring/alerting set up
- [ ] Documentation complete
### Production Deployment
- [ ] Deploy to cloud platform (Railway/Render/Fly.io)
- [ ] Configure SSL/TLS certificates
- [ ] Set up database backups (daily)
- [ ] Verify horizontal scalability (stateless design)
- [ ] Set up log aggregation (CloudWatch/Datadog)
- [ ] Configure error tracking (Sentry)
- [ ] Set up uptime monitoring (UptimeRobot)
- [ ] Create runbook for incidents
- [ ] Perform smoke tests post-deployment
### Post-Launch
- [ ] Monitor error rates
- [ ] Track latency metrics
- [ ] Review audit logs for anomalies
- [ ] Gather user feedback
- [ ] Iterate based on usage patterns
---
## 17. Risks & Mitigations
### Technical Risks
**Risk 1: Anonymization Accuracy < 95%**
- **Impact**: Sensitive data leaks to LLM providers
- **Mitigation**:
- Extensive pattern testing before launch
- Allow users to test patterns in sandbox mode
- Fail closed: if detection is uncertain, block the request rather than risk a leak
**Risk 2: High Latency (>200ms overhead)**
- **Impact**: Poor user experience, adoption suffers
- **Mitigation**:
- Optimize regex matching (compile patterns once)
- Load patterns at startup (singleton)
- Profile code to find bottlenecks
- Consider async processing where possible
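One way to apply the first two mitigations together is to merge all patterns into a single compiled alternation that is built once per process. The sketch below assumes patterns are plain regex strings keyed by entity type. `RAW_PATTERNS`, `combined_pattern`, and `find_entities` are hypothetical names, and the two regexes are illustrative rather than the production set.

```python
import re
from functools import lru_cache

# Illustrative patterns; entity type names must be valid regex group names.
RAW_PATTERNS = {
    "EMAIL": r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
    "CREDIT_CARD": r"\b\d(?:[ -]?\d){12,15}\b",
}


@lru_cache(maxsize=1)
def combined_pattern():
    """Compile every pattern into one named-group alternation, once per process."""
    parts = [f"(?P<{name}>{regex})" for name, regex in RAW_PATTERNS.items()]
    return re.compile("|".join(parts))


def find_entities(text):
    """Scan the text once and return (entity_type, matched_value) pairs in order."""
    return [(m.lastgroup, m.group(0)) for m in combined_pattern().finditer(text)]


entities = find_entities("Mail bob@example.com, card 4111-1111-1111-1111")
# entities == [("EMAIL", "bob@example.com"), ("CREDIT_CARD", "4111-1111-1111-1111")]
```

A single pass keeps the scan cost roughly proportional to text length instead of text length times pattern count. The trade-off is that alternation order decides which pattern wins when two could match at the same position.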
**Risk 3: Large Conversation History**
- **Impact**: Slower processing for very long conversations
- **Mitigation**:
- Set reasonable conversation length limits
- Optimize scanning algorithm
- Cache pattern compilation
- Consider truncating very old messages
**Risk 4: Token Collision**
- **Impact**: Wrong data mapped to wrong token
- **Mitigation**:
- Use session-scoped tokens (for example, a `req_xyz789` session prefix)
- Include timestamp in token generation
- Validate token uniqueness before storing
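The three mitigations can be combined in a small per-request mapping object. This is a sketch under assumptions: the session ID format (`req_` plus a hex millisecond timestamp plus random bytes) is modeled loosely on the `req_xyz789` example, and `TokenMap` and its method names are hypothetical.

```python
import secrets
import time


def new_session_id():
    """Build a per-request ID from a timestamp and random bytes (format is an assumption)."""
    millis = int(time.time() * 1000)
    return f"req_{millis:x}_{secrets.token_hex(4)}"


class TokenMap:
    """Token mapping scoped to one request; refuses to remap an existing token."""

    def __init__(self, session_id=None):
        self.session_id = session_id or new_session_id()
        self._values = {}    # token -> original value
        self._tokens = {}    # (entity type, value) -> token
        self._counters = {}  # entity type -> last index issued

    def store(self, token, value):
        # Uniqueness check: a token may only ever point at one value.
        existing = self._values.get(token)
        if existing is not None and existing != value:
            raise ValueError(f"token collision on {token} in {self.session_id}")
        self._values[token] = value

    def token_for(self, entity_type, value):
        key = (entity_type, value)
        if key not in self._tokens:
            index = self._counters.get(entity_type, 0) + 1
            self._counters[entity_type] = index
            token = "{{" + f"{entity_type}_{index}" + "}}"
            self.store(token, value)
            self._tokens[key] = token
        return self._tokens[key]

    def storage_key(self, token):
        # Prefixing with the session ID keeps identical tokens from different requests apart.
        return f"{self.session_id}:{token}"

    def resolve(self, token):
        return self._values[token]
```

Because the token text sent to the LLM stays short (`{{EMAIL_1}}`), the session prefix only appears in the storage key, so the prompt stays readable for the model.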
### Business Risks
**Risk 1: Users Don't Trust Us With Data**
- **Mitigation**:
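- Start the SOC 2 Type II audit early (Type II reports cover an observation period, so the clock should start as soon as controls are in place)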
- Open-source the anonymization logic
- Provide on-premises deployment option
- Transparent audit logs
**Risk 2: LLM Providers Update APIs**
- **Mitigation**:
- Version our adapter layer
- Monitor provider changelogs
- Build adapter tests that catch breaking changes
---
## 18. Success Definition
### MVP is Successful If:
1. **Technical**: 95%+ anonymization accuracy, <200ms added latency per request
2. **Adoption**: 50+ developer signups in first month
3. **Usage**: 20+ companies actively using (>100 requests/week)
4. **Revenue**: 3-5 companies requesting paid plans
5. **Feedback**: Positive feedback on ease of integration
### MVP Has Failed If:
1. <10 signups after launch
2. Users sign up but don't integrate (integration too hard)
3. High error rates (>5%)
4. No one willing to pay
---
## Appendix A: Glossary
**Anonymization**: Process of detecting and replacing sensitive data with tokens
**De-anonymization**: Process of replacing tokens back with original sensitive data
**Context-preserving**: Anonymization that maintains semantic meaning for the LLM
**Token**: Placeholder text like `{{EMAIL_1}}` that represents sensitive data
**Session ID**: Unique identifier for each request, used to scope token mappings
**Pattern**: User-defined regex or entity type that defines what data is sensitive
**Audit Log**: Immutable record of anonymization operations (entity types and tokens only, never the original values)
**Token Mapping**: Key-value store connecting tokens to original sensitive values
**Proxy**: Service that forwards requests to another service (in this case, LLM APIs)
**Stateless**: System doesn't maintain state between requests; each request is processed independently
---
## Appendix B: References
### Similar Products (for Research)
- Protecto.ai - AI-powered data masking
- Private AI - PII detection and anonymization
- Cape Privacy - Encrypted ML collaboration
- Gretel.ai - Synthetic data generation
### Technical Resources
- spaCy Documentation: https://spacy.io/
- FastAPI Documentation: https://fastapi.tiangolo.com/
- OpenAI API Reference: https://platform.openai.com/docs/api-reference
- Anthropic API Reference: https://docs.anthropic.com/
### Compliance Resources
- GDPR Guidelines: https://gdpr.eu/
- SOC 2 Compliance: https://www.aicpa.org/soc
- HIPAA Technical Safeguards: https://www.hhs.gov/hipaa/
---
**END OF DOCUMENT**