AI Privacy Layer MVP - Technical Design Document

Version: 1.0
Date: November 17, 2025
Status: Draft

1. Problem Statement

The Core Problem

Financial institutions, fintech companies, and enterprises want to use state-of-the-art LLMs (GPT-4, Claude, etc.) to improve productivity and build AI-powered features, but they cannot because:

Data Privacy Risk: Sending customer data (credit cards, SSNs, account numbers, PII) to third-party LLM providers exposes them to:
- Regulatory violations (GDPR, CCPA, HIPAA, SOX)
- Data breach liability
- Customer trust erosion
- Competitive intelligence leakage

Current Solutions Are Inadequate:
- Self-hosting open-source models (e.g., Llama): Requires $200K+ upfront GPU costs, infrastructure teams, and 3-6 month setup time
- Generic anonymization tools: One-size-fits-all PII detection that doesn't understand domain-specific sensitive data (API keys, internal account formats, proprietary identifiers)
- Manual redaction: Slow, error-prone, doesn't scale

Developer Friction: Existing privacy solutions require complex integrations, separate workflows, or significant code changes

Success Criteria

The MVP must enable developers to:

Use any frontier LLM (GPT-4, Claude, etc.) without exposing sensitive data
Integrate with a minimal change (pointing the client at the proxy base URL and adding the proxy headers)
Define custom sensitive data patterns (not just generic PII)
Maintain context quality in LLM responses
Get complete audit trails of what data was anonymized

2. The Solution We're Building

Product Overview

A transparent proxy service that sits between the user's application and any LLM API provider, automatically detecting and anonymizing sensitive data before it reaches the LLM, then de-anonymizing the response before returning it to the user.

Core Value Propositions

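Minimal Code Change Integration: Users point their API calls to our proxy URL instead of the LLM provider's URL and add the two proxy headers (X-API-Key, X-Target-API-Key); request and response bodies keep the provider's format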
Custom Sensitivity Rules: Users define what data is sensitive for their use case (credit cards, API keys, internal IDs, etc.)
Context Preservation: Anonymization maintains semantic meaning so LLM responses remain useful
Universal Compatibility: Designed to work with any LLM provider without vendor lock-in (the MVP ships with OpenAI and Anthropic)
Complete Transparency: Full audit logs showing what was detected, anonymized, and when

MVP Scope (What We're Building)

In Scope

✅ Proxy service for OpenAI and Anthropic Claude APIs
✅ User-defined sensitivity pattern configuration (regex + entity types)
✅ Context-preserving tokenization system
✅ Request/response anonymization and de-anonymization
✅ Support for chat completion endpoints (most common use case)

Out of Scope (Post-MVP)

❌ Web dashboard UI (configuration via API only)
❌ Advanced analytics/reporting
❌ Multi-user organizations with role-based access
❌ Support for embeddings, fine-tuning, or other LLM endpoints
❌ Real-time streaming responses (MVP returns complete, non-streamed responses only)
❌ Support for 10+ LLM providers (start with 2)

3. High-Level Architecture

System Components

3.1 Proxy API Gateway

Purpose: Accept incoming requests from users, route to appropriate handlers
Technology: Python FastAPI
Responsibilities:

Authenticate API requests (validate API keys)
Parse incoming LLM API requests
Route to appropriate LLM provider handler
Return responses to user

Endpoints:

POST /v1/proxy/openai/chat/completions
POST /v1/proxy/anthropic/messages
POST /v1/patterns (configure sensitivity patterns)
GET  /v1/patterns (retrieve current patterns)
GET  /v1/audit-logs (retrieve anonymization logs)

3.2 Pattern Configuration Service

Purpose: Store and manage user-defined sensitivity patterns
Technology: PostgreSQL database + Python service layer
Responsibilities:

CRUD operations for sensitivity patterns
Pattern validation (ensure regex is valid)
Pattern retrieval for anonymization engine

Data Model:

{
  "user_id": "user_abc123",
  "patterns": [
    {
      "id": "pattern_1",
      "name": "credit_card",
      "type": "regex",
      "pattern": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b",
      "enabled": true
    },
    {
      "id": "pattern_2", 
      "name": "ssn",
      "type": "regex",
      "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
      "enabled": true
    },
    {
      "id": "pattern_3",
      "name": "api_key",
      "type": "regex", 
      "pattern": "\\b(sk-[A-Za-z0-9]{48}|pk_live_[A-Za-z0-9]{24})\\b",
      "enabled": true
    },
    {
      "id": "pattern_4",
      "name": "email",
      "type": "entity",
      "entity_type": "EMAIL",
      "enabled": true
    }
  ]
}
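
Pattern validation for the data model above could be sketched as follows. The function name validate_pattern and the error strings are hypothetical; only the field names (name, type, pattern, entity_type) come from the model.

```python
import re

ALLOWED_TYPES = {"regex", "entity"}


def validate_pattern(pattern: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the pattern is valid."""
    errors = []
    if not pattern.get("name"):
        errors.append("name is required")

    ptype = pattern.get("type")
    if ptype not in ALLOWED_TYPES:
        errors.append(f"type must be one of {sorted(ALLOWED_TYPES)}")
    elif ptype == "regex":
        source = pattern.get("pattern")
        if not source:
            errors.append("pattern is required for regex type")
        else:
            try:
                re.compile(source)  # rejects syntactically invalid regexes
            except re.error as exc:
                errors.append(f"invalid regex: {exc}")
    elif ptype == "entity" and not pattern.get("entity_type"):
        errors.append("entity_type is required for entity type")
    return errors
```

Compiling a regex only proves it is syntactically valid. It does not guard against patterns that backtrack catastrophically on long inputs, which the service would likely need to handle separately.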

3.3 Anonymization Engine

Purpose: Detect sensitive data and replace with context-preserving tokens
Technology: Python + spaCy (NLP) + regex
Responsibilities:

Scan request content for sensitive data
Generate unique tokens for each detected entity
Store token mapping for later de-anonymization
Preserve context in anonymized text

Detection Methods:

Regex-based detection: For structured data (credit cards, SSNs, phone numbers, API keys)
NLP entity recognition: For unstructured data (names, organizations) using spaCy; emails are matched by regex (see Section 5.2)
Custom patterns: User-defined regex patterns for domain-specific sensitive data

Tokenization Strategy:

Generate deterministic tokens within a request session: {{ENTITY_TYPE_INDEX}}
Examples: {{CREDIT_CARD_1}}, {{SSN_1}}, {{EMAIL_1}}, {{API_KEY_1}}
Maintain semantic meaning through token naming

Example:

Input: "Customer john.doe@chase.com with card 4532-1234-5678-9010 made a $5,000 transaction"

Detected Entities:
- EMAIL: john.doe@chase.com
- CREDIT_CARD: 4532-1234-5678-9010

Anonymized Output: "Customer {{EMAIL_1}} with card {{CREDIT_CARD_1}} made a $5,000 transaction"
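
A minimal sketch of the regex path that reproduces the example above. The PATTERNS table and the anonymize function are illustrative assumptions, not the final engine: the email regex is deliberately simple, and spaCy-based detection is omitted.

```python
import re

# Illustrative patterns only; a real deployment would load the user's configured patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
}


def anonymize(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with {{TYPE_N}} tokens; return the text and a token->value map."""
    # Collect every match as (start, end, entity_type, value), ordered by position.
    matches = []
    for entity_type, regex in PATTERNS.items():
        for m in regex.finditer(text):
            matches.append((m.start(), m.end(), entity_type, m.group()))
    matches.sort()

    value_to_token: dict[str, str] = {}
    counters: dict[str, int] = {}
    pieces, cursor = [], 0
    for start, end, entity_type, value in matches:
        if start < cursor:  # skip a match overlapping one that was already replaced
            continue
        if value not in value_to_token:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            value_to_token[value] = "{{%s_%d}}" % (entity_type, counters[entity_type])
        pieces.append(text[cursor:start])
        pieces.append(value_to_token[value])
        cursor = end
    pieces.append(text[cursor:])

    token_to_value = {token: value for value, token in value_to_token.items()}
    return "".join(pieces), token_to_value
```

Building the output from slices in a single left-to-right pass keeps character offsets valid, which repeated str.replace calls would not guarantee once the text starts changing.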

3.4 Token Consistency (Conversation-Scoped)

Purpose: Ensure same sensitive value gets same token within a conversation
Technology: Stateless, in-memory processing (no external dependencies)
Responsibilities:

Scan entire conversation history for sensitive data
Assign consistent tokens (same value → same token)
Maintain token→value mappings for de-anonymization
Process independently per request (stateless)

How It Works: When a request arrives with conversation history:

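Scan all messages (user, assistant, and system) in chronological order, since earlier de-anonymized responses replayed in the history can contain the same sensitive values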
Build value→token mapping (e.g., "4532-1234-5678-9010" → "{{CREDIT_CARD_1}}")
Reuse existing tokens for repeated values
Anonymize all messages using consistent mappings
De-anonymize response using same mappings
Discard mappings after response sent

Example:

# Request with conversation history
messages = [
  {"role": "user", "content": "My card is 4532-1234-5678-9010"},
  {"role": "assistant", "content": "Got it"},
  {"role": "user", "content": "Verify card 4532-1234-5678-9010"}
]

# System builds: {"{{CREDIT_CARD_1}}": "4532-1234-5678-9010"}
# Both instances use same token: {{CREDIT_CARD_1}}
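
The conversation-scoped behavior can be sketched with one shared value-to-token map per request. This is a simplified illustration under stated assumptions: it uses a single card detector, assumes each message's content is a plain string, and the name anonymize_conversation is hypothetical.

```python
import re

# Single illustrative detector; the real engine would apply every configured pattern.
CARD = re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b")


def anonymize_conversation(messages: list[dict]) -> tuple[list[dict], dict[str, str]]:
    """Anonymize every message using one value->token map shared across the request."""
    value_to_token: dict[str, str] = {}

    def token_for(match: re.Match) -> str:
        value = match.group()
        # Reuse the existing token when the same value appears again.
        if value not in value_to_token:
            value_to_token[value] = "{{CREDIT_CARD_%d}}" % (len(value_to_token) + 1)
        return value_to_token[value]

    anonymized = [
        {**msg, "content": CARD.sub(token_for, msg["content"])} for msg in messages
    ]
    # Invert to token->value for de-anonymizing the response. The caller drops
    # this dict once the response has been returned; nothing is persisted.
    token_to_value = {token: value for value, token in value_to_token.items()}
    return anonymized, token_to_value
```

Values are compared as exact strings here, so "4532 1234 5678 9010" and "4532-1234-5678-9010" would receive different tokens unless the engine normalizes values before lookup.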

Why Stateless?

No infrastructure dependencies (Redis, databases)
Horizontally scalable (any server can handle any request)
Simpler architecture and deployment
User controls conversation history
Sufficient for MVP use cases

3.5 LLM Provider Adapter

Purpose: Forward anonymized requests to target LLM providers
Technology: Python requests library + provider-specific clients
Responsibilities:

Format anonymized content for target LLM API
Handle provider-specific authentication
Make HTTP requests to LLM provider
Return LLM response to de-anonymization engine

Supported Providers (MVP):

OpenAI (GPT-4, GPT-3.5)
- Endpoint: https://api.openai.com/v1/chat/completions
- Auth: Bearer token in header
Anthropic Claude (Claude Sonnet, Opus)
- Endpoint: https://api.anthropic.com/v1/messages
- Auth: x-api-key in header (the API also requires an anthropic-version header)

Request Format Handling:

Parse user's original request
Extract and anonymize message content
Preserve all other parameters (model, temperature, max_tokens, etc.)
Forward to target provider with user's API key
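
The request format handling above could look roughly like this. The PROVIDERS table and build_upstream_request are hypothetical names, and the anthropic-version value is only an example; the function builds the outgoing request without sending it.

```python
import copy

PROVIDERS = {
    "openai": {
        "url": "https://api.openai.com/v1/chat/completions",
        "auth_headers": lambda key: {"Authorization": f"Bearer {key}"},
    },
    "anthropic": {
        "url": "https://api.anthropic.com/v1/messages",
        # Anthropic expects a version header alongside x-api-key; example value.
        "auth_headers": lambda key: {"x-api-key": key, "anthropic-version": "2023-06-01"},
    },
}


def build_upstream_request(provider, body, anonymized_messages, target_api_key):
    """Return (url, headers, body) for the provider call without mutating the input body."""
    config = PROVIDERS[provider]
    upstream_body = copy.deepcopy(body)
    # Only the messages change; model, temperature, max_tokens, etc. pass through.
    upstream_body["messages"] = anonymized_messages
    headers = {"Content-Type": "application/json", **config["auth_headers"](target_api_key)}
    return config["url"], headers, upstream_body
```

This sketch only swaps the messages array. Other free-text fields, such as Anthropic's top-level system parameter, would need the same anonymization before forwarding.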

3.6 De-anonymization Engine

Purpose: Replace tokens in LLM responses with original sensitive data
Technology: Python string processing
Responsibilities:

Parse LLM response content
Find all tokens ({{ENTITY_TYPE_INDEX}})
Replace tokens with original values from the in-memory token mappings built during anonymization
Return de-anonymized response to user

Example:

LLM Response: "The customer {{EMAIL_1}} appears to be making a fraudulent transaction with card {{CREDIT_CARD_1}}. Recommend blocking the card."

De-anonymized: "The customer john.doe@chase.com appears to be making a fraudulent transaction with card 4532-1234-5678-9010. Recommend blocking the card."
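
Token replacement can be sketched as a single regex pass over the response. The TOKEN pattern and the choice to leave unknown tokens untouched are assumptions of this sketch, not settled behavior.

```python
import re

# Matches tokens of the form {{ENTITY_TYPE_INDEX}}, e.g. {{CREDIT_CARD_1}}.
TOKEN = re.compile(r"\{\{[A-Z][A-Z0-9_]*_\d+\}\}")


def deanonymize(text: str, token_to_value: dict[str, str]) -> str:
    """Replace known tokens with their original values; leave unknown tokens as-is."""
    return TOKEN.sub(lambda m: token_to_value.get(m.group(), m.group()), text)
```

A single pass means a restored value is never re-scanned, so an original value that happens to contain token-like text cannot trigger a second substitution.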

3.7 Audit Logging Service

Purpose: Record all anonymization operations for compliance
Technology: PostgreSQL
Responsibilities:

Log every request processed
Record detected entities and tokens used
Provide audit trail for compliance teams

Log Schema:

{
  "log_id": "log_001",
  "user_id": "user_abc123",
  "session_id": "req_xyz789",
  "timestamp": "2025-11-17T10:30:00Z",
  "llm_provider": "openai",
  "model": "gpt-4",
  "detected_entities": [
    {
      "type": "EMAIL",
      "token": "{{EMAIL_1}}",
      "position": [15, 36]
    },
    {
      "type": "CREDIT_CARD", 
      "token": "{{CREDIT_CARD_1}}",
      "position": [47, 66]
    }
  ],
  "request_size": 156,
  "response_size": 423,
  "latency_ms": 1245
}

Note: Original sensitive data is NEVER stored in audit logs, only token references.
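
One way to enforce the note above is to build log entries from an allow-list of fields. In this sketch the input entity dicts are assumed to carry a raw "value" key from detection; build_audit_entry is a hypothetical helper, and the size and latency fields are omitted for brevity.

```python
from datetime import datetime, timezone


def build_audit_entry(user_id, session_id, provider, model, entities):
    """Build a log entry from detected entities, keeping types, tokens, and positions only."""
    return {
        "user_id": user_id,
        "session_id": session_id,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "llm_provider": provider,
        "model": model,
        "detected_entities": [
            # Copy only allow-listed fields so raw values cannot slip into the log.
            {"type": e["type"], "token": e["token"], "position": e["position"]}
            for e in entities
        ],
    }
```

Copying named fields, rather than deleting "value" from a copy of the entity, means any field added to the detector later stays out of the log by default.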

3.8 Authentication Service

Purpose: Validate user API keys and manage access
Technology: PostgreSQL + JWT (optional for future)
Responsibilities:

Generate API keys for users
Validate incoming API key headers
Rate limiting per API key
Track usage for billing

API Key Format: priv_live_[32-char-random-string]

User Schema:

{
  "user_id": "user_abc123",
  "api_key_hash": "sha256_hash_here",
  "created_at": "2025-11-17T10:00:00Z",
  "rate_limit": 1000,
  "requests_used": 0,
  "tier": "free"
}
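
Key generation and validation might be sketched as below, assuming the random part is 32 lowercase alphanumeric characters and that only the SHA-256 hex digest is stored. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets
import string

ALPHABET = string.ascii_lowercase + string.digits
PREFIX = "priv_live_"


def generate_api_key() -> tuple[str, str]:
    """Return (plaintext_key, sha256_hex). Show the key once; store only the hash."""
    random_part = "".join(secrets.choice(ALPHABET) for _ in range(32))
    key = PREFIX + random_part
    return key, hashlib.sha256(key.encode()).hexdigest()


def verify_api_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare against the stored hash in constant time."""
    candidate = hashlib.sha256(presented_key.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_hash)
```

The hex digest is 64 characters long, which fits an api_key_hash column declared as VARCHAR(64).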

4. Request Flow Diagram

Complete Request Flow

┌─────────────────────────────────────────────────────────────────────┐
│                         USER'S APPLICATION                          │
│                                                                     │
│  Original Request:                                                  │
│  POST https://api.yourservice.com/v1/proxy/openai/chat/completions │
│  Headers:                                                           │
│    - X-API-Key: priv_live_xxx (Your Service Key)                  │
│    - X-Target-API-Key: sk-xxx (User's OpenAI Key)                 │
│  Body:                                                              │
│    {                                                                │
│      "model": "gpt-4",                                             │
│      "messages": [                                                  │
│        {                                                            │
│          "role": "user",                                           │
│          "content": "Analyze transaction for                       │
│                      john.doe@chase.com with card                  │
│                      4532-1234-5678-9010"                          │
│        }                                                            │
│      ]                                                              │
│    }                                                                │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    STEP 1: API GATEWAY                              │
│                    (FastAPI Service)                                │
│                                                                     │
│  Actions:                                                           │
│  1. Receive request                                                 │
│  2. Validate X-API-Key header                                      │
│  3. Check rate limits                                              │
│  4. Generate session_id: req_xyz789                                │
│  5. Route to anonymization engine                                  │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 2: PATTERN CONFIGURATION SERVICE                  │
│                    (PostgreSQL)                                     │
│                                                                     │
│  Actions:                                                           │
│  1. Fetch user's sensitivity patterns from database                 │
│  2. Return patterns to anonymization engine                         │
│                                                                     │
│  Retrieved Patterns:                                                │
│  - EMAIL: regex for emails                                          │
│  - CREDIT_CARD: regex for cards                                     │
│  - SSN: regex for social security numbers                           │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 3: ANONYMIZATION ENGINE                           │
│              (Python + spaCy + Regex)                               │
│                                                                     │
│  Actions:                                                           │
│  1. Scan message content for sensitive data                         │
│  2. Apply regex patterns: find "john.doe@chase.com"                │
│  3. Apply NLP entity recognition (if applicable)                    │
│  4. Generate tokens:                                                │
│     - "john.doe@chase.com" → {{EMAIL_1}}                          │
│     - "4532-1234-5678-9010" → {{CREDIT_CARD_1}}                   │
│  5. Replace sensitive data with tokens                              │
│                                                                     │
│  Anonymized Content:                                                │
│  "Analyze transaction for {{EMAIL_1}} with card {{CREDIT_CARD_1}}" │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 4: TOKEN MAPPING (In-Memory)                      │
│                    (Conversation-Scoped)                            │
│                                                                     │
│  Actions:                                                           │
│  1. Hold token mappings in process memory for this request only     │
│  2. Discard mappings after the response is sent (no TTL needed)     │
│                                                                     │
│  In-Memory Data (never persisted):                                  │
│  Request: req_xyz789                                                │
│  Mappings: {                                                        │
│    "{{EMAIL_1}}": "john.doe@chase.com",                             │
│    "{{CREDIT_CARD_1}}": "4532-1234-5678-9010"                       │
│  }                                                                  │
│  Lifetime: single request/response cycle                            │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 5: AUDIT LOGGING SERVICE                          │
│                    (PostgreSQL)                                     │
│                                                                     │
│  Actions:                                                           │
│  1. Log the anonymization operation                                 │
│  2. Record detected entities (types + tokens only, NOT values)      │
│  3. Store timestamp, user_id, session_id                           │
│                                                                     │
│  Log Entry:                                                         │
│  {                                                                  │
│    "session_id": "req_xyz789",                                     │
│    "user_id": "user_abc123",                                       │
│    "detected_entities": [                                          │
│      {"type": "EMAIL", "token": "{{EMAIL_1}}"},                   │
│      {"type": "CREDIT_CARD", "token": "{{CREDIT_CARD_1}}"}        │
│    ],                                                               │
│    "timestamp": "2025-11-17T10:30:00Z"                             │
│  }                                                                  │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 6: LLM PROVIDER ADAPTER                           │
│              (HTTP Client)                                          │
│                                                                     │
│  Actions:                                                           │
│  1. Format request for OpenAI API                                   │
│  2. Add user's OpenAI API key (from X-Target-API-Key header)       │
│  3. Make HTTP POST to https://api.openai.com/v1/chat/completions  │
│                                                                     │
│  Request Sent to OpenAI:                                            │
│  POST https://api.openai.com/v1/chat/completions                   │
│  Headers:                                                           │
│    Authorization: Bearer sk-xxx (user's key)                       │
│  Body:                                                              │
│    {                                                                │
│      "model": "gpt-4",                                             │
│      "messages": [                                                  │
│        {                                                            │
│          "role": "user",                                           │
│          "content": "Analyze transaction for {{EMAIL_1}}           │
│                      with card {{CREDIT_CARD_1}}"                  │
│        }                                                            │
│      ]                                                              │
│    }                                                                │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                         OPENAI API                                  │
│                                                                     │
│  OpenAI processes anonymized request and returns:                   │
│                                                                     │
│  Response:                                                          │
│  {                                                                  │
│    "choices": [                                                     │
│      {                                                              │
│        "message": {                                                 │
│          "role": "assistant",                                      │
│          "content": "The transaction for customer {{EMAIL_1}}      │
│                      using card {{CREDIT_CARD_1}} appears to       │
│                      have high fraud risk due to..."               │
│        }                                                            │
│      }                                                              │
│    ]                                                                │
│  }                                                                  │
│                                                                     │
│  NOTE: OpenAI never sees the real email or credit card number!     │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 7: DE-ANONYMIZATION ENGINE                        │
│              (Python String Processing)                             │
│                                                                     │
│  Actions:                                                           │
│  1. Receive LLM response                                            │
│  2. Use token mappings from request processing (in-memory)         │
│  3. Find all tokens in response: {{EMAIL_1}}, {{CREDIT_CARD_1}}   │
│  4. Replace tokens with original values                             │
│                                                                     │
│  Token Replacement:                                                 │
│  - {{EMAIL_1}} → john.doe@chase.com                               │
│  - {{CREDIT_CARD_1}} → 4532-1234-5678-9010                        │
│                                                                     │
│  De-anonymized Response:                                            │
│  "The transaction for customer john.doe@chase.com using card       │
│   4532-1234-5678-9010 appears to have high fraud risk due to..."   │
└────────────────────────────┬────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│              STEP 8: RETURN TO USER                                 │
│                                                                     │
│  Final Response (returned to user's application):                   │
│  {                                                                  │
│    "choices": [                                                     │
│      {                                                              │
│        "message": {                                                 │
│          "role": "assistant",                                      │
│          "content": "The transaction for customer                  │
│                      john.doe@chase.com using card                 │
│                      4532-1234-5678-9010 appears to have           │
│                      high fraud risk due to..."                    │
│        }                                                            │
│      }                                                              │
│    ]                                                                │
│  }                                                                  │
│                                                                     │
│  User receives response with original sensitive data restored!      │
└─────────────────────────────────────────────────────────────────────┘

5. Key Technical Specifications

5.1 API Specifications

Proxy Endpoint Format

POST /v1/proxy/{provider}/{endpoint}

Examples:
- POST /v1/proxy/openai/chat/completions
- POST /v1/proxy/anthropic/messages

Required Headers

X-API-Key: priv_live_xxx           # Your service API key
X-Target-API-Key: sk-xxx           # User's LLM provider API key
Content-Type: application/json

Request Body

Send the same request body that the target LLM provider expects. The proxy forwards it unchanged except for the anonymized message content.

Response Format

Return the exact same response format as the target LLM provider, with de-anonymized content.
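
From the client side, a call through the proxy could be built as in this sketch. The host api.yourservice.com is the placeholder used in the flow diagram, and build_proxy_request is a hypothetical helper that constructs the request without sending it.

```python
import json
import urllib.request

PROXY_BASE = "https://api.yourservice.com"  # placeholder host, not a real endpoint


def build_proxy_request(service_key: str, provider_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) an OpenAI chat-completions call routed through the proxy."""
    return urllib.request.Request(
        url=f"{PROXY_BASE}/v1/proxy/openai/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-API-Key": service_key,          # proxy service key
            "X-Target-API-Key": provider_key,  # user's OpenAI key, passed through
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it. The body is exactly what the client would otherwise send to OpenAI directly.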

5.2 Anonymization Rules

Default Entity Types (Always Detected)

EMAIL: Standard email format (regex)
CREDIT_CARD: 13-19 digit card numbers with optional separators (regex)
SSN: XXX-XX-XXXX format (regex)
PHONE: Various phone number formats (regex)
PERSON: Named entities detected by spaCy NLP

Custom Pattern Configuration

Users can add domain-specific patterns via API:

POST /v1/patterns
{
  "patterns": [
    {
      "name": "internal_account_id",
      "type": "regex",
      "pattern": "ACC-[0-9]{8}",
      "enabled": true
    },
    {
      "name": "stripe_api_key",
      "type": "regex",
      "pattern": "sk_live_[A-Za-z0-9]{24}",
      "enabled": true
    }
  ]
}

5.3 Performance Requirements

Metric	Target	Notes
Latency Overhead	< 200ms	Time added by our service
Throughput	100 req/sec	Per instance (horizontal scaling)
Availability	99.9%	43 minutes downtime/month max
Token Mapping Lifetime	Single request	Held in memory only; discarded after the response is sent
Max Request Size	128KB	Enforced by the proxy (provider limits are token-based)
Detection Accuracy	> 95%	For configured patterns
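
The latency and availability targets can be checked with a few lines of arithmetic. This is a sketch with hypothetical helper names; it assumes overhead is measured as end-to-end time minus time spent waiting on the provider, and a 30-day month for the downtime budget.

```python
import statistics


def overhead_ms(total_ms: float, provider_ms: float) -> float:
    """Time added by the proxy: end-to-end latency minus time waiting on the provider."""
    return total_ms - provider_ms


def p95(samples: list[float]) -> float:
    """95th percentile via statistics.quantiles (requires at least two samples)."""
    return statistics.quantiles(samples, n=100)[94]


def allowed_downtime_minutes(availability: float, days: int = 30) -> float:
    """Downtime budget in minutes for an availability target over `days` days."""
    return (1 - availability) * days * 24 * 60
```

For a 99.9% target over 30 days, the budget is 0.001 × 43,200 minutes = 43.2 minutes, which matches the figure in the table.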

5.4 Security Requirements

Data Encryption:
- All data in transit: TLS 1.3
- No persistent storage of sensitive data (stateless processing)
- PostgreSQL data encrypted at rest (for audit logs only)
API Key Security:
- User API keys stored as SHA-256 hashes
- Target LLM API keys NEVER stored (passed through only)
- Rate limiting: 1000 requests/hour per free tier key
Token Mapping Security:
- Processed in-memory per request only
- Never persisted to disk or external storage
- Discarded immediately after response sent
- Never logged in audit trails
Audit Logging:
- Log entity TYPES and TOKENS only
- NEVER log original sensitive values
- Immutable audit trail (append-only)

5.5 Error Handling

Anonymization Failures

If pattern detection fails, DO NOT forward request
Return 422 Unprocessable Entity with details

LLM Provider Errors

Forward original error from provider to user
Log error in audit trail

De-anonymization Failures

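Token mappings are held in memory for the duration of the request, so they cannot expire between anonymization and de-anonymization
If the LLM response contains a token with no mapping (for example, one the model invented or altered), leave the token unchanged and record the event in the audit log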

6. Data Models

User Table (PostgreSQL)

CREATE TABLE users (
  user_id UUID PRIMARY KEY,
  api_key_hash VARCHAR(64) NOT NULL UNIQUE,
  created_at TIMESTAMP NOT NULL,
  tier VARCHAR(20) DEFAULT 'free',
  rate_limit INTEGER DEFAULT 1000,
  requests_used INTEGER DEFAULT 0
);

Patterns Table (PostgreSQL)

CREATE TABLE patterns (
  pattern_id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(user_id),
  name VARCHAR(100) NOT NULL,
  type VARCHAR(20) NOT NULL,
  pattern TEXT,  -- regex source; NULL when type = 'entity'
  entity_type VARCHAR(50),
  enabled BOOLEAN DEFAULT true,
  created_at TIMESTAMP NOT NULL
);

Audit Logs Table (PostgreSQL)

CREATE TABLE audit_logs (
  log_id UUID PRIMARY KEY,
  user_id UUID REFERENCES users(user_id),
  session_id VARCHAR(50) NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  llm_provider VARCHAR(50) NOT NULL,
  model VARCHAR(100) NOT NULL,
  detected_entities JSONB NOT NULL,
  request_size INTEGER,
  response_size INTEGER,
  latency_ms INTEGER
);

Token Mappings (In-Memory, Per-Request)

Scope: Single request/response cycle
Storage: In-memory dictionary during request processing
Lifecycle: Created → Used → Discarded

Example:
{
  "{{EMAIL_1}}": "john.doe@chase.com",
  "{{CREDIT_CARD_1}}": "4532-1234-5678-9010"
}

Note: Mappings are built by scanning conversation history,
      ensuring consistency within the conversation context.

7. MVP Implementation Checklist

Phase 1: Core Infrastructure (Week 1-2)

Set up FastAPI project structure
Implement API Gateway with routing
Set up PostgreSQL database with schema
Define the in-memory, per-request token mapping structure (no Redis, per Section 3.4)
Implement authentication middleware (API key validation)
Implement rate limiting

Phase 2: Anonymization (Week 2-3)

Build regex-based pattern detection
Integrate spaCy for NLP entity recognition
Implement token generation logic
Build anonymization engine (text → anonymized text)
Implement in-memory token mapping for the lifetime of each request
Add pattern configuration API endpoints

Phase 3: LLM Integration (Week 3-4)

Build OpenAI adapter
Build Anthropic adapter
Implement request forwarding
Implement response handling
Add error handling for LLM provider failures

Phase 4: De-anonymization (Week 4-5)

Build token replacement logic
Implement token lookup against the in-memory, per-request mappings
Handle edge cases (missing, altered, or malformed tokens in LLM responses)
Test with complex multi-entity responses

Phase 5: Audit & Security (Week 5-6)

Implement audit logging service
Add TLS/SSL for all endpoints
Implement secure API key hashing
Add request/response size limits
Test security measures (no sensitive data in logs)

Phase 6: Testing & Deployment (Week 6-8)

Unit tests for all components
Integration tests for full flow
Load testing (100+ req/sec)
Deploy to production (Railway/Render/Fly.io)
Set up monitoring (Sentry, logs)
Create simple onboarding API key generation

8. Success Metrics

Technical Metrics

Latency overhead: < 200ms (95th percentile)
Uptime: > 99.9%
Detection accuracy: > 95% for configured patterns
Zero instances of sensitive data leakage in logs or errors

Product Metrics (First 3 Months)

50+ developer signups
20+ companies actively using (>100 requests/week)
3-5 companies requesting paid plans
1-2 paying customers

9. Open Questions & Future Considerations

Questions to Resolve During Build

Should we support streaming responses (SSE) in MVP? → NO (post-MVP)
How to handle multi-turn conversations with consistent tokenization? → Scan full conversation history
Should we cache anonymization patterns for performance? → Load once at startup (singleton pattern)
What happens if conversation history is very large? → Reasonable limits, optimize scanning if needed

Post-MVP Features

Web dashboard for pattern configuration
Advanced analytics (most common entity types, usage patterns)
Support for more LLM providers (Cohere, AI21, etc.)
Real-time streaming response support
Organization-level accounts with team management
Embeddings and fine-tuning endpoint support
On-premises deployment option for enterprises

10. Architecture Diagram (High-Level)

┌──────────────────┐
│  User's App      │
│  (Python/JS/etc) │
└────────┬─────────┘
         │ POST /v1/proxy/openai/chat/completions
         │ Headers: X-API-Key, X-Target-API-Key
         │
         ▼
┌─────────────────────────────────────────────────────┐
│          Your Privacy Layer Service                 │
│  ┌───────────────────────────────────────────────┐  │
│  │  API Gateway (FastAPI)                        │  │
│  │  - Auth validation                            │  │
│  │  - Rate limiting                              │  │
│  │  - Request routing                            │  │
│  └──────────────────┬────────────────────────────┘  │
│                     │                                │
│                     ▼                                │
│  ┌───────────────────────────────────────────────┐  │
│  │  Pattern Configuration Service                │  │
│  │  - Fetch user's sensitivity patterns          │  │
│  │  - PostgreSQL storage                         │  │
│  └──────────────────┬────────────────────────────┘  │
│                     │                                │
│                     ▼                                │
│  ┌───────────────────────────────────────────────┐  │
│  │  Anonymization Engine                         │  │
│  │  - Regex pattern matching                     │  │
│  │  - spaCy NLP entity detection                 │  │
│  │  - Token generation                           │  │
│  │  - Text replacement                           │  │
│  └──────────────────┬────────────────────────────┘  │
│                     │                                │
│         ┌───────────┴───────────┐                   │
│         ▼                       ▼                    │
│  ┌─────────────┐         ┌─────────────────┐        │
│  │Token Store  │         │ Audit Logger    │        │
│  │(in-memory)  │         │ (PostgreSQL)    │        │
│  │- Per request│         │- Entity types   │        │
│  │- Mappings   │         │- Timestamps     │        │
│  └─────────────┘         └─────────────────┘        │
│         │                                            │
│         ▼                                            │
│  ┌───────────────────────────────────────────────┐  │
│  │  LLM Provider Adapter                         │  │
│  │  - OpenAI client                              │  │
│  │  - Anthropic client                           │  │
│  │  - Request forwarding                         │  │
│  └──────────────────┬────────────────────────────┘  │
└────────────────────┼────────────────────────────────┘
                     │ Anonymized Request
                     │
                     ▼
         ┌──────────────────────┐
         │   LLM Provider       │
         │   (OpenAI/Claude)    │
         │                      │
         │   Receives:          │
         │   "Customer          │
         │   {{EMAIL_1}} with   │
         │   card               │
         │   {{CREDIT_CARD_1}}" │
         └──────────┬───────────┘
                    │ LLM Response
                    │
                    ▼
┌────────────────────────────────────────────────────┐
│          Your Privacy Layer Service                │
│  ┌───────────────────────────────────────────────┐ │
│  │  De-anonymization Engine                      │ │
│  │  - Fetch token mappings from Redis            │ │
│  │  - Replace tokens with original values        │ │
│  │  - Return complete response                   │ │
│  └──────────────────┬────────────────────────────┘ │
└────────────────────┼────────────────────────────────┘
                     │ De-anonymized Response
                     │
                     ▼
         ┌──────────────────────┐
         │   User's App         │
         │                      │
         │   Receives:          │
         │   "Customer          │
         │   john.doe@chase.com │
         │   with card          │
         │   4532-1234-5678-    │
         │   9010..."           │
         └──────────────────────┘
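
The first hop in the diagram (application to proxy) can be sketched in Python as follows. This is only an illustration: the helper name build_proxy_request and the placeholder keys are hypothetical, while the base URL and header names are taken from the examples in this document. Only the standard library is used, and the request is built but not sent.

```python
import json
import urllib.request

# Base URL as used in the curl examples of this document.
PROXY_BASE = "https://api.yourservice.com/v1/proxy"


def build_proxy_request(provider, path, payload, proxy_key, target_key):
    """Build (but do not send) a POST request addressed to the privacy proxy."""
    return urllib.request.Request(
        url=f"{PROXY_BASE}/{provider}/{path.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-Key": proxy_key,          # authenticates the caller to the proxy
            "X-Target-API-Key": target_key,  # forwarded to the LLM provider
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_proxy_request(
    "openai",
    "chat/completions",
    {"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]},
    proxy_key="priv_live_example",   # placeholder value
    target_key="sk-proj-xxx",        # placeholder value
)
# urllib.request.urlopen(request) would perform the call.
```

The only difference from a direct provider call is the URL and the two extra headers, which is what keeps the integration close to a one-line change.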

11. Code Structure

Recommended Project Structure

privacy-llm-proxy/
├── app/
│   ├── __init__.py
│   ├── main.py                    # FastAPI app entry point
│   ├── config.py                  # Configuration management
│   ├── dependencies.py            # Dependency injection
│   │
│   ├── api/                       # API endpoints
│   │   ├── __init__.py
│   │   ├── proxy.py              # Proxy endpoints
│   │   ├── patterns.py           # Pattern configuration endpoints
│   │   └── audit.py              # Audit log endpoints
│   │
│   ├── core/                      # Core business logic
│   │   ├── __init__.py
│   │   ├── anonymization.py      # Anonymization engine
│   │   ├── deanonymization.py    # De-anonymization engine
│   │   ├── token_manager.py      # Token generation and mapping
│   │   └── pattern_matcher.py    # Pattern detection logic
│   │
│   ├── adapters/                  # LLM provider adapters
│   │   ├── __init__.py
│   │   ├── base.py               # Abstract base adapter
│   │   ├── openai_adapter.py     # OpenAI integration
│   │   └── anthropic_adapter.py  # Anthropic integration
│   │
│   ├── models/                    # Data models
│   │   ├── __init__.py
│   │   ├── user.py               # User model
│   │   ├── pattern.py            # Pattern model
│   │   ├── audit_log.py          # Audit log model
│   │   └── request.py            # Request/response models
│   │
│   ├── db/                        # Database
│   │   ├── __init__.py
│   │   ├── postgres.py           # PostgreSQL connection
│   │   └── redis.py              # Redis connection
│   │
│   ├── services/                  # Service layer
│   │   ├── __init__.py
│   │   ├── auth_service.py       # Authentication
│   │   ├── pattern_service.py    # Pattern CRUD
│   │   └── audit_service.py      # Audit logging
│   │
│   └── utils/                     # Utilities
│       ├── __init__.py
│       ├── validators.py         # Input validation
│       └── security.py           # Security utilities
│
├── tests/                         # Test suite
│   ├── test_anonymization.py
│   ├── test_proxy.py
│   └── test_patterns.py
│
├── alembic/                       # Database migrations
│   └── versions/
│
├── requirements.txt               # Python dependencies
├── Dockerfile                     # Container definition
├── docker-compose.yml            # Local development setup
└── README.md                      # Setup instructions
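
As a rough illustration of how adapters/base.py and its two provider modules might divide responsibilities, the sketch below has each adapter translate an already-anonymized payload into an upstream request description. The class and function names (BaseAdapter, build_request, get_adapter) are hypothetical; the endpoint URLs and header names follow the providers' public HTTP APIs.

```python
from abc import ABC, abstractmethod


class BaseAdapter(ABC):
    """Hypothetical interface: describe the upstream HTTP call for a payload."""

    @abstractmethod
    def build_request(self, payload: dict, target_api_key: str) -> dict:
        ...


class OpenAIAdapter(BaseAdapter):
    def build_request(self, payload, target_api_key):
        return {
            "url": "https://api.openai.com/v1/chat/completions",
            "headers": {
                "Authorization": f"Bearer {target_api_key}",
                "Content-Type": "application/json",
            },
            "json": payload,
        }


class AnthropicAdapter(BaseAdapter):
    def build_request(self, payload, target_api_key):
        return {
            "url": "https://api.anthropic.com/v1/messages",
            "headers": {
                "x-api-key": target_api_key,
                "anthropic-version": "2023-06-01",
                "content-type": "application/json",
            },
            "json": payload,
        }


ADAPTERS = {"openai": OpenAIAdapter(), "anthropic": AnthropicAdapter()}


def get_adapter(provider: str) -> BaseAdapter:
    """Resolve the provider segment of /v1/proxy/{provider}/... to an adapter."""
    try:
        return ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider}") from None
```

Keeping provider differences behind one method means the anonymization code never needs to know which provider it is talking to.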

12. Key Dependencies

Python Packages

fastapi==0.104.1              # Web framework
uvicorn==0.24.0               # ASGI server
pydantic==2.5.0               # Data validation
sqlalchemy==2.0.23            # ORM for PostgreSQL
psycopg2-binary==2.9.9        # PostgreSQL driver
spacy==3.7.2                  # NLP for entity recognition
httpx==0.25.2                 # HTTP client for LLM APIs
python-jose[cryptography]     # JWT handling
passlib[bcrypt]               # Password hashing
alembic==1.13.0               # Database migrations
python-dotenv==1.0.0          # Environment variables
sentry-sdk==1.38.0            # Error tracking
prometheus-client==0.19.0     # Metrics

System Requirements

Python 3.11+
PostgreSQL 15+ (optional, for audit logs)
2GB RAM minimum
10GB disk space

13. Environment Variables

Required Configuration

# Service Configuration
SERVICE_NAME=privacy-llm-proxy
ENVIRONMENT=production
PORT=8000
LOG_LEVEL=INFO

# Database
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_DB=privacy_proxy
POSTGRES_USER=proxy_user
POSTGRES_PASSWORD=secure_password

# No Redis needed for MVP - stateless processing

# Security
API_KEY_SALT=random_salt_string
JWT_SECRET=jwt_secret_key
ENCRYPTION_KEY=encryption_key_for_data

# Rate Limiting
FREE_TIER_RATE_LIMIT=1000
PAID_TIER_RATE_LIMIT=10000

# Monitoring
SENTRY_DSN=https://sentry.io/your-project
PROMETHEUS_PORT=9090

# Feature Flags
ENABLE_NLP_DETECTION=true
ENABLE_AUDIT_LOGS=true
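
One way app/config.py could read these variables is sketched below. It is a minimal example, assuming that the PostgreSQL settings are mandatory and that the remaining values fall back to the defaults shown above; the Settings class, the load_settings function and the DSN format are illustrative, not part of the design.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote

# Assumption: these must be present; everything else has a default.
REQUIRED = ("POSTGRES_HOST", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD")


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    port: int
    log_level: str
    postgres_dsn: str
    free_tier_rate_limit: int
    enable_nlp_detection: bool
    enable_audit_logs: bool


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default) and fail fast."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    # Percent-encode credentials so characters like '@' cannot break the DSN.
    dsn = "postgresql://{}:{}@{}:{}/{}".format(
        quote(env["POSTGRES_USER"], safe=""),
        quote(env["POSTGRES_PASSWORD"], safe=""),
        env["POSTGRES_HOST"],
        env.get("POSTGRES_PORT", "5432"),
        env["POSTGRES_DB"],
    )
    return Settings(
        port=int(env.get("PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        postgres_dsn=dsn,
        free_tier_rate_limit=int(env.get("FREE_TIER_RATE_LIMIT", "1000")),
        enable_nlp_detection=_as_bool(env.get("ENABLE_NLP_DETECTION", "true")),
        enable_audit_logs=_as_bool(env.get("ENABLE_AUDIT_LOGS", "true")),
    )
```

Passing a plain dict instead of os.environ makes the loader easy to exercise in unit tests.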

14. API Usage Examples

Example 1: Configure Patterns

curl -X POST https://api.yourservice.com/v1/patterns \
  -H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6" \
  -H "Content-Type: application/json" \
  -d '{
    "patterns": [
      {
        "name": "credit_card",
        "type": "regex",
        "pattern": "\\b\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}[-\\s]?\\d{4}\\b",
        "enabled": true
      },
      {
        "name": "ssn",
        "type": "regex",
        "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b",
        "enabled": true
      },
      {
        "name": "email",
        "type": "entity",
        "entity_type": "EMAIL",
        "enabled": true
      }
    ]
  }'
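
On the server side, a payload like the one above has to be validated before it is stored, since an invalid regex would otherwise fail at request time. The sketch below shows one possible check; compile_patterns is a hypothetical helper, and it handles only the "regex" type, leaving "entity" patterns to the NLP detector.

```python
import re


def compile_patterns(config: dict) -> dict:
    """Compile enabled regex patterns, keyed by upper-cased pattern name."""
    compiled = {}
    for entry in config["patterns"]:
        if not entry.get("enabled", True) or entry["type"] != "regex":
            continue  # disabled or non-regex patterns are skipped here
        try:
            compiled[entry["name"].upper()] = re.compile(entry["pattern"])
        except re.error as exc:
            raise ValueError(
                f"Invalid regex for pattern {entry['name']!r}: {exc}"
            ) from exc
    return compiled


config = {
    "patterns": [
        {"name": "credit_card", "type": "regex", "enabled": True,
         "pattern": r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"},
        {"name": "ssn", "type": "regex", "enabled": True,
         "pattern": r"\b\d{3}-\d{2}-\d{4}\b"},
        {"name": "email", "type": "entity", "entity_type": "EMAIL", "enabled": True},
    ]
}
compiled = compile_patterns(config)
```

Rejecting a bad pattern when it is configured gives the user an immediate error instead of a failed proxy call later.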

Example 2: Make Proxied Request to OpenAI

curl -X POST https://api.yourservice.com/v1/proxy/openai/chat/completions \
  -H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6" \
  -H "X-Target-API-Key: sk-proj-xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "Analyze this transaction: Customer john.doe@chase.com used card 4532-1234-5678-9010 for $5,000 purchase. Is this fraudulent?"
      }
    ],
    "temperature": 0.7
  }'

What happens internally:

Your service detects john.doe@chase.com and 4532-1234-5678-9010
Replaces them with {{EMAIL_1}} and {{CREDIT_CARD_1}}
Sends the anonymized request to OpenAI
OpenAI responds, referring to the values by their tokens
Your service replaces the tokens in the response with the original values
Returns de-anonymized response to user
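
The steps above can be sketched as a round trip over a single string. This is a simplified illustration, not the engine itself: the email regex stands in for the entity detector, and the function names anonymize and deanonymize are hypothetical. The mapping lives only for the duration of one request.

```python
import re

PATTERNS = {
    # Assumption: a simple regex stands in for NLP-based email detection.
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "CREDIT_CARD": re.compile(r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
}
TOKEN = re.compile(r"\{\{[A-Z_]+_\d+\}\}")


def anonymize(text):
    """Return (anonymized_text, mapping of token -> original value)."""
    mapping = {}
    for entity_type, pattern in PATTERNS.items():
        seen = {}  # original value -> token, so repeats reuse one token

        def replace(match):
            value = match.group(0)
            if value not in seen:
                token = "{{%s_%d}}" % (entity_type, len(seen) + 1)
                seen[value] = token
                mapping[token] = value
            return seen[value]

        text = pattern.sub(replace, text)
    return text, mapping


def deanonymize(text, mapping):
    """Restore originals in one pass; unknown tokens are left untouched."""
    return TOKEN.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


prompt = "Customer john.doe@chase.com used card 4532-1234-5678-9010 for $5,000 purchase."
anonymized, mapping = anonymize(prompt)
restored = deanonymize(anonymized, mapping)
```

Leaving unknown tokens in place is a deliberate choice here: if the LLM invents a token that was never issued, the response still reads sensibly and nothing is substituted by mistake.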

Example 3: Retrieve Audit Logs

curl -X GET "https://api.yourservice.com/v1/audit-logs?limit=10" \
  -H "X-API-Key: priv_live_k8n2m4p6q8r0s2t4u6v8w0x2y4z6"

Response:

{
  "logs": [
    {
      "log_id": "log_001",
      "timestamp": "2025-11-17T10:30:00Z",
      "llm_provider": "openai",
      "model": "gpt-4",
      "detected_entities": [
        {"type": "EMAIL", "token": "{{EMAIL_1}}", "count": 1},
        {"type": "CREDIT_CARD", "token": "{{CREDIT_CARD_1}}", "count": 1}
      ],
      "latency_ms": 1245
    }
  ],
  "total": 147,
  "page": 1
}
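
A log entry of this shape can be assembled without ever touching the original values. The sketch below assumes the proxy has the anonymized text and the token mapping at hand; build_audit_entry is a hypothetical helper, and log_id is omitted on the assumption that the storage layer assigns it.

```python
from datetime import datetime, timezone


def build_audit_entry(anonymized_text, mapping, provider, model, latency_ms,
                      timestamp=None):
    """Summarize one request using entity types and tokens only."""
    ts = (timestamp or datetime.now(timezone.utc)).astimezone(timezone.utc)
    entities = [
        {
            # "{{CREDIT_CARD_1}}" -> "CREDIT_CARD"
            "type": token[2:-2].rsplit("_", 1)[0],
            "token": token,
            "count": anonymized_text.count(token),
        }
        for token in mapping  # only the keys (tokens) are read, never the values
    ]
    return {
        "timestamp": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "llm_provider": provider,
        "model": model,
        "detected_entities": entities,
        "latency_ms": latency_ms,
    }
```

Because only the mapping keys are read, the entry cannot leak a sensitive value even if the log store is compromised.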

15. Testing Strategy

Unit Tests

Test each component in isolation
Mock external dependencies (PostgreSQL, LLM APIs)
Focus on core logic: anonymization, de-anonymization, token generation

Integration Tests

Test full request flow end-to-end
Use test databases (PostgreSQL for audit logs if needed)
Verify data flows correctly between components

Test Cases to Cover

Anonymization Engine:

✅ Detects single credit card number
✅ Detects multiple credit card numbers
✅ Detects emails in various formats
✅ Detects custom patterns (user-defined)
✅ Handles edge cases (no PII in text)
✅ Preserves non-sensitive context

Token Management:

✅ Generates unique tokens per entity
✅ Maintains consistent tokens within conversations
✅ Scans conversation history for token reuse
✅ Processes requests independently (stateless)
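
The token management cases can be illustrated with a per-request manager. In this sketch the whole conversation history arrives with each request, so rescanning it in order reproduces the same tokens without any stored state. TokenManager and anonymize_messages are hypothetical names, and only an SSN pattern is used to keep the example short.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


class TokenManager:
    """Holds token mappings for one request; discarded afterwards."""

    def __init__(self):
        self.value_to_token = {}
        self.token_to_value = {}
        self.counters = {}

    def token_for(self, entity_type, value):
        key = (entity_type, value)
        if key not in self.value_to_token:
            self.counters[entity_type] = self.counters.get(entity_type, 0) + 1
            token = "{{%s_%d}}" % (entity_type, self.counters[entity_type])
            self.value_to_token[key] = token
            self.token_to_value[token] = value
        return self.value_to_token[key]


def anonymize_messages(messages, manager):
    """Anonymize every message in order, sharing one manager across them."""
    result = []
    for message in messages:
        content = SSN.sub(
            lambda m: manager.token_for("SSN", m.group(0)), message["content"]
        )
        result.append({**message, "content": content})
    return result


history = [
    {"role": "user", "content": "My SSN is 123-45-6789."},
    {"role": "assistant", "content": "Thanks, I have 123-45-6789 on file."},
    {"role": "user", "content": "My spouse's is 987-65-4321, mine is still 123-45-6789."},
]
anonymized_history = anonymize_messages(history, TokenManager())
```

A fresh TokenManager per request keeps requests independent, while processing messages in order keeps numbering stable as the conversation grows.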

De-anonymization:

✅ Replaces single token
✅ Replaces multiple tokens
✅ Handles missing tokens (expired session)
✅ Preserves response structure

API Gateway:

✅ Validates API keys
✅ Rate limits requests
✅ Routes to correct provider
✅ Returns appropriate error codes
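
For the rate limiting case, a fixed-window counter is one simple option to test against. The sketch below is an assumption rather than the chosen design: this document does not state the time unit for FREE_TIER_RATE_LIMIT, so the window length is a parameter, and the class name is hypothetical. The clock is injectable so tests do not have to sleep.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per API key in each time window."""

    def __init__(self, limit, window_seconds=3600, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows = {}  # api_key -> (window_start, request_count)

    def allow(self, api_key):
        now = self.clock()
        start, count = self._windows.get(api_key, (now, 0))
        if now - start >= self.window_seconds:
            start, count = now, 0  # window expired: start a new one
        allowed = count < self.limit
        self._windows[api_key] = (start, count + 1 if allowed else count)
        return allowed
```

The counters here live in process memory, so with several proxy instances each instance would enforce its own limit; a shared store would be needed for a global limit.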

Load Testing

Use tools like Locust or k6
Simulate 100+ concurrent requests
Verify added proxy latency stays under 200ms (excluding LLM provider response time)
Check for memory leaks

16. Deployment Checklist

Pre-Production

Production Deployment

Deploy to cloud platform (Railway/Render/Fly.io)
Configure SSL/TLS certificates
Set up database backups (daily)
Verify horizontal scalability (stateless design)
Set up log aggregation (CloudWatch/Datadog)
Configure error tracking (Sentry)
Set up uptime monitoring (UptimeRobot)
Create runbook for incidents
Perform smoke tests post-deployment

Post-Launch

17. Risks & Mitigations

Technical Risks

Risk 1: Anonymization Accuracy < 95%

Impact: Sensitive data leaks to LLM providers
Mitigation:
- Extensive pattern testing before launch
- Allow users to test patterns in sandbox mode
- Add fallback: if unsure, block request rather than risk leak
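
The "block rather than risk a leak" fallback could take the form of a second scan over the already-anonymized text. The sketch below is one possible heuristic, not the planned implementation: it looks for any remaining run of 13 to 19 digits that passes the Luhn checksum and refuses to forward the request if one is found. The names LeakSuspected and assert_no_residual_cards are hypothetical.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"(?:\d[ -]?){12,18}\d")


class LeakSuspected(Exception):
    """Raised when anonymized text still appears to contain a card number."""


def luhn_valid(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def assert_no_residual_cards(anonymized_text: str) -> None:
    """Fail closed: raise instead of forwarding text with a likely card number."""
    for match in CANDIDATE.finditer(anonymized_text):
        digits = re.sub(r"\D", "", match.group(0))
        if luhn_valid(digits):
            raise LeakSuspected("possible card number left in anonymized text")
```

For example, a 15-digit card number is not matched by the 16-digit credit_card regex from Example 1, but it would be caught here. The trade-off is occasional false positives, since roughly one in ten random digit runs of the right length passes the checksum.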

Risk 2: High Latency (>200ms overhead)

Impact: Poor user experience, adoption suffers
Mitigation:
- Optimize regex matching (compile patterns once)
- Load patterns at startup (singleton)
- Profile code to find bottlenecks
- Consider async processing where possible
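
As an illustration of the first two mitigations, the sketch below merges a user's patterns into a single compiled alternation and caches it, so each request scans the text once instead of once per pattern. It assumes pattern names are valid Python identifiers and that the patterns define no named groups of their own; build_scanner is a hypothetical helper.

```python
import re
from functools import lru_cache


@lru_cache(maxsize=32)
def build_scanner(patterns: tuple) -> re.Pattern:
    """Compile ((name, regex), ...) into one alternation of named groups."""
    return re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in patterns))


USER_PATTERNS = (
    ("CREDIT_CARD", r"\b\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?\d{4}\b"),
    ("SSN", r"\b\d{3}-\d{2}-\d{4}\b"),
)

scanner = build_scanner(USER_PATTERNS)
found = [
    (match.lastgroup, match.group(0))  # lastgroup names the pattern that matched
    for match in scanner.finditer("SSN 123-45-6789, card 4532 1234 5678 9010")
]
```

Because the cache key is the tuple of patterns, a user who changes their configuration simply gets a newly compiled scanner on the next request.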

Risk 3: Large Conversation History

Impact: Slower processing for very long conversations
Mitigation:
- Set reasonable conversation length limits
- Optimize scanning algorithm
- Cache pattern compilation
- Consider truncating very old messages

Risk 4: Token Collision

Impact: Wrong data mapped to wrong token
Mitigation:
- Use session-scoped tokens (e.g., a req_xyz789 prefix)
- Include timestamp in token generation
- Validate token uniqueness before storing
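
One collision worth guarding against is user text that already contains something token-shaped, such as a literal {{EMAIL_1}} in a prompt template. A minimal sketch of request-scoped tokens follows; the token layout and the function names choose_prefix and make_token are assumptions for illustration, loosely following the req_xyz789 style mentioned above.

```python
import secrets


def choose_prefix(text: str, max_attempts: int = 5) -> str:
    """Pick a random request prefix that does not already occur in the text."""
    for _ in range(max_attempts):
        prefix = "req_" + secrets.token_hex(4)  # e.g. req_ plus 8 hex characters
        if prefix not in text:
            return prefix
    raise RuntimeError("could not find a collision-free token prefix")


def make_token(prefix: str, entity_type: str, index: int) -> str:
    """Build a token scoped to one request."""
    return "{{%s_%s_%d}}" % (prefix, entity_type, index)


text = "Template says {{EMAIL_1}}; real address is john.doe@chase.com"
prefix = choose_prefix(text)
token = make_token(prefix, "EMAIL", 1)
```

With a prefix checked against the input, the literal {{EMAIL_1}} in the template can never be mistaken for a token issued by the proxy during de-anonymization.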

Business Risks

Risk 1: Users Don't Trust Us With Data

Mitigation:
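- Pursue SOC 2 Type II attestation early (a Type II report covers an observation period, so the audit window should start as soon as controls are in place)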
- Open-source the anonymization logic
- Provide on-premises deployment option
- Transparent audit logs

Risk 2: LLM Providers Update APIs

Mitigation:
- Version our adapter layer
- Monitor provider changelogs
- Build adapter tests that catch breaking changes

18. Success Definition

MVP is Successful If:

Technical: 95%+ anonymization accuracy, <200ms added latency (proxy overhead)
Adoption: 50+ developer signups in first month
Usage: 20+ companies actively using (>100 requests/week)
Revenue: 3-5 companies requesting paid plans
Feedback: Positive feedback on ease of integration

MVP Has Failed If:

<10 signups after launch
Users sign up but don't integrate (integration too hard)
High error rates (>5%)
No one willing to pay

Appendix A: Glossary

Anonymization: Process of detecting and replacing sensitive data with tokens

De-anonymization: Process of replacing tokens back with original sensitive data

Context-preserving: Anonymization that maintains semantic meaning for the LLM

Token: Placeholder text like {{EMAIL_1}} that represents sensitive data

Session ID: Unique identifier for each request, used to scope token mappings

Pattern: User-defined regex or entity type that defines what data is sensitive

Audit Log: Immutable record of anonymization operations (entity types only)

Token Mapping: Key-value store connecting tokens to original sensitive values

Proxy: Service that forwards requests to another service (in this case, LLM APIs)

Stateless: System doesn't maintain state between requests; each request is processed independently

Appendix B: References

Technical Resources

spaCy Documentation: https://spacy.io/
FastAPI Documentation: https://fastapi.tiangolo.com/
OpenAI API Reference: https://platform.openai.com/docs/api-reference
Anthropic API Reference: https://docs.anthropic.com/

Compliance Resources

GDPR Guidelines: https://gdpr.eu/
SOC 2 Compliance: https://www.aicpa.org/soc
HIPAA Technical Safeguards: https://www.hhs.gov/hipaa/
