Contentful Architecture Overview

System Architecture

Contentful is a distributed video generation pipeline built with a microservices architecture. The system transforms text topics into fully rendered videos through a series of AI-powered stages.

graph TB
    CLI[CLI/API Client] --> ORCH[Orchestrator Service]
    ORCH --> DB[(MongoDB)]
    ORCH --> CACHE[(Redis)]
    
    ORCH --> ING[Ingestion Stage]
    ING --> SCRIPT[Scripting Stage]
    SCRIPT --> ASSET[Asset Gathering]
    ASSET --> VOICE[Voice Generation]
    VOICE --> TIME[Timeline Building]
    TIME --> REND[Renderer Service]
    
    ING --> WIKI[Wikipedia API]
    ING --> WEB[Web Scraper]
    SCRIPT --> LLM[OpenAI GPT-4]
    ASSET --> PEXELS[Pexels API]
    ASSET --> DALLE[DALL-E 3]
    VOICE --> TTS[ElevenLabs]
    VOICE --> ASR[Whisper]
    REND --> MP4[Video Output]

Core Components

1. Orchestrator Service

Purpose: Coordinates the entire pipeline and manages the job lifecycle
Technology: Python, FastAPI, AsyncIO
Port: 8000
Responsibilities:
- Job queue management
- Pipeline stage coordination
- Provider initialization
- Error recovery and retries
- Progress tracking
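
The stage coordination and progress tracking listed above could be sketched roughly as follows. This is a minimal illustration, not the actual orchestrator code: `run_pipeline`, the `Stage` alias, and the placeholder `ingest` and `script` stages are hypothetical names introduced here.

```python
import asyncio
from typing import Any, Awaitable, Callable

# Hypothetical type: a stage takes the job context and returns an updated one.
Stage = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


async def run_pipeline(
    context: dict[str, Any],
    stages: list[tuple[str, Stage]],
    on_progress: Callable[[str, float], None] = lambda name, pct: None,
) -> dict[str, Any]:
    """Run stages in order, reporting fractional progress after each one."""
    for index, (name, stage) in enumerate(stages, start=1):
        context = await stage(context)
        on_progress(name, index / len(stages))
    return context


# Placeholder stages standing in for the real ingestion and scripting logic.
async def ingest(ctx: dict[str, Any]) -> dict[str, Any]:
    return {**ctx, "research": f"notes about {ctx['topic']}"}


async def script(ctx: dict[str, Any]) -> dict[str, Any]:
    return {**ctx, "storyboard": ["intro", "body", "outro"]}
```

A caller would pass the stages as `(name, coroutine_function)` pairs, for example `asyncio.run(run_pipeline({"topic": "History of AI"}, [("ingestion", ingest), ("scripting", script)]))`.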

2. Renderer Service

Purpose: Composes final video from timeline
Technology: Python, MoviePy, FastAPI
Port: 8001
Responsibilities:
- Timeline to video conversion
- Scene composition
- Transition effects
- Audio mixing
- Subtitle embedding

3. MongoDB Database

Purpose: Persistent storage for jobs and data
Collections:
- jobs: Job metadata and status
- storyboards: Generated scripts
- timelines: Video timelines
- assets: Media asset references
Indexes: status, created_at, updated_at
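
To make the `jobs` collection more concrete, here is a sketch of what a job document might carry. Only `status`, `created_at`, and `updated_at` come from the index list above; the `JobRecord` class, the status values, and the remaining fields are assumptions for illustration.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from enum import Enum


class JobStatus(str, Enum):
    # Hypothetical status values; the real set is not specified in this document.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


def _now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class JobRecord:
    topic: str
    template: str
    status: JobStatus = JobStatus.PENDING
    job_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)

    def transition(self, status: JobStatus) -> None:
        """Change status and refresh updated_at, two of the indexed fields."""
        self.status = status
        self.updated_at = _now()

    def to_document(self) -> dict:
        """Return a plain dict suitable for storing in a document database."""
        doc = asdict(self)
        doc["status"] = self.status.value
        return doc
```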

4. Redis Cache

Purpose: Caching and temporary data
Use Cases:
- API response caching
- Media asset URLs
- Provider rate limiting
- Session management
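
Provider rate limiting can be illustrated with a fixed-window counter. The sketch below keeps counters in process memory so it runs on its own; in a Redis-backed setup the counter would more likely live in a key that is incremented and given an expiry. `FixedWindowLimiter` and its parameters are hypothetical.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> (window index, calls counted in that window)
        self._counters = {}

    def allow(self, key: str) -> bool:
        window_index = int(self.clock() // self.window)
        index, count = self._counters.get(key, (window_index, 0))
        if index != window_index:
            # A new window has started, so the count resets.
            index, count = window_index, 0
        if count >= self.limit:
            self._counters[key] = (index, count)
            return False
        self._counters[key] = (index, count + 1)
        return True
```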

Pipeline Stages

Stage 1: Ingestion

Fetches and processes source content.

Input: Topic + Source (Wikipedia/Web)
Output: Research bundle (2000+ words)

Process:

1. Search for relevant content
2. Extract and clean text
3. Gather citations and metadata
4. Validate word count minimum
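
The cleaning and word-count steps above might look like the following sketch. The 2000-word threshold comes from the stage output; the function names and the citation-marker pattern are assumptions about what "clean text" involves.

```python
import re

MIN_WORDS = 2000  # from the "2000+ words" output requirement


def clean_text(raw: str) -> str:
    """Drop bracketed citation markers like [12] and collapse whitespace."""
    without_refs = re.sub(r"\[\d+\]", "", raw)
    return re.sub(r"\s+", " ", without_refs).strip()


def word_count(text: str) -> int:
    return len(text.split())


def validate_bundle(text: str, minimum: int = MIN_WORDS) -> str:
    """Return the cleaned text, or raise if it falls short of the minimum."""
    cleaned = clean_text(text)
    count = word_count(cleaned)
    if count < minimum:
        raise ValueError(f"research bundle has {count} words, need at least {minimum}")
    return cleaned
```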

Stage 2: Scripting

Generates video script using LLM.

Input: Research bundle + Template
Output: Storyboard with beats

Process:

1. Create template-specific prompt
2. Generate structured storyboard
3. Validate beat structure
4. Add visual guidance
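
Beat validation could be sketched as below. The intro/body/outro layout follows the storyboard example in the Data Flow section; the `Beat` fields and the specific rules are assumptions, since the real schema is not given here.

```python
from dataclasses import dataclass, field


@dataclass
class Beat:
    kind: str  # "intro", "body", or "outro"
    narration: str
    visual_queries: list[str] = field(default_factory=list)


def validate_storyboard(beats: list[Beat]) -> list[str]:
    """Return a list of problems; an empty list means the structure is valid."""
    problems = []
    if len(beats) < 3:
        problems.append("need at least an intro, one body beat, and an outro")
        return problems
    if beats[0].kind != "intro":
        problems.append("first beat must be an intro")
    if beats[-1].kind != "outro":
        problems.append("last beat must be an outro")
    for i, beat in enumerate(beats[1:-1], start=1):
        if beat.kind != "body":
            problems.append(f"beat {i} must be a body beat")
    for i, beat in enumerate(beats):
        if not beat.narration.strip():
            problems.append(f"beat {i} has empty narration")
        if not beat.visual_queries:
            problems.append(f"beat {i} has no visual guidance")
    return problems
```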

Stage 3: Asset Gathering

Collects media assets for visuals.

Input: Storyboard beats
Output: Downloaded media files

Process:

1. Search for relevant images/videos
2. Score relevance with CLIP
3. Download highest scoring assets
4. Track attribution
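
CLIP-style relevance scoring usually comes down to comparing a text embedding with image embeddings by cosine similarity. The sketch below assumes the embeddings have already been computed elsewhere and only shows the ranking step; `pick_best_assets` and its input shape are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_best_assets(query_embedding, candidates, top_k=1):
    """candidates: list of (asset_url, embedding). Returns top_k (url, score) pairs."""
    scored = [(url, cosine_similarity(query_embedding, emb)) for url, emb in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```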

Stage 4: Voice Generation

Creates narration audio.

Input: Beat narration text
Output: Audio files + timestamps

Process:

1. Synthesize speech with TTS
2. Generate word-level timestamps
3. Create subtitle files
4. Calculate durations
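
Turning word-level timestamps into an SRT subtitle file can be sketched as follows. The input shape, a list of `(text, start, end)` tuples in seconds, and the grouping of a fixed number of words per cue are assumptions made for this example.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words=7):
    """words: list of (text, start_seconds, end_seconds) in spoken order."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(w[0] for w in chunk)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```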

Stage 5: Timeline Building

Constructs video timeline.

Input: Assets + Audio + Storyboard
Output: Timeline JSON

Process:

1. Align audio with visuals
2. Apply Ken Burns effects
3. Add text overlays
4. Set transitions
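
A simple way to align audio with visuals is to let each scene last as long as its narration and place scenes back to back. The sketch below does only that and attaches a transition to every scene except the last, which matches the "6 scenes, 5 transitions" shape in the Data Flow example. The field names in the resulting JSON are hypothetical.

```python
import json


def build_timeline(scenes, transition="fade"):
    """scenes: list of dicts with 'asset', 'audio', and 'audio_duration' (seconds)."""
    timeline, cursor = [], 0.0
    for index, scene in enumerate(scenes):
        timeline.append(
            {
                "index": index,
                "asset": scene["asset"],
                "audio": scene["audio"],
                "start": round(cursor, 3),
                "duration": scene["audio_duration"],
                # Transition into the next scene; the last scene has none.
                "transition_out": transition if index < len(scenes) - 1 else None,
            }
        )
        cursor += scene["audio_duration"]
    return {"scenes": timeline, "total_duration": round(cursor, 3)}


def to_json(timeline: dict) -> str:
    return json.dumps(timeline, indent=2)
```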

Stage 6: Rendering

Produces final video file.

Input: Timeline
Output: MP4/WebM video

Process:

1. Load media assets
2. Apply effects and transitions
3. Mix audio tracks
4. Encode video

Data Flow

1. User Request:
   Topic: "History of AI"
   Template: Documentary
   Duration: 90 seconds

2. Research Bundle:
   Word Count: 3500
   Sources: 5 Wikipedia articles
   Images: 15 references
   
3. Storyboard:
   Beats: 6 (intro, 4 body, outro)
   Total Narration: 850 words
   Visual Cues: 18 search queries

4. Assets:
   Images: 12 downloaded
   Videos: 3 clips
   Music: 1 background track
   
5. Voice Files:
   Narration: 6 MP3 files
   Duration: 88 seconds total
   Subtitles: 6 SRT files

6. Timeline:
   Scenes: 6
   Transitions: 5 fade
   Total Duration: 90 seconds
   
7. Output:
   Format: MP4
   Resolution: 1920x1080
   Size: ~45MB

Provider Architecture

Provider Interface

All providers implement a common interface:

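class Provider:
    """Common interface implemented by every provider."""
    name: str  # provider identifier

    async def initialize(self) -> None: ...  # set up before first use
    async def process(self, *args, **kwargs): ...  # arguments vary by provider type (assumption)
    async def cleanup(self) -> None: ...  # release resources when done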
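
As an illustration of how a provider might plug into this interface, the sketch below defines a toy provider and a name-based registry. `UppercaseProvider`, `REGISTRY`, and `use_provider` are hypothetical and stand in for real integrations; the base class is repeated only so the example runs on its own.

```python
import asyncio


class Provider:
    """Mirror of the common interface so this example is self-contained."""
    name: str

    async def initialize(self) -> None: ...
    async def process(self, *args, **kwargs): ...
    async def cleanup(self) -> None: ...


class UppercaseProvider(Provider):
    """Hypothetical stand-in for a real provider such as a TTS or LLM client."""
    name = "uppercase"

    def __init__(self):
        self.ready = False

    async def initialize(self) -> None:
        self.ready = True  # a real provider would open clients here

    async def process(self, text: str) -> str:
        if not self.ready:
            raise RuntimeError("initialize() must be called first")
        return text.upper()

    async def cleanup(self) -> None:
        self.ready = False  # a real provider would close clients here


# Name-based lookup so callers can swap providers through configuration.
REGISTRY = {UppercaseProvider.name: UppercaseProvider}


async def use_provider(name: str, payload: str) -> str:
    """Create a provider by name and guarantee cleanup even on failure."""
    provider = REGISTRY[name]()
    await provider.initialize()
    try:
        return await provider.process(payload)
    finally:
        await provider.cleanup()
```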

Provider Types

LLM Providers:

OpenAI (GPT-4, GPT-4-Vision)
Claude (future)
Local LLMs (future)

TTS Providers:

ElevenLabs
OpenAI TTS (future)
Azure Speech (future)

Media Providers:

Pexels
Unsplash (future)
Pixabay (future)

ASR Providers:

Whisper (local)
Whisper API (future)

Image Generation:

DALL-E 3
Midjourney (future)
Stable Diffusion (future)

Scalability Design

Horizontal Scaling

Stateless services enable horizontal scaling
Load balancer distributes requests
MongoDB replica sets for data redundancy
Redis cluster for cache distribution

Vertical Scaling

Async processing maximizes CPU utilization
Memory-efficient streaming for large files
GPU acceleration for rendering (optional)

Performance Optimizations

Concurrent pipeline stage execution
Asset download parallelization
Caching at multiple levels
Connection pooling for databases
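
Parallel asset downloads are typically bounded so that a burst of requests does not overwhelm a provider. The sketch below shows one way to cap concurrency with `asyncio.Semaphore`; `gather_limited` is a hypothetical helper, and the download itself is left to the caller.

```python
import asyncio


async def gather_limited(coroutine_factories, limit=4):
    """Run zero-argument coroutine factories with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    # gather preserves input order in its result list.
    return await asyncio.gather(*(run(f) for f in coroutine_factories))
```

Factories are used instead of coroutine objects so that no work starts until a semaphore slot is available.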

Error Handling

Retry Strategy

Retryable Errors:
  - Network timeouts
  - Rate limits
  - Temporary API failures
  
Retry Policy:
  - Max Attempts: 3
  - Backoff: Exponential
  - Max Delay: 30 seconds
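
The policy above can be sketched as a small async helper. The attempt count and the 30-second cap come from the policy; the 1-second base delay, the `RetryableError` marker class, and the `retry` function itself are assumptions.

```python
import asyncio


class RetryableError(Exception):
    """Hypothetical marker for timeouts, rate limits, and temporary API failures."""


async def retry(operation, max_attempts=3, base_delay=1.0, max_delay=30.0,
                sleep=asyncio.sleep):
    """Call `operation` until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except RetryableError:
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt (1s, 2s, 4s, ...) and is capped at max_delay.
            await sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
```

Errors that are not `RetryableError` propagate immediately, so non-retryable failures are not masked by the loop.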

Failure Recovery

Job state persistence enables resumption
Partial progress saved at each stage
Failed jobs can be manually retried
Automatic cleanup of orphaned resources
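
Resuming from persisted state can be as simple as skipping the stages already recorded as complete. In this sketch the stage identifiers and the idea of storing a list of completed stage names on the job are assumptions.

```python
from typing import Optional

# Hypothetical identifiers for the six pipeline stages, in execution order.
STAGES = ["ingestion", "scripting", "assets", "voice", "timeline", "rendering"]


def next_stage(completed: list[str]) -> Optional[str]:
    """Return the first stage not yet recorded as completed, or None when done."""
    for stage in STAGES:
        if stage not in completed:
            return stage
    return None
```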

Security Architecture

API Security

Rate limiting per IP/API key
Input validation and sanitization
SQL/NoSQL injection prevention
XSS protection

Data Security

API keys stored in environment variables
Sensitive data masked in logs
Secure file upload validation
Path traversal prevention
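
Path traversal prevention usually means resolving a user-supplied path and checking that it still sits under the allowed directory. A minimal sketch, with `safe_join` as a hypothetical helper name:

```python
from pathlib import Path


def safe_join(base_dir: str, user_path: str) -> Path:
    """Resolve user_path under base_dir, rejecting escapes such as '../'."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # An absolute user_path replaces base entirely, so it is rejected here too.
    if not candidate.is_relative_to(base):
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```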

Network Security

HTTPS enforcement
CORS configuration
Request timeout limits
DDoS protection (Cloudflare)

Monitoring & Observability

Health Checks

/health endpoints on all services
Database connection monitoring
Redis availability checks
Disk space monitoring
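
The logic behind a `/health` endpoint can be sketched independently of the web framework: run each check, treat exceptions as failures, and report an overall status. The check names, the 1 GB threshold, and the "ok"/"degraded" labels below are assumptions.

```python
import shutil


def disk_check(path: str = "/", min_free_bytes: int = 1_000_000_000) -> bool:
    """True when the filesystem holding `path` has enough free space."""
    return shutil.disk_usage(path).free >= min_free_bytes


def health_report(checks: dict) -> dict:
    """Run each named check; any exception counts as a failed check."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

A service would pass in callables that ping its database and cache connections, then return the resulting dict from its `/health` handler.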

Metrics

Job completion rate
Average processing time
Error rates by stage
Resource utilization

Logging

Structured JSON logging
Log levels: DEBUG, INFO, WARNING, ERROR
Centralized log aggregation
Error tracking with context
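
Structured JSON logging with error context can be done with the standard `logging` module alone. The sketch below is one possible formatter; the output field names and the convention of passing `extra={"context": {...}}` are assumptions.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Hypothetical convention: callers pass extra={"context": {...}}.
        context = getattr(record, "context", None)
        if context:
            payload["context"] = context
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_logger(name: str) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

For example, `get_logger("orchestrator").error("stage failed", extra={"context": {"job_id": "abc", "stage": "voice"}})` would emit one JSON line carrying the job and stage alongside the message.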

Deployment Architecture

Docker Compose (Development)

Services:
  - orchestrator: Port 8000
  - renderer: Port 8001
  - mongodb: Port 27017
  - redis: Port 6379
  
Networks:
  - contentful_network
  
Volumes:
  - mongodb_data
  - redis_data
  - media_storage

Kubernetes (Production)

Deployments:
  - orchestrator (3 replicas)
  - renderer (2 replicas)
  
StatefulSets:
  - mongodb (3 replicas)
  - redis (3 replicas)
  
Services:
  - LoadBalancer for API
  - ClusterIP for internal
  
Storage:
  - PersistentVolumes for data
  - Object storage for media

Technology Stack

Backend

Language: Python 3.11+
Frameworks: FastAPI, Pydantic
Async: AsyncIO, aiohttp, aiofiles
Video: MoviePy, FFmpeg
Database: Motor (MongoDB), redis-py

Infrastructure

Containers: Docker, Docker Compose
Orchestration: Kubernetes (production)
CI/CD: GitHub Actions
Monitoring: Prometheus, Grafana

AI/ML

LLM: OpenAI GPT-4
TTS: ElevenLabs
ASR: Whisper
Vision: CLIP, GPT-4-Vision
Generation: DALL-E 3

Design Patterns

Architectural Patterns

Microservices: Loosely coupled services
Pipeline: Sequential processing stages
Repository: Data access abstraction
Factory: Provider instantiation

Code Patterns

Dependency Injection: Provider configuration
Strategy: Swappable providers
Observer: Progress notifications
Circuit Breaker: API failure handling
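
The circuit breaker pattern named above can be sketched in a few lines. This version is synchronous for brevity, and the threshold and timeout defaults are assumptions; an async variant would await the wrapped call instead.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    """Stop calling a failing dependency until a cool-down has passed."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # Open on reaching the threshold, or re-open if the trial call failed.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```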

Future Architecture Considerations

Planned Enhancements

Message Queue: RabbitMQ/Kafka for job queue
Workflow Engine: Temporal/Airflow for complex pipelines
CDN: CloudFront for media delivery
ML Pipeline: Kubeflow for model serving
Multi-region: Geographic distribution

Scalability Roadmap

Phase 1: Current - Single region, Docker Compose
Phase 2: Kubernetes, horizontal scaling
Phase 3: Multi-region, CDN integration
Phase 4: Edge computing, global distribution
