replit SPEC MVP - Community History Platform

A full-stack community history book platform that transforms volunteer uploads into AI-drafted chapters for community review. Real workflow: audio upload → Whisper transcription → GPT-5 synthesis → community review.

gerpacuity

May 2, 2026

ai rag workflow

Project Overview

Architecture Decisions

Stack: React frontend + Express.js backend (FULLSTACK_JS template)
Database: Local SQLite with Drizzle ORM for development
AI Services: Gladia transcription API (no file size limitations)
Authentication: Simplified email-based auth for development
File Storage: Replit Object Storage with signed URLs
Workflow: Separated transcription from chapter synthesis
- Interview Processing: uploaded → transcribing → transcribed (final state)
- Chapter Creation: Manual process using multiple transcribed interviews

Technical Specifications

File Processing: Direct upload to Gladia (no size limitations, no chunking needed)
Audio Processing: Gladia handles all audio formats and processing automatically
Mobile-First: design limited to ≤3 screens; performance target of ≤15s on a Slow 3G connection
Rights Management: Standard consent checkbox + contributor attribution
Simple Status Flow: uploaded → transcribing → transcribed (final state for interviews)
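
The status flow above can be sketched as a small state machine. This is a minimal illustration, not the project's actual code: the names `InterviewStatus`, `NEXT_STATUS`, `advanceStatus`, and `isFinal` are hypothetical, and only the three status values come from the spec.

```typescript
// The three statuses are taken from the spec; everything else is illustrative.
type InterviewStatus = "uploaded" | "transcribing" | "transcribed";

// Each status maps to its single successor; null marks the final state.
const NEXT_STATUS: Record<InterviewStatus, InterviewStatus | null> = {
  uploaded: "transcribing",
  transcribing: "transcribed",
  transcribed: null, // final state for interviews
};

// Returns the next status, or throws if the interview is already final.
function advanceStatus(current: InterviewStatus): InterviewStatus {
  const next = NEXT_STATUS[current];
  if (next === null) {
    throw new Error(`"${current}" is a final state`);
  }
  return next;
}

// True when no further transition exists.
function isFinal(current: InterviewStatus): boolean {
  return NEXT_STATUS[current] === null;
}
```

Keeping the transitions in one table means an obsolete status cannot be reached by accident, since it simply has no entry.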

Core Modules

Volunteer Portal: File upload + browser recording, progress tracking
Transcription Service: Gladia API for clean interview transcripts (migrated from OpenAI Whisper in August 2025)
Interview Management: Track and organize transcribed interviews by topics
Chapter Synthesis: Manual process to create chapters from multiple interviews (future)
Status Dashboard: Real-time progress indicators for all submissions

User Preferences

Real API integration required (no mock data)
Direct-to-storage uploads via signed URLs (Replit Object Storage, per Architecture Decisions)
Adapter pattern for provider switching
Comprehensive logging for debugging
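
One possible shape for the adapter pattern mentioned above, as a hedged sketch. The interface, class, and function names are assumptions for illustration. The adapter bodies are placeholders that return a marker string; a real adapter would call the provider's API at that point.

```typescript
// Hypothetical provider-agnostic contract; names are illustrative only.
interface TranscriptionResult {
  text: string;
  provider: string;
}

interface TranscriptionProvider {
  readonly name: string;
  transcribe(audioUrl: string): Promise<TranscriptionResult>;
}

// Placeholder adapter: the actual Gladia request is intentionally omitted.
class GladiaAdapter implements TranscriptionProvider {
  readonly name = "gladia";
  async transcribe(audioUrl: string): Promise<TranscriptionResult> {
    return { text: `[gladia transcript of ${audioUrl}]`, provider: this.name };
  }
}

// Placeholder adapter for the previously used service.
class WhisperAdapter implements TranscriptionProvider {
  readonly name = "whisper";
  async transcribe(audioUrl: string): Promise<TranscriptionResult> {
    return { text: `[whisper transcript of ${audioUrl}]`, provider: this.name };
  }
}

// Registry keyed by provider name; switching providers is a one-line change.
const providers = new Map<string, TranscriptionProvider>([
  ["gladia", new GladiaAdapter()],
  ["whisper", new WhisperAdapter()],
]);

function getProvider(providerName: string): TranscriptionProvider {
  const provider = providers.get(providerName);
  if (provider === undefined) {
    throw new Error(`Unknown transcription provider: ${providerName}`);
  }
  return provider;
}
```

Callers depend only on `TranscriptionProvider`, so a service migration touches the registry and one adapter rather than the upload pipeline.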

Build Status

Started: December 2024
Current Phase: Architecture complete, debugging audio processing pipeline
Note: Large file transcription needs optimization to prevent API cost overruns

Recent Changes

Project initialized with technical specifications (December 2024)
Architecture decisions finalized based on detailed requirements
Increased file size limit from 20MB to 100MB
Implemented audio chunking pipeline with FFmpeg for large file processing
Added AudioProcessor service for intelligent audio segmentation (5min chunks)
Updated Whisper integration to handle chunked processing and recombination
Major Architecture Change: Separated transcription from chapter synthesis (August 2025)
- Interviews now stop at 'transcribed' status - ready for manual chapter creation
- Multiple interviews will be used per chapter, avoiding 1:1 mapping
- Enables better editorial control over narrative structure
Status Flow Cleanup: Removed obsolete statuses (August 2025)
- Eliminated 'drafting', 'pending_review', 'approved', 'rejected' from UI
- Simplified to core workflow: uploaded → transcribing → transcribed
- Fixed chunking validation to prevent empty file hangs
Transcription Service Migration: Switched from OpenAI Whisper to Gladia (August 2025)
- Eliminated complex chunking system (no file size limits with Gladia)
- Direct audio upload and processing without intermediate files
- Simplified architecture with single transcription service
UI/UX Redesign: Complete interface overhaul with modern UX patterns (August 2025)
- 3-step guided upload workflow with progress tracking
- Dynamic metadata collection for photos and documents
- Dashboard-style status page with filtering and search
- Improved navigation with header-based menu and mobile-responsive design
- Automatic transcription trigger on upload for seamless workflow
Transcript Management: Added comprehensive viewing and export capabilities (August 2025)
- Enhanced transcript viewer with search, highlighting, and word count
- Multiple export formats (TXT, JSON) with proper formatting
- Copy-to-clipboard functionality for easy sharing
- Inline editing capabilities for transcript corrections
- Rich metadata display with reading time and contributor information
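
A rough sketch of how word count, reading time, and the TXT/JSON exports could be computed. The `Transcript` shape, the function names, and the 200 words-per-minute reading speed are assumptions, not details from the project.

```typescript
// Hypothetical transcript shape for illustration.
interface Transcript {
  id: string;
  contributor: string;
  text: string;
}

// Counts whitespace-separated words; an empty or blank string counts as 0.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Reading time rounded up to whole minutes (200 wpm is an assumed default).
function readingTimeMinutes(text: string, wordsPerMinute: number = 200): number {
  return Math.ceil(countWords(text) / wordsPerMinute);
}

// Serializes a transcript as plain text or as JSON with derived metadata.
function exportTranscript(t: Transcript, format: "txt" | "json"): string {
  if (format === "txt") {
    return `Contributor: ${t.contributor}\n\n${t.text}\n`;
  }
  return JSON.stringify(
    {
      ...t,
      wordCount: countWords(t.text),
      readingTimeMinutes: readingTimeMinutes(t.text),
    },
    null,
    2,
  );
}
```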
Book Project System: Implemented project-based story organization (August 2025)
- Added book projects with title, description, and status tracking
- Story segregation by different community history collections
- Enhanced navigation with dedicated projects section
- Project statistics and management interface
Admin-Protected Status Dashboard: Enhanced security and moderation (August 2025)
- Password-protected access to status/admin dashboard (password: hero-admin-2024)
- Upload functionality remains publicly accessible for volunteers
- Added submission moderation with approve/reject/flag capabilities
- Comprehensive moderation notes and audit trail
- Admin session management with automatic expiration
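
A minimal sketch of the password check and automatic session expiration, under stated assumptions: the expected password is passed in by the caller (for example from a secret rather than from source code), and the 30-minute lifetime is invented for illustration. The hand-rolled comparison keeps the example dependency-free; in Node, `crypto.timingSafeEqual` is an alternative for equal-length buffers.

```typescript
// Compares two strings without returning early on the first mismatch,
// so response time does not reveal how many leading characters matched.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const ca = i < a.length ? a.charCodeAt(i) : 0;
    const cb = i < b.length ? b.charCodeAt(i) : 0;
    diff |= ca ^ cb;
  }
  return diff === 0;
}

// Assumed session lifetime; the spec only says sessions expire automatically.
const SESSION_TTL_MS = 30 * 60 * 1000;

interface AdminSession {
  issuedAt: number;
  expiresAt: number;
}

// Returns a session on a correct password, or null otherwise.
function startSession(submitted: string, expected: string, nowMs: number): AdminSession | null {
  if (!constantTimeEqual(submitted, expected)) {
    return null;
  }
  return { issuedAt: nowMs, expiresAt: nowMs + SESSION_TTL_MS };
}

// A session is active strictly before its expiry timestamp.
function isSessionActive(session: AdminSession, nowMs: number): boolean {
  return nowMs < session.expiresAt;
}
```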
AI-Powered Attribution System: Automatic metadata detection for documents and images (August 2025)
- GPT-4o vision analysis for photographs, documents, and historical images
- Automatic detection of potential authors, dates, locations, and sources
- Text extraction from documents and images with OCR capabilities
- Historical context analysis and time period estimation
- Entity extraction (people, places, organizations, dates)
- Confidence scoring for attribution accuracy assessment
- Human-readable attribution summaries for quick review
- Detailed metadata display with expandable information panels
Navigation Simplification: Streamlined project-centric workflow (August 2025)
- Made Projects page the default landing page for better project-first workflow
- Removed the redundant project detail page, eliminating an unnecessary navigation layer
- Direct actions from project cards: "Upload Content" and "View Stories" buttons
- Fixed circular navigation loops by removing intermediate detail page
- Increased file upload limits to 10 files and 100MB per file
- Project-specific submission filtering and context-aware navigation
Critical Text Content Bug Fix: Fixed validation schema dropping transcript data (September 2025)
- Issue: insertSubmissionSchema was omitting transcript field, causing all pasted text content to be lost during backend validation
- Impact: YouTube transcripts, pasted text, and extracted content appeared to submit but were silently dropped
- Resolution: Removed transcript field from validation schema omit list to preserve all text content
- Result: Text-only submissions now save correctly with full content preserved
- Improved debugging infrastructure for submission pipeline troubleshooting
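
The failure mode described above can be reproduced with a dependency-free stand-in for a schema's omit list. This is not the project's `insertSubmissionSchema`; `omitFields`, the two omit lists, and the sample submission are hypothetical and only mirror the behavior reported in the changelog.

```typescript
// Returns a copy of `input` without the listed keys, similar in spirit to
// a validation schema that omits fields before the insert.
function omitFields(input: object, keys: readonly string[]): Record<string, unknown> {
  const result: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input)) {
    if (!keys.includes(key)) {
      result[key] = value;
    }
  }
  return result;
}

// Before the fix: `transcript` sat in the omit list, so it never reached storage.
const BUGGY_OMIT: readonly string[] = ["id", "createdAt", "transcript"];
// After the fix: only server-generated fields are omitted.
const FIXED_OMIT: readonly string[] = ["id", "createdAt"];

// Made-up submission used only to show the difference.
const pastedSubmission = {
  title: "Example pasted story",
  transcript: "Full pasted text goes here.",
};

const droppedResult = omitFields(pastedSubmission, BUGGY_OMIT);
const preservedResult = omitFields(pastedSubmission, FIXED_OMIT);

console.log("transcript" in droppedResult); // false: content silently dropped
console.log("transcript" in preservedResult); // true: content preserved
```

Validation still succeeds in the buggy case, which is why the loss was silent: the record is well-formed, just missing its text.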
Enhanced Chapter Refinement System: Added iterative AI collaboration capabilities (September 2025)
- Implemented chapter deletion functionality with confirmation dialogs
- Added "Refine" button alongside existing Edit/Preview/Regenerate options
- Enhanced Claude integration to include full original sources during refinement
- Deep Context Refinement: Claude receives current chapter + all original transcripts/URLs
- Increased token limits to 16,000 for handling 5,000+ word chapter refinements
- Enables iterative improvement instead of wholesale regeneration, supporting a collaborative editing workflow
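
A sketch of how a deep-context refinement request might be assembled from the current chapter plus every original source. The request shape is provider-neutral and deliberately not the Anthropic SDK's format; `SourceDocument`, `RefinementRequest`, and `buildRefinementRequest` are hypothetical names. Only the 16,000 token limit comes from the entry above.

```typescript
// Hypothetical shapes for illustration.
interface SourceDocument {
  label: string; // e.g. a transcript title or a URL
  content: string;
}

interface RefinementRequest {
  maxTokens: number;
  prompt: string;
}

// Token limit stated in the changelog entry.
const REFINEMENT_MAX_TOKENS = 16000;

// Combines the editor's instruction, the current chapter, and all sources
// into one prompt so the model can revise instead of regenerating.
function buildRefinementRequest(
  currentChapter: string,
  sources: SourceDocument[],
  instruction: string,
): RefinementRequest {
  const sourceBlock = sources
    .map((s, i) => `Source ${i + 1} (${s.label}):\n${s.content}`)
    .join("\n\n");
  const prompt = [
    "Refine the chapter below instead of rewriting it from scratch.",
    `Editor instruction: ${instruction}`,
    `Current chapter:\n${currentChapter}`,
    `Original sources:\n${sourceBlock}`,
  ].join("\n\n");
  return { maxTokens: REFINEMENT_MAX_TOKENS, prompt };
}
```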
Structured Chapter Generation Parameters: Professional chapter customization interface (September 2025)
- Added structured form fields replacing free-text prompts for better UX and reliability
- Length Control: Precise word count targets (Short 1-2k, Medium 3-4k, Long 5-7k, Very Long 8k+)
- Style Selection: Narrative, Conversational, Formal, Journalistic, Evocative writing styles
- Tone Control: Celebratory, Nostalgic, Dramatic, Factual, Emotional, Humorous tones
- Built comprehensive prompt generation system mapping parameters to detailed Claude instructions
- Enhanced backend API to accept structured parameters instead of parsing free text
- Improved chapter consistency and predictability through systematic parameter control
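
The mapping from structured parameters to model instructions could look roughly like this. The option values mirror the lists above. The type names, the `WORD_TARGETS` table, and the prompt wording are assumptions, not the project's actual prompt system.

```typescript
// Option sets taken from the form fields described above.
type ChapterLength = "short" | "medium" | "long" | "very_long";
type ChapterStyle = "narrative" | "conversational" | "formal" | "journalistic" | "evocative";
type ChapterTone = "celebratory" | "nostalgic" | "dramatic" | "factual" | "emotional" | "humorous";

interface ChapterParams {
  length: ChapterLength;
  style: ChapterStyle;
  tone: ChapterTone;
}

// Word count targets per length option (Short 1-2k, Medium 3-4k, Long 5-7k, Very Long 8k+).
const WORD_TARGETS: Record<ChapterLength, string> = {
  short: "1,000 to 2,000 words",
  medium: "3,000 to 4,000 words",
  long: "5,000 to 7,000 words",
  very_long: "8,000 or more words",
};

// Builds a deterministic instruction block from the selected parameters.
function buildChapterPrompt(params: ChapterParams): string {
  return [
    `Target length: ${WORD_TARGETS[params.length]}.`,
    `Writing style: ${params.style}.`,
    `Tone: ${params.tone}.`,
  ].join("\n");
}
```

Because every field is a closed union, an invalid combination fails at compile time or at request validation instead of producing an unpredictable prompt.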
Jotform Integration: External form collection channel with manual import workflow (October 2025)
- Architecture Pivot: Changed from automatic webhooks to manual import for better editorial control
- Centralized API Key: Platform admin manages ONE Jotform account with API key in JOTFORM_API_KEY secret
- Form ID Assignment: Each club has unique jotform_form_id field linking to their specific form
- Manual Import UI: "Import from Jotform" button in Collections Manager with preview/selection dialog
- Smart Field Detection: Pattern-based field identification works across different Jotform forms without hard-coded field IDs
- File Processing: Downloads files from Jotform S3, uploads to Replit Object Storage, triggers transcription for audio
- Database Schema: Added jotform_form_id, source, jotform_submission_id, jotform_metadata, processing_status, error_message fields
- Import Flow: Preview submissions → select which to import → async file download/upload → normal transcription pipeline
- Duplicate Prevention: Tracks imported submissions via jotform_submission_id to prevent re-importing
- System Settings UI: Displays Form ID configuration and last sync timestamp; webhook URLs removed from the interface
- Backend Webhooks: Webhook code remains dormant as fallback, fully functional but not exposed in UI
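
Two pieces of the import flow above lend themselves to a short sketch: pattern-based field detection and duplicate prevention. The regular expressions, the `FieldKind` categories, and the simplified `JotformSubmission` shape are assumptions. They do not reflect the Jotform API response format or the project's real patterns.

```typescript
// Hypothetical field categories an importer might look for.
type FieldKind = "email" | "name" | "story" | "file";

// Patterns are matched against the field label, not a hard-coded field ID.
// Order matters: the first matching pattern wins.
const FIELD_PATTERNS: ReadonlyArray<[FieldKind, RegExp]> = [
  ["email", /e-?mail/i],
  ["name", /\bname\b/i],
  ["story", /story|memory|description/i],
  ["file", /upload|file|photo|audio/i],
];

// Returns the detected kind, or null when no pattern matches.
function detectFieldKind(label: string): FieldKind | null {
  for (const [kind, pattern] of FIELD_PATTERNS) {
    if (pattern.test(label)) {
      return kind;
    }
  }
  return null;
}

// Simplified submission shape for illustration only.
interface JotformSubmission {
  id: string; // would be stored as jotform_submission_id
  answers: Record<string, string>;
}

// Keeps only submissions whose ID has not been imported before.
function selectNewSubmissions(
  incoming: JotformSubmission[],
  importedIds: ReadonlySet<string>,
): JotformSubmission[] {
  return incoming.filter((s) => !importedIds.has(s.id));
}
```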
Database Export Tool: Comprehensive export utility for backing up all content without Excel character limits (October 2025)
- Solves Excel 32K Problem: Long transcripts were truncated at Excel's 32,767-character cell limit when exported to XLS/XLSX format
- Complete Export: Downloads database records, full text content, and all files from Object Storage
- Organized Output: Separate folders for transcripts, chapters, and media files (audio/images/documents)
- No Truncation: Individual text files preserve complete content regardless of length
- Production Support: Can run against both development and production databases via DATABASE_URL
- File Download: Fetches all uploaded media from Object Storage and organizes by type
- Multiple Formats: JSON for structured data, CSV manifest for spreadsheets, TXT files for content
- Usage: npx tsx server/export-database.ts with environment variables for configuration
- Documentation: Complete instructions in EXPORT_INSTRUCTIONS.md with examples and troubleshooting
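
The no-truncation approach can be sketched as follows: each transcript goes to its own text file, and the CSV manifest stores only a path and a character count. This is not the contents of server/export-database.ts; `ExportRecord`, `ExportFile`, `csvEscape`, and `planExport` are hypothetical, and the sketch only plans the files in memory without touching the disk or Object Storage.

```typescript
// Hypothetical record and output shapes.
interface ExportRecord {
  id: number;
  title: string;
  transcript: string;
}

interface ExportFile {
  path: string;
  content: string;
}

// Quotes a CSV value when it contains a quote, comma, or line break.
function csvEscape(value: string): string {
  return /[",\n\r]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

// Plans one TXT file per transcript plus a manifest that references them,
// so no spreadsheet cell ever has to hold the full text.
function planExport(records: ExportRecord[]): ExportFile[] {
  const files: ExportFile[] = records.map((r) => ({
    path: `transcripts/${r.id}.txt`,
    content: r.transcript,
  }));
  const rows = records.map((r) =>
    [
      String(r.id),
      csvEscape(r.title),
      `transcripts/${r.id}.txt`,
      String(r.transcript.length),
    ].join(","),
  );
  files.push({
    path: "manifest.csv",
    content: ["id,title,file,characters", ...rows].join("\n") + "\n",
  });
  return files;
}
```

A caller would then write each planned file to the output folder, for example with Node's `fs.writeFileSync`.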
