ScholarAI: a multi-agent AI deep researcher built with LangGraph and Gemini 2.5 models
# 🎓 ScholarAI: Your Personal AI Research & Podcast Studio

<p align="center">
  <img src="https://img.shields.io/badge/Python-3.11+-blue.svg" alt="Python Version">
  <img src="https://img.shields.io/badge/Framework-FastAPI-green" alt="FastAPI">
  <img src="https://img.shields.io/badge/Messaging-Kafka-black" alt="Kafka">
  <img src="https://img.shields.io/badge/Orchestration-LangGraph-orange" alt="LangGraph">
  <img src="https://img.shields.io/badge/License-MIT-lightgrey" alt="License">
</p>

Effortlessly transform any research topic into a comprehensive, cited report and a studio-quality podcast. ScholarAI leverages a sophisticated, event-driven backend and multi-agent workflows powered by **LangGraph** and **Google Gemini 2.5 Pro** to automate the entire research-to-content pipeline.

This isn't just a script; it's a production-grade, asynchronous system designed for scalability, resilience, and real-time user feedback.

## ✨ Key Features

* **⚡ Instantaneous API Response**: The API uses an event-driven architecture with Kafka to accept jobs instantly, providing a non-blocking user experience.
* **🧠 Multi-Modal Intelligence**: Goes beyond text by analyzing **YouTube videos** to extract deep insights, enriching the final research report.
* **🌐 Real-time Web Search**: Integrates Gemini's native **Google Search tool** to ground its research in up-to-the-minute, real-world data with citations.
* **🎙️ Multi-Speaker Podcast Generation**: Creates engaging, conversational podcasts with distinct speaker voices using Gemini's advanced TTS capabilities.
* **🔴 Live Job Tracking**: Users receive **real-time status updates** pushed from the server via WebSockets, from `PENDING` to `COMPLETED`.
* **🔒 Secure, On-Demand Artifacts**: Generated reports and podcasts are stored securely in the cloud and accessed via temporary, pre-signed URLs, ensuring only the owner can downloa
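
The accept-instantly, process-later flow behind the "Instantaneous API Response" and "Live Job Tracking" features can be illustrated with a small standard-library sketch. This is not ScholarAI's actual code: an in-memory `asyncio.Queue` stands in for the Kafka topic, and `JobService`, `submit`, `worker`, and the intermediate `RUNNING` state are hypothetical names chosen for illustration. Only `PENDING` and `COMPLETED` come from the feature list above.

```python
import asyncio
import uuid
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"  # hypothetical intermediate state, not named in the README
    COMPLETED = "COMPLETED"


class JobService:
    """In-memory stand-in: an asyncio.Queue plays the role of the Kafka topic."""

    def __init__(self) -> None:
        self.queue: asyncio.Queue = asyncio.Queue()
        self.status: dict[str, JobStatus] = {}
        self.history: dict[str, list[JobStatus]] = {}

    def _set_status(self, job_id: str, status: JobStatus) -> None:
        # A real system might push this change to the client over a WebSocket.
        self.status[job_id] = status
        self.history.setdefault(job_id, []).append(status)

    def submit(self, topic: str) -> str:
        """Record the job as PENDING, enqueue it, and return without waiting."""
        job_id = uuid.uuid4().hex
        self._set_status(job_id, JobStatus.PENDING)
        self.queue.put_nowait((job_id, topic))
        return job_id

    async def worker(self) -> None:
        """Consume queued jobs one at a time and advance their status."""
        while True:
            job_id, _topic = await self.queue.get()
            self._set_status(job_id, JobStatus.RUNNING)
            await asyncio.sleep(0)  # placeholder for the research workflow
            self._set_status(job_id, JobStatus.COMPLETED)
            self.queue.task_done()


async def demo() -> list[str]:
    service = JobService()
    worker = asyncio.create_task(service.worker())
    job_id = service.submit("quantum error correction")
    # The caller gets a job ID back while the job is still PENDING.
    assert service.status[job_id] is JobStatus.PENDING
    await service.queue.join()  # wait until the worker has finished the job
    worker.cancel()
    return [status.value for status in service.history[job_id]]
```

Running `asyncio.run(demo())` returns the status history for the single submitted job. The point of the design is that `submit` does no research work itself, so the caller is never blocked on the long-running pipeline.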
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.