AI Video Analyzer & Chat Agent is an AI application built with Streamlit, Agno, and LangChain's DuckDuckGo tool. Using Gemini 1.5 Flash, it supports video analysis, insight extraction, and AI-powered chat, with content analysis, real-time web search, and multi-modal analysis aimed at research, education, and interactive learning.
# AI Video Analyzer & Chat Agent

This project is an advanced **AI Video Analyzer and Chat Agent** built using **Streamlit**, powered by **Google's Gemini 1.5 Flash** and **LangChain's DuckDuckGo integration**. It provides an interactive platform for users to analyze videos, get AI-powered insights, and perform web searches all in one interface.

## Table of Contents

- [Project Overview](#project-overview)
- [Features](#features)
- [Uses and Scope](#uses-and-scope)
- [File Structure](#file-structure)
- [Software and Tools Requirements](#software-and-tools-requirements)
- [Getting Started](#getting-started)
- [Data Description](#data-description)
- [Usage](#usage)
- [Future Enhancements](#future-enhancements)
- [Acknowledgments](#acknowledgments)

## Project Overview

The **AI Video Analyzer & Chat Agent** is a powerful web application that combines video analysis capabilities with natural language processing and web search functionality. It uses Agno's Agent framework to integrate Google's Gemini 1.5 Flash model for video understanding and LangChain's DuckDuckGoSearchRun tool for supplementary web searches, providing users with comprehensive insights and information about their uploaded videos.

## Features

- **AI Agent Architecture**: Built using Agno's Agent framework for seamless AI integration
- **Video Upload & Processing**: Support for multiple video formats (MP4, MOV, AVI, MKV)
- **AI-Powered Analysis**: Video content analysis using Gemini 1.5 Flash
- **Interactive Chat Interface**: Real-time conversation with the AI about video content
- **LangChain Tools Integration**: Web search functionality using LangChain's DuckDuckGoSearchRun tool
- **Session Management**: Automatic timeout after 1 hour of inactivity
- **Responsive UI**: Clean and intuitive user interface with auto-scrolling chat
- **Multi-Modal Analysis**: Combines video understanding with text-based responses
- **Temporary File Handling**: Secure processing of uploaded videos

## Uses and Scope

Th
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.