Google Gemini Agentic AI Vision for Wildlife Analytics
# Wildlife Behavioral Risk Vision Agent - Google Gemini 3 Hackathon
### Agentic Vision System using Google Gemini 3
An ethical, stateful, dual-agent wildlife monitoring system that detects **persistent behavioral deviations relative to group norms**, accumulates confidence over time, and visualizes at-risk animals using AI-seeded OpenCV tracking.
Built for conservation monitoring demos and the **Google Gemini 3 Hackathon**.
## Demo
<p align="center"> <img src="docs_assets/vision_demo.gif" width="400"> </p>
<p align="center"> <img src="docs_assets/sample.png" width="48%"> <img src="docs_assets/vision_sample.jpg" width="48%"> </p>
## Problem
Wildlife conservation teams often rely on manual observation to detect:
* Injured animals
* Social exclusion
* Persistent behavioral anomalies
* Abnormal movement patterns
This process is:
* Time-consuming
* Subjective
* Not scalable
We built a **stateful agentic vision system** that:
* Observes animals over time
* Learns group behavioral baselines
* Detects persistent deviations
* Escalates risk without making medical claims
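
One way to picture "persistent deviation" is a per-animal confidence score that rises while an animal stays far from the group baseline and decays when it does not. The sketch below is illustrative only: the function names, the 0.3 threshold, and the gain and decay constants are assumptions, not values taken from this project.

```python
from statistics import mean


def deviation_from_group(value: float, group_values: list[float]) -> float:
    """Absolute gap between one animal's signal and the group average."""
    return abs(value - mean(group_values))


def update_confidence(confidence: float, deviation: float,
                      threshold: float = 0.3, gain: float = 0.2,
                      decay: float = 0.1) -> float:
    """Raise confidence while the deviation persists, lower it otherwise.

    The result is clamped to the range 0..1. All constants are hypothetical.
    """
    if deviation > threshold:
        confidence += gain
    else:
        confidence -= decay
    return min(1.0, max(0.0, confidence))


# Hypothetical relative-speed readings (0-1) for one animal and its group
# over four consecutive observation windows.
observations = [
    (0.20, [0.70, 0.80, 0.75]),
    (0.25, [0.70, 0.70, 0.80]),
    (0.70, [0.70, 0.75, 0.80]),  # the animal keeps pace here, so confidence decays
    (0.20, [0.80, 0.80, 0.70]),
]

confidence = 0.0
for animal_speed, group_speeds in observations:
    deviation = deviation_from_group(animal_speed, group_speeds)
    confidence = update_confidence(confidence, deviation)
```

A single slow window does little on its own; the score only climbs when the deviation repeats, which is what keeps one-off events from escalating.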
## System Overview
This project contains **two cooperating AI agents**:
### 1️⃣ Behavioral Reasoning Agent (Gemini 3 Pro)
Analyzes raw wildlife video and extracts structured behavioral signals:
* Stable `individual_id`
* Relative speed (0–1)
* Posture asymmetry (0–1)
* Distance from group center
* Group baseline averages
* Confidence accumulation
* At-risk seed (time window + normalized coordinates)
It outputs these signals as structured JSON memory, which the Visualization Agent consumes downstream.
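
As an illustration of what one memory record could look like, the sketch below parses a hypothetical entry. Only `individual_id` is named in this README; every other key, the seed layout, and all values are assumed for the example and may differ from the actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtRiskSeed:
    # Seed time window in seconds (assumed unit).
    start_s: float
    end_s: float
    # Normalized box, 0-1, as top-left corner plus size (assumed convention).
    x: float
    y: float
    w: float
    h: float


@dataclass
class MemoryRecord:
    individual_id: str
    relative_speed: float              # 0-1
    posture_asymmetry: float           # 0-1
    distance_from_group_center: float
    confidence: float
    seed: Optional[AtRiskSeed] = None  # present only for at-risk animals


def parse_record(raw: str) -> MemoryRecord:
    """Turn one JSON object into a typed record."""
    data = json.loads(raw)
    seed = data.pop("seed", None)
    return MemoryRecord(seed=AtRiskSeed(**seed) if seed else None, **data)


# Hypothetical record, written by hand for this example.
raw = """
{
  "individual_id": "zebra_03",
  "relative_speed": 0.21,
  "posture_asymmetry": 0.64,
  "distance_from_group_center": 0.58,
  "confidence": 0.8,
  "seed": {"start_s": 12.0, "end_s": 15.5, "x": 0.42, "y": 0.37, "w": 0.10, "h": 0.18}
}
"""
record = parse_record(raw)
```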
### 2️⃣ Visualization Agent (Gemini 3 Pro)
Generates a complete standalone Python script that:
* Uses memory data as a **tracking seed**
* Seeks directly to the seed timestamp
* Converts normalized coordinates → pixels
* Initializes an OpenCV CSRT tracker
* Draws red bounding boxes for at-risk animals
* Saves annotated video output
* Captures sample frames
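
The seeding steps above can be sketched as follows. This is a minimal illustration, not the script the agent generates: the helper names and the `(x, y, w, h)` top-left box convention are assumptions, and `cv2.TrackerCSRT_create` is only available in OpenCV builds that include the contrib tracking module (for example `opencv-contrib-python`).

```python
def normalized_to_pixels(box, frame_width, frame_height):
    """Convert a normalized (x, y, w, h) box in 0-1 to integer pixel values."""
    x, y, w, h = box
    return (
        int(round(x * frame_width)),
        int(round(y * frame_height)),
        max(1, int(round(w * frame_width))),   # keep the box at least 1 px wide
        max(1, int(round(h * frame_height))),  # and at least 1 px tall
    )


def seed_tracker(video_path, seed_time_s, norm_box):
    """Seek to the seed timestamp and initialize a CSRT tracker on that frame."""
    import cv2  # imported lazily so the helper above works without OpenCV

    cap = cv2.VideoCapture(video_path)
    # Jump straight to the seed time instead of decoding from the start.
    cap.set(cv2.CAP_PROP_POS_MSEC, seed_time_s * 1000.0)
    ok, frame = cap.read()
    if not ok:
        cap.release()
        raise RuntimeError(f"Could not read a frame at {seed_time_s}s")

    height, width = frame.shape[:2]
    bbox = normalized_to_pixels(norm_box, width, height)

    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, bbox)
    return cap, tracker, bbox


def draw_at_risk_box(frame, bbox):
    """Draw a red rectangle; OpenCV uses BGR order, so red is (0, 0, 255)."""
    import cv2

    x, y, w, h = (int(v) for v in bbox)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
    return frame
```

After seeding, a typical loop would call `tracker.update(frame)` on each subsequent frame and pass the returned box to `draw_at_risk_box` before writing the frame out.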
## Workflow
```
Wildlife Video
│
▼
Gemini 3 Vision Agent
(Beh