<div align="center">
<img src="assets/banner.png" alt="WhatsApp AI Bot Banner" width="800">
# WhatsApp AI Bot
[Kotlin](https://kotlinlang.org/) · [Ktor](https://ktor.io/) · [Google AI Studio](https://aistudio.google.com/) · [License](LICENSE)
**A production-ready WhatsApp bot powered by Google Gemini 3 Flash Preview and the Koog AI agent framework.**
Built with Kotlin and Ktor, it handles real conversations, sends rich media, reacts to messages, and presents interactive button menus — all through the Kapso WhatsApp Cloud API.
</div>
---
## 🚀 Features
- **🧠 Intelligent Reasoning** — Powered by Gemini 3 Flash Preview for context-aware responses and tool execution.
- **📱 Rich Media Support** — Understands and sends text, images, videos, audio clips, documents, locations, reactions, and stickers.
- **💾 Per-User Memory** — Maintains conversation history independently for each user (up to 20 messages).
- **🛡️ Secure Webhooks** — HMAC-SHA256 signature verification and automatic message deduplication.
- **⚡ High Performance** — Built on Ktor and Coroutines with optimized thread management.
- **🌍 Multilingual** — Automatically responds in the same language the user writes in.
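The **Secure Webhooks** feature combines two checks: authenticate the payload, then drop repeats. The sketch below illustrates both using only the JDK (`javax.crypto.Mac`, `MessageDigest.isEqual`). It is not the bot's actual code. It assumes that the signature arrives as a hex-encoded HMAC-SHA256 digest of the raw request body, because this README does not state Kapso's header name or encoding. `hmacSha256Hex`, `isValidSignature`, and `MessageDeduplicator` are hypothetical names.

```kotlin
import java.security.MessageDigest
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

// Hypothetical helper: hex-encoded HMAC-SHA256 of the raw request body.
// Assumption: the sender signs the exact raw bytes and sends the digest as hex.
fun hmacSha256Hex(secret: String, body: ByteArray): String {
    val mac = Mac.getInstance("HmacSHA256")
    mac.init(SecretKeySpec(secret.toByteArray(Charsets.UTF_8), "HmacSHA256"))
    return mac.doFinal(body).joinToString("") { "%02x".format(it) }
}

// Hypothetical check: compares the received signature against a freshly computed one.
fun isValidSignature(secret: String, body: ByteArray, receivedHex: String?): Boolean {
    if (receivedHex == null) return false
    val expected = hmacSha256Hex(secret, body).toByteArray(Charsets.UTF_8)
    val received = receivedHex.trim().lowercase().toByteArray(Charsets.UTF_8)
    // Constant-time comparison, so timing does not reveal how many leading bytes matched.
    return MessageDigest.isEqual(expected, received)
}

// Hypothetical deduplicator: remembers the most recent message IDs so a
// redelivered webhook for the same message is processed only once.
class MessageDeduplicator(private val capacity: Int = 1_000) {
    private val seen = object : LinkedHashMap<String, Unit>(16, 0.75f, false) {
        // Evict the oldest ID once the map grows past capacity.
        override fun removeEldestEntry(eldest: MutableMap.MutableEntry<String, Unit>?): Boolean =
            size > capacity
    }

    // Returns true the first time an ID is seen, false for a duplicate.
    @Synchronized
    fun firstTime(messageId: String): Boolean = seen.put(messageId, Unit) == null
}
```

Compute the digest over the raw bytes before parsing any JSON. Re-serializing the payload can change whitespace or key order and break the match. The bounded `LinkedHashMap` keeps memory flat while still catching redeliveries of recently seen messages.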
---
## 🛠️ What It Does
When a WhatsApp user sends a message to your number, the bot:
1. **Receives** it via a secure webhook (HMAC-SHA256 verified).
2. **Analyzes** it with a per-user AI agent that keeps its own conversation memory (up to 20 messages).
3. **Reasons** with Gemini 3 Flash Preview and selects the appropriate response action.
4. **Responds** with a text reply, image, document, emoji reaction, or interactive button menu.
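Steps 2 and 4 can be sketched as two parts. The first is a per-user history capped at 20 messages, the limit stated under Features. The second is a closed set of reply actions. This is a hypothetical illustration: `ChatMessage`, `BotAction`, `ConversationStore`, and `describe` do not come from the project, and the real bot presumably keeps history inside its Koog agents rather than in a standalone store. Sealed interfaces require Kotlin 1.5 or later.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical history entry; the real bot likely stores Koog prompt messages instead.
data class ChatMessage(val role: String, val text: String)

// Hypothetical reply actions mirroring step 4: text, image, document, reaction, or buttons.
sealed interface BotAction {
    data class Text(val body: String) : BotAction
    data class Image(val url: String, val caption: String? = null) : BotAction
    data class Document(val url: String, val filename: String) : BotAction
    data class Reaction(val messageId: String, val emoji: String) : BotAction
    data class Buttons(val body: String, val options: List<String>) : BotAction {
        init {
            // WhatsApp interactive reply-button messages accept at most three buttons.
            require(options.size in 1..3) { "reply buttons need 1 to 3 options, got ${options.size}" }
        }
    }
}

// Keeps an independent, bounded history for each user ID (e.g. the sender's phone number).
class ConversationStore(private val maxMessages: Int = 20) {
    private val histories = ConcurrentHashMap<String, ArrayDeque<ChatMessage>>()

    fun append(userId: String, message: ChatMessage) {
        val history = histories.computeIfAbsent(userId) { ArrayDeque<ChatMessage>() }
        // Lock per user, so appends for different users do not block each other.
        synchronized(history) {
            history.addLast(message)
            // Evict the oldest entries once the window exceeds maxMessages.
            while (history.size > maxMessages) history.removeFirst()
        }
    }

    // Returns a snapshot copy so callers cannot mutate the stored window.
    fun history(userId: String): List<ChatMessage> {
        val history = histories[userId] ?: return emptyList()
        return synchronized(history) { history.toList() }
    }
}

// Exhaustive `when` over the sealed type: a new action fails compilation until it is handled here.
fun describe(action: BotAction): String = when (action) {
    is BotAction.Text -> "text: ${action.body}"
    is BotAction.Image -> "image: ${action.url}"
    is BotAction.Document -> "document: ${action.filename}"
    is BotAction.Reaction -> "reaction ${action.emoji} on ${action.messageId}"
    is BotAction.Buttons -> "buttons: ${action.options.joinToString()}"
}
```

Modeling replies as a sealed type keeps the agent's tool outputs and the sending code in sync. For example, adding a sticker action forces every `when` that dispatches on `BotAction` to handle it.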
---
## 🏗️ Architecture
```mermaid
graph TD
User([WhatsApp User]) --> Kapso[Kapso API]
```