Autonomous AI Desktop Agent for Linux – Multi-LLM, Desktop Control via VNC, WhatsApp Integration, RAG Knowledge Base, OpenClaw Skill Ecosystem
<div align="center"> # 🤖 Jarvis AI Desktop Agent **An autonomous AI agent with web frontend, desktop control, and multi-LLM support** [](https://www.python.org/) [](LICENSE) [](https://github.com/dev-core-busy/jarvis/releases) [](https://www.linux.org/) [](https://github.com/dev-core-busy/jarvis/pulls) [](https://github.com/dev-core-busy/jarvis#openclaw-skill-ecosystem) *Control your Linux desktop with natural language. Receive tasks via WhatsApp. Search your knowledge base. Automate everything.* [**Live Demo**](https://jarvis-ai.info) · [**Report Bug**](https://github.com/dev-core-busy/jarvis/issues) · [**Request Feature**](https://github.com/dev-core-busy/jarvis/issues) · [**Contribute**](#contributing) ---  </div> --- ## 📋 Table of Contents - [Overview](#overview) - [Key Features](#key-features) - [Architecture](#architecture) - [Tech Stack](#tech-stack) - [Screenshots](#screenshots) - [Installation](#installation) - [Configuration](#configuration) - [Skill System](#skill-system) - [WhatsApp Integration](#whatsapp-integration) - [Knowledge Base](#knowledge-base) - [API Reference](#api-reference) - [Contributing](#contributing) - [Third-Party Licenses](#third-party-licenses) - [License](#license) --- ## Overview Jarvis is a **self-hosted, autonomous AI desktop agent** that runs on a Linux server. It combines a polished web frontend with real desktop control — you can watch and direct the agent as it works, right in your browser. The core idea: give Jarvi
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.