Web interface for the Gemini 2.5 Computer Use model
# Browsafex

A web application wrapper for the [Gemini 2.5 Computer Use model](https://blog.google/technology/google-deepmind/gemini-computer-use-model/). It connects to a web browser and implements all the functionality required by the model to interact with websites.

Check out this demo video to see Browsafex in action:

<div align="center">

[Watch the demo video](https://youtu.be/2qL5L4xzgWo)

</div>

## Prerequisites

- Node.js (version 18 or higher)
- Yarn package manager
- Gemini API key
- Either a local Google Chrome browser OR a Kernel API key

## Configuration

To use the web app, add your Gemini API key and browser configuration to the `.env` file.

### Required Configuration

```
GEMINI_API_KEY=your-api-key
```

### Browser Configuration (Choose One Option)

#### Option 1: Local Chrome Browser (Default)

The app can connect to an existing Chrome browser instance that was started with remote debugging enabled via the `--remote-debugging-port` flag. You can start one on your local machine with one of the following commands, depending on your operating system:

**macOS:**

```
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222
```

**Windows:**

```
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```

**Linux:**

```
google-chrome --remote-debugging-port=9222
```

Then set the `BROWSER_URL` environment variable to the URL of the Chrome instance:

```
BROWSER_URL=http://localhost:9222
```

#### Option 2: Kernel Browser Service (Recommended for Production)

[Kernel](https://www.onkernel.com/) provides browsers-as-a-service in the cloud, eliminating the need to manage local browser instances. Kernel offers a free tier that you can use to experiment with the app.

1. Sign up for a Kernel account at [https://www.onkernel.com/](https://www.onkernel.com/)
2. Get your API key from the Kernel dashboard
3. Add it to your
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.