gemini-computer-use — Gemini Agents | Neura Market
    gemini-computer-use

    pmbstyle October 7, 2025

    # Gemini Computer Use Agent
    
    A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.
    
    For browser automation without a sandbox, use [gemini-browser-agent](https://github.com/pmbstyle/gemini-browser-agent).
    
    [<img alt="image" src="https://github.com/user-attachments/assets/297ff11a-9784-49e2-8b25-d48175fe89d2" />](https://www.youtube.com/watch?v=zRjGeNP4tPs)
    
    
    ## Features
    
    - **Visual Browser Control**: Uses screenshots to "see" and interact with web pages
    - **Automated Actions**: Supports mouse clicks, keyboard input, scrolling, navigation, drag-and-drop, and timed waits (full list under Supported Actions)
    - **Safety Controls**: Built-in confirmation prompts for risky actions
    - **Human-in-the-Loop**: Optional user confirmation for sensitive operations
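The "Visual Browser Control" feature boils down to an observe-act loop: capture a screenshot, ask the model for the next actions, execute them, and repeat. The sketch below shows that loop in isolation, assuming the model is wrapped in a plain callable that returns an empty list when it considers the task done. `Action`, `Browser`, `ModelFn`, `run_agent`, and the `max_turns` cap are hypothetical names for illustration, not the project's actual code; the real agent would call Gemini through `google-genai` and drive the page with Playwright.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass
class Action:
    """One model-proposed action, e.g. Action("click_at", {"x": 500, "y": 300})."""
    name: str
    args: dict = field(default_factory=dict)


class Browser(Protocol):
    """Anything that can capture the screen and carry out an Action."""
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# Hypothetical model wrapper: given the task and the history so far, return the
# next actions. An empty list means the model considers the task finished.
ModelFn = Callable[[str, list], list[Action]]


def run_agent(task: str, model: ModelFn, browser: Browser, max_turns: int = 20) -> list:
    """Screenshot -> model -> actions, repeated until done or max_turns is hit."""
    history: list = []
    for _ in range(max_turns):
        history.append(("screenshot", browser.screenshot()))  # "see" the page
        actions = model(task, history)                        # decide next step
        if not actions:                                       # nothing left to do
            break
        for action in actions:
            browser.execute(action)                           # act on the page
            history.append(("action", action))
    return history
```

Injecting the model and browser as parameters keeps the loop testable with fakes, and the `max_turns` cap stops a confused model from looping forever.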
    
    ## Supported Actions
    
    - `open_web_browser`, `navigate`, `search`
    - `click_at`, `hover_at`, `type_text_at`
    - `key_combination`, `scroll_document`, `scroll_at`
    - `drag_and_drop`, `go_back`, `go_forward`
    - `wait_5_seconds`
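The model only names these actions; the client has to turn each one into real browser input. The dispatcher below is a minimal sketch of that step using Playwright's sync API (`page.mouse`, `page.keyboard`, `page.goto`, and so on). It assumes, based on Google's Computer Use documentation, that coordinates arrive on a 0-999 grid and must be scaled to the viewport; the argument names (`x`, `y`, `text`, `press_enter`, `keys`, `direction`, `destination_x`, `destination_y`, `url`) are also assumptions to check against the API reference. `execute_action`, `denormalize`, and the 1440x900 viewport are hypothetical. The sketch covers a subset of the list: `search`, `scroll_at`, and clearing a field before typing are left out.

```python
VIEWPORT_W, VIEWPORT_H = 1440, 900  # assumed viewport; match your Playwright page


def denormalize(value: int, size: int) -> int:
    """Scale a coordinate from the assumed 0-999 model grid to pixels."""
    return int(value / 1000 * size)


def execute_action(page, name: str, args: dict,
                   width: int = VIEWPORT_W, height: int = VIEWPORT_H) -> None:
    """Map one action name to Playwright sync-API calls on `page`."""
    def point(kx: str = "x", ky: str = "y") -> tuple[int, int]:
        return denormalize(args[kx], width), denormalize(args[ky], height)

    if name == "open_web_browser":
        pass  # the browser is already open in this sketch
    elif name == "click_at":
        page.mouse.click(*point())
    elif name == "hover_at":
        page.mouse.move(*point())
    elif name == "type_text_at":
        page.mouse.click(*point())           # focus the field first
        page.keyboard.type(args["text"])
        if args.get("press_enter", False):
            page.keyboard.press("Enter")
    elif name == "key_combination":
        page.keyboard.press(args["keys"])     # e.g. "Control+A"
    elif name == "scroll_document":
        # Scroll by one viewport in the requested direction.
        step = {"up": (0, -height), "down": (0, height),
                "left": (-width, 0), "right": (width, 0)}[args["direction"]]
        page.mouse.wheel(*step)
    elif name == "drag_and_drop":
        page.mouse.move(*point())
        page.mouse.down()
        page.mouse.move(*point("destination_x", "destination_y"))
        page.mouse.up()
    elif name == "navigate":
        page.goto(args["url"])
    elif name == "go_back":
        page.go_back()
    elif name == "go_forward":
        page.go_forward()
    elif name == "wait_5_seconds":
        page.wait_for_timeout(5000)
    else:
        # search and scroll_at are not handled in this sketch.
        raise ValueError(f"unsupported action in this sketch: {name}")
```

Raising on unknown names, rather than silently ignoring them, makes it obvious when the model emits an action the client does not implement.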
    
    ## Setup
    
    ### 1. Create and activate environment
    ```bash
    conda create -n gemcu python=3.11 -y
    conda activate gemcu
    ```
    
    ### 2. Install packages
    ```bash
    python -m pip install --upgrade pip
    python -m pip install google-genai playwright termcolor
    ```
    
    ### 3. Install Playwright browser
    ```bash
    playwright install chromium
    ```
    
    ### 4. Set API key
    ```bash
    # Windows PowerShell
    $env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
    
    # Linux/Mac
    export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
    ```
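When constructed without arguments, `genai.Client()` from `google-genai` picks up the key from the environment, so a missing or unedited key usually surfaces only as an API error later. A small startup check like the one below fails earlier with a clearer message. `get_api_key` is a hypothetical helper, not part of the project; it treats the literal placeholder from the commands above as unset.

```python
import os
from typing import Mapping

PLACEHOLDER = "PASTE_YOUR_KEY_HERE"


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GEMINI_API_KEY, failing fast if it is missing or still the placeholder."""
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER:
        raise SystemExit(
            "GEMINI_API_KEY is not set. Export it as shown in 'Set API key'."
        )
    return key
```

The returned value can also be passed explicitly as `genai.Client(api_key=key)`.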
    
    ## Usage
    
    ```bash
    python agent.py "Find Wikipedia article about Niagara Falls and open History section"
    ```
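The task is passed as a single quoted command-line argument. How `agent.py` actually reads it is not shown here, so the following is a hypothetical sketch using `argparse`: `parse_task` accepts one or more words and joins them, so an unquoted task still arrives as one instruction instead of failing on extra arguments.

```python
import argparse
from typing import Optional, Sequence


def parse_task(argv: Optional[Sequence[str]] = None) -> str:
    """Read the task from the command line, joining words if it was not quoted."""
    parser = argparse.ArgumentParser(description="Gemini Computer Use browser agent")
    parser.add_argument("task", nargs="+", help="natural-language instruction for the agent")
    args = parser.parse_args(argv)
    return " ".join(args.task)
```

Quoting is still advisable, since the shell may interpret characters such as `&` or `?` in an unquoted task.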
    
    ## Requirements
    
    - Python 3.11+
    - Gemini API key ([Get API key](https://aistudio.google.com/api-keys))
    - Chrome/Chromium browser
    
    ## Safety
    
    This agent operates a Playwright-controlled browser. For production use, consider running it inside a sandboxed virtual machine or container for additional isolation.
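Sandboxing limits the damage a bad action can do; the confirmation prompts listed under Features try to stop it beforehand. Google's Computer Use documentation describes a safety decision the model can attach to a proposed action, asking the client to get user approval first. The sketch below assumes that shape: `args["safety_decision"]` with `decision == "require_confirmation"` and an `explanation`, and an acknowledgement field `safety_acknowledgement` sent back when the user approves. These field names should be verified against the current API reference; `confirm_if_required` is a hypothetical helper.

```python
from typing import Callable, Dict, Tuple


def confirm_if_required(
    args: dict, ask: Callable[[str], str] = input
) -> Tuple[bool, Dict[str, str]]:
    """Decide whether to run an action; returns (proceed, extra response fields).

    Assumes the model marks risky actions with args["safety_decision"] =
    {"decision": "require_confirmation", "explanation": "..."}.
    """
    decision = args.get("safety_decision")
    if not decision or decision.get("decision") != "require_confirmation":
        return True, {}  # no confirmation requested
    prompt = decision.get("explanation", "The model flagged this action.")
    answer = ask(f"{prompt}\nProceed? [y/N] ")
    if answer.strip().lower() in ("y", "yes"):
        # Assumed acknowledgement field sent back with the function response.
        return True, {"safety_acknowledgement": "true"}
    return False, {}  # user declined: skip the action
```

Defaulting to "No" on an empty answer means an accidental Enter press never approves a risky action.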
    
    Based on [Google's Gemini Computer Use API](https://ai.google.dev/gemini-api).

    Tags: ai, ai-agent, browser-use, computer-use-agent, gemini-api


    More Agents

    - NotebookLM (research, by Google): Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
    - Project Mariner (Browser Agent) (browser, by Google DeepMind): Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
    - Project Astra (Multimodal Agent) (multimodal, by Google DeepMind): Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
    - Gemini Enterprise Agent Platform (enterprise, by Google Cloud): Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
    - Gemini Deep Research Agent (research, by Google): Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
    - Gemini Canvas Agent (canvas, by Google): Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.

    © 2026 Neura Market. All rights reserved.