gemini-computer-use

Name: gemini-computer-use
Author: pmbstyle

pmbstyle October 7, 2025


A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.

Gemini Computer Use Agent

For browser automation without a sandbox, see this project: https://github.com/pmbstyle/gemini-browser-agent

Features

Visual Browser Control: Uses screenshots to "see" and interact with web pages
Automated Actions: Supports mouse clicks, keyboard input, scrolling, navigation, and more
Safety Controls: Built-in confirmation prompts for risky actions
Human-in-the-Loop: Optional user confirmation for sensitive operations
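
The confirmation step described above could look roughly like the sketch below. It is not taken from agent.py: the function name `confirm_action` and its parameters are hypothetical, and how a risky action is detected is left out. The point is that anything short of an explicit "yes" is treated as a refusal.

```python
from typing import Callable


def confirm_action(action: str, explanation: str,
                   ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly approves the action.

    Hypothetical helper: `action` is the action name (for example
    "click_at") and `explanation` is whatever reason is shown to the user.
    """
    prompt = f"The agent wants to run '{action}': {explanation}\nProceed? [y/N] "
    answer = ask(prompt).strip().lower()
    # Anything other than an explicit yes is treated as a refusal.
    return answer in ("y", "yes")
```

Passing `ask` as a parameter keeps the prompt replaceable, so the same check can be driven from a terminal or from an automated test.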

Supported Actions

open_web_browser, navigate, search
click_at, hover_at, type_text_at
key_combination, scroll_document, scroll_at
drag_and_drop, go_back, go_forward
wait_5_seconds
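
As an illustration of how a few of these actions might be mapped onto Playwright calls, here is a minimal sketch. It makes several assumptions that are not confirmed by this README: that coordinates arrive on a 0-999 grid and must be scaled to the viewport, that the argument names are `x`, `y`, `url`, and `text`, and that the viewport is 1440x900. The function names `denormalize` and `run_action` are hypothetical. The `page` object is expected to behave like a Playwright sync `Page`.

```python
import time
from typing import Any, Dict

GRID = 1000  # Assumption: model coordinates are normalized to a 0-999 grid.


def denormalize(value: int, size: int) -> int:
    """Scale a normalized coordinate to a pixel position."""
    return int(value / GRID * size)


def run_action(page: Any, name: str, args: Dict[str, Any],
               width: int = 1440, height: int = 900) -> None:
    """Execute one action on a Playwright-like page (hypothetical dispatcher)."""
    if name == "navigate":
        page.goto(args["url"])
    elif name == "click_at":
        page.mouse.click(denormalize(args["x"], width),
                         denormalize(args["y"], height))
    elif name == "hover_at":
        page.mouse.move(denormalize(args["x"], width),
                        denormalize(args["y"], height))
    elif name == "type_text_at":
        # Focus the target by clicking it, then type the text.
        page.mouse.click(denormalize(args["x"], width),
                         denormalize(args["y"], height))
        page.keyboard.type(args["text"])
    elif name == "go_back":
        page.go_back()
    elif name == "go_forward":
        page.go_forward()
    elif name == "wait_5_seconds":
        time.sleep(5)
    else:
        # Remaining actions (scrolling, drag and drop, ...) are omitted here.
        raise ValueError(f"Unsupported action: {name}")
```

Only a subset of the listed actions is covered; unknown names raise an error rather than being silently ignored, which makes gaps in the dispatcher visible early.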

Setup

1. Create and activate environment

conda create -n gemcu python=3.11 -y
conda activate gemcu

2. Install packages

python -m pip install --upgrade pip
python -m pip install google-genai playwright termcolor

3. Install Playwright browser

playwright install chromium

4. Set API key

# Windows PowerShell
$env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"

# Linux/Mac
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
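
A script can fail early with a clear message if the key is missing. The sketch below is one way to do that; `load_api_key` is a hypothetical helper, not a function from agent.py, and the only name it relies on from this README is the `GEMINI_API_KEY` environment variable.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment or raise a clear error."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Reject an unset variable and the placeholder from the setup instructions.
    if not key or key == "PASTE_YOUR_KEY_HERE":
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Export it before running the agent."
        )
    return key
```

Checking for the placeholder value as well as an empty string catches the common mistake of copying the setup command without replacing the key.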

Usage

python agent.py "Find Wikipedia article about Niagara Falls and open History section"

Requirements

Python 3.11+
Gemini API key
Chrome/Chromium browser

Safety

This agent runs in a controlled browser environment. For production use, consider running in a sandboxed virtual machine or container for additional security.

Based on [Google's Gemini Computer Use API](https://ai.google.dev/gemini-api
