A natural language-based agent for automating UI testing on mobile devices and web applications using Google's Gemini AI models.
# UI Test Agent

A natural language-based agent for automating UI testing on mobile devices and web applications using Google's Agent Development Kit (ADK). This tool allows testers and developers to write test instructions in plain English and have them executed on Android, iOS, or web platforms.

> **Note:** This is a prototype/experimental project. While functional, it is not yet ready for production use. Use at your own risk.

## Features

- Natural language interface for UI automation and testing
- Cross-platform support (Android, iOS, Web)
- Integration with Google's Agent Development Kit (ADK)
- Support for both Gemini models and local models through LiteLLM
- Support for complex, multi-step test scenarios
- Real-time feedback and test results

## Requirements

- Python 3.8+
- Google API key
- For mobile testing: Android device/emulator or iOS device connected to your machine
- For web testing: Compatible browser

## Quick Setup

0. Create and activate a virtual environment (recommended):

```
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
```

1. Install dependencies:

```
pip install -r requirements.txt
```

2. Configure:

```
cp config.sample.yaml config.yaml
```

Edit `config.yaml` with your Google API key.

3. Run:

```
python main.py --target [android|ios|web] --query "YOUR_TEST_INSTRUCTION"
```

Or pipe content from a file or another command:

```
cat test_prompt.txt | python main.py --target [android|ios|web]
echo "Check the login screen" | python main.py --target android
```

## Configuration Options

The `config.yaml` file supports the following options:

- `google_api_key`: Your Google API key for Gemini models
- `use_vertex_ai`: Boolean to use Google Cloud Vertex AI (default: false)
- `model_name`: The model to use (default: "gemini-2.5-pro-preview-03-
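
The Quick Setup commands take a `--target` flag and read the test instruction either from `--query` or from piped standard input. As a rough illustration only, that input handling could be sketched as below. The function name `parse_instruction` and the error message are hypothetical, and this is not the project's actual `main.py`, which may behave differently.

```python
import argparse
import io
import sys


def parse_instruction(argv=None, stdin=None):
    """Return (target, instruction) from CLI flags or piped stdin.

    Hypothetical helper: only the --target and --query flag names come
    from the README; everything else is an assumption for illustration.
    """
    stdin = sys.stdin if stdin is None else stdin
    parser = argparse.ArgumentParser(description="UI test instruction parser (sketch)")
    parser.add_argument("--target", required=True, choices=["android", "ios", "web"])
    parser.add_argument("--query", default=None)
    args = parser.parse_args(argv)

    instruction = args.query
    # Fall back to piped input when --query is absent and stdin is not a terminal.
    if instruction is None and not stdin.isatty():
        instruction = stdin.read().strip()
    if not instruction:
        parser.error("provide --query or pipe an instruction on stdin")
    return args.target, instruction


if __name__ == "__main__":
    # Simulate: echo "Check the login screen" | python main.py --target android
    piped = io.StringIO("Check the login screen\n")
    print(parse_instruction(["--target", "android"], stdin=piped))
```

In this sketch `--query` takes precedence over piped input, and a missing instruction exits with a usage error rather than starting an empty test run.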
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.