Agents

6 agents available in the Gemini directory

Pre-built AI agents with specialized instructions for specific tasks — from coding and writing to research and analysis. Each agent is ready to deploy with a single click.

agent-benchmark

AgentBench-Live

The open benchmark for AI agent task execution. Claude Code vs Gemini CLI — who wins? Live leaderboard inside.

jackjin1997

synkro

AI agent simulation framework

velocitybolt

agent

SciEvalKit

A unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of large language and vision–language models across the full research workflow.

InternScience

agent-evaluation

teamcity-ai-agent-testing-demo

End-to-end TeamCity framework to run AI agents on SWE-Bench Lite. Spin up isolated Docker images per task, extract patches, score with the official harness, and aggregate success rates. As an example, we'll look at Junie and Google Gemini CLI.

JetBrains

promptfoo

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

promptfoo

20,133

Promptfoo

Test your prompts, agents, and RAGs. Red teaming/pentesting/vulnerability scanning for AI. Compare performance of GPT, Claude, Gemini, DeepSeek, and more. Simple declarative configs with command line and CI/CD integration. Used by OpenAI and Anthropic.

promptfoo

23,400