1st Place Winner (General Judge) - Datadog Self-Improving Agents Hack. Two identical AI agents play Split or Steal. No pre-programmed betrayal. They discover deception on their own. Built with @evancorrea.
# CRUCIBLE

**1st Place, Datadog Self-Improving Agents Hackathon (Feb 2026, NYC)**

Two AI agents play 100 rounds of Split or Steal. Through private reflection and experience, they discover deception, trust manipulation, and counter-deception. Nothing is prompted. Everything emerges.

## What this is

An adversarial simulation engine for studying emergent deception in LLM agents. Both agents start with identical naive prompts and zero strategic priming. Deceptive behavior develops purely through experience and private reflection. CRUCIBLE measures how and when it happens, and distills defensive skills from the patterns that emerge.

The security application: AI copilots are entering enterprise workflows everywhere. CRUCIBLE stress-tests how these agents behave under adversarial pressure and produces deployable countermeasures.

## Key findings

| Metric | Gemini 2.0 Flash | Gemini 2.5 Flash |
|---|---|---|
| Mutual destruction rate | 86% | 0% |
| Cooperation rate | 6% | 100% |
| Deception Index | 22.9 / 100 | 0 / 100 |
| First betrayal | Round 6 | Never |

Same prompts, same environment. Swapping the model changes the security posture entirely. Five runs on 2.5 Flash: zero betrayals across all of them.

Round 6 is the inflection point. After five rounds of cooperation, Agent A identifies Agent B's trust pattern and exploits it. Agent B develops a theory of mind about the attacker within one round. From there, 86% mutual destruction. The trust never recovers.

## Stack

- **Game engine:** Google Gemini (configurable model, default `gemini-2.5-flash`)
- **Metrics pipeline:** Mutual information decay, strategy entropy, exploitation windows, language drift, composite Deception Index
- **Skill distillation:** Converts emergent strategy patterns into deployable prompt modules for hardening customer-facing agents
- **Voice rendering:** ElevenLabs TTS with emotion-mapped parameters (two distinct agent voices)
- **Observability:** Datadog LLM Observability integration
- **Evaluat
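The outcome metrics in the Key findings table, and two of the signals named in the metrics pipeline (strategy entropy and mutual information between the agents' moves), can be computed from a plain move log. The sketch below is illustrative only and is not CRUCIBLE's code: the `"split"`/`"steal"` encoding, the 100-point pot, the definition of "first betrayal" as the first round in which either agent steals, the non-overlapping-window reading of "mutual information decay", and every function name are assumptions. The composite Deception Index is not reproduced because its formula isn't given here.

```python
import math
from collections import Counter
from typing import Optional, Sequence

# Hypothetical move encoding; CRUCIBLE's actual log format is not shown in the README.
SPLIT, STEAL = "split", "steal"


def payoff(a: str, b: str, pot: float = 100.0) -> tuple[float, float]:
    """Standard Split or Steal payoffs for one round (pot size is an assumption)."""
    if a == SPLIT and b == SPLIT:
        return pot / 2, pot / 2  # both cooperate: share the pot
    if a == STEAL and b == SPLIT:
        return pot, 0.0  # A exploits a cooperating B
    if a == SPLIT and b == STEAL:
        return 0.0, pot  # B exploits a cooperating A
    return 0.0, 0.0  # both steal: mutual destruction, nobody gets anything


def _check(moves_a: Sequence[str], moves_b: Sequence[str]) -> int:
    """Validate two aligned move logs and return the number of rounds."""
    if not moves_a or len(moves_a) != len(moves_b):
        raise ValueError("need two non-empty move logs of equal length")
    return len(moves_a)


def outcome_rates(moves_a: Sequence[str], moves_b: Sequence[str]) -> dict:
    """Cooperation rate, mutual destruction rate, and first betrayal round."""
    n = _check(moves_a, moves_b)
    pairs = list(zip(moves_a, moves_b))
    cooperation = sum(a == SPLIT and b == SPLIT for a, b in pairs) / n
    destruction = sum(a == STEAL and b == STEAL for a, b in pairs) / n
    # Assumed definition: the first (1-indexed) round in which either agent steals.
    first_betrayal: Optional[int] = next(
        (i for i, (a, b) in enumerate(pairs, start=1) if STEAL in (a, b)), None
    )
    return {
        "cooperation_rate": cooperation,
        "mutual_destruction_rate": destruction,
        "first_betrayal_round": first_betrayal,
    }


def entropy(moves: Sequence[str]) -> float:
    """Shannon entropy (bits) of one agent's move distribution."""
    n = len(moves)
    return -sum((c / n) * math.log2(c / n) for c in Counter(moves).values())


def mutual_information(moves_a: Sequence[str], moves_b: Sequence[str]) -> float:
    """I(A;B) in bits, estimated from joint move counts over the same rounds."""
    n = _check(moves_a, moves_b)
    joint = Counter(zip(moves_a, moves_b))
    pa, pb = Counter(moves_a), Counter(moves_b)
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        mi += p_ab * math.log2(p_ab / ((pa[a] / n) * (pb[b] / n)))
    return mi


def mi_by_window(moves_a: Sequence[str], moves_b: Sequence[str], window: int) -> list[float]:
    """Mutual information over consecutive non-overlapping windows of rounds."""
    n = _check(moves_a, moves_b)
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        mutual_information(moves_a[i:i + window], moves_b[i:i + window])
        for i in range(0, n - window + 1, window)
    ]


if __name__ == "__main__":
    # Illustrative 10-round log shaped like the narrative above: five mutual
    # splits, A steals in round 6, then mutual stealing. Not real CRUCIBLE data.
    a = [SPLIT] * 5 + [STEAL] * 5
    b = [SPLIT] * 6 + [STEAL] * 4
    print(outcome_rates(a, b))
    # {'cooperation_rate': 0.5, 'mutual_destruction_rate': 0.4, 'first_betrayal_round': 6}
    print(entropy(a), mutual_information(a, b), mi_by_window(a, b, 5))
```

Within a window where one agent plays a single move throughout, the per-window mutual information is zero, since a constant move carries no information about the opponent. Comparing windows against an earlier, more responsive phase is one plausible way to read "decay", under the assumptions above.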
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.