step_game — DeepSeek Agents | Neura Market
    Neura MarketNeura Market/DeepSeek
    ChatGPTChatGPTClaudeClaudeGeminiGeminiCursorCursorGrokGrokPerplexityPerplexityDeepSeekDeepSeek
    CoPilotCoPilotStable DiffusionStable DiffusionMidjourneyMidjourney
    View All Directories
    OverviewRulesPromptsMCPsAgentsBlogVideosGuidesCoursesCommunityTrendingGenerate
    DeepSeekAgentsstep_game
    Back to Agents
    step_game

    step_game

    lechmazur January 21, 2025
    84 copies 0 downloads

    Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a move (1, 3, or 5 steps). Whenever two or more players choose the same number, all colliding players fail to advance.

    Agent Definition
    # Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure
    
    A three-player “step-race” that challenges LLMs to engage in **public conversation** before picking a move (**1**, **3**, or **5** steps). Whenever two or more players choose the same number, all colliding players fail to advance. The first LLM to reach or surpass **16–24** steps wins outright, or if multiple cross simultaneously, the highest total steps takes it (ties share victory).
    
    This design moves beyond static Q&A. Winning requires live social reasoning: reading opponents, offering half-truths, gauging trust, deciding when to cooperate, and knowing when to lie. Over thousands of matches we see patterns emerge: large frontier models charm first, then knife their partners late, many agents overplay the maximal 5, causing long jams that punish impatience. A few discover subtle linguistic tells—echoed phrasings, timing shifts—that reveal an opponent’s plan a turn early.
    
    The dataset opens fresh questions. Can we predict a model’s next move from its last sentence? Which phrases cloak a bluff? Do temporary alliances ever stick? How fast does an agent abandon a losing script? 
    
    ---
    
    ## Animation
    
    https://github.com/user-attachments/assets/f07abbd8-a780-440a-8fae-66f7154cf010
    
    Longer video:
    
    [![Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure: frame-by-frame replay of each game](https://markdown-videos-api.jorgenkh.no/url?url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DAnPKfrIPAgQ)](https://www.youtube.com/watch?v=AnPKfrIPAgQ)
    
    
    We generate a **frame-by-frame** and a **summary** replay of each game, illustrating:
    1. Conversation sub-rounds with highlighted quotes
    2. Secret moves (1,3,5) and collisions
    3. Real-time positions on the track
    4. A dynamic scoreboard (TrueSkill ratings, partial-win tallies)
    
    The animation reveals how LLMs strategize, stall, sabotage, or cooperate, culminating in final rankings. It shows how their talk transla

    Tags

    benchmarkdeepseekdeepseek-r1evalevaluationgamegpt-4-5gpt-4ollmmulti-agent

    Comments

    More Agents

    View all
    hybrid-model-workflow

    hybrid-model-workflow

    HAL 分层混合模型工作流 — 强模型(Claude)负责理解/拆解/验收,低成本模型(DeepSeek)负责检索/提取/清洗。Hermes Agent skill。

    P
    ph4ble
    1
    Dynamic-Review-Agent

    Dynamic-Review-Agent

    An LLM agent fine-tuned on DeepSeek for spaced repetition, dynamically integrating knowledge points based on the Ebbinghaus forgetting curve.

    1
    1838177
    1
    StellarOS-Watch

    StellarOS-Watch

    基于 STM32F103 构建的端到端 AI 智能手表生态。自研“零重定位”原生机器码动态加载引擎与页面栈式 UI 框架;集成生产级 OTA 回滚保护机制与高带宽(921600 baud)串口协议栈。通过 Node.js 中继实现 DeepSeek AI 语义控制及 ASRPRO 语音全双工交互,是一个集成了分布式计算、现代存储管理与 AI Agent 的嵌入式全栈工程。

    C
    chenshuang888
    1
    UAVagent1.0deepseek

    UAVagent1.0

    A Meta-Agent-Driven Self-Evolving Multi-Agent System for UAV Detection and Tracking

    S
    StarlitPupils
    2
    hermes-goai-agent

    hermes-go

    One command to run Hermes AI Agent with a browser UI. Zero prerequisites. 一行命令,AI 就位。

    L
    LAI-755
    1
    Agent

    Agent

    网页应用Agent,接入DeepSeek、Mimo等模型

    C
    Cosmos-815
    1

    Stay up to date

    Get the latest DeepSeek prompts, rules, and resources delivered to your inbox weekly.

    Neura Market LogoNeura Market

    Discover the best AI prompts, plugins, and resources for DeepSeek and more.

    Content Types

    • Rules
    • Prompts
    • MCPs
    • Agents
    • Guides

    Platforms

    • ChatGPT Directory
    • Claude Directory
    • Gemini Directory
    • Cursor Directory
    • Grok Directory
    • Perplexity Directory
    • DeepSeek Directory
    • CoPilot Directory
    • Stable Diffusion Directory
    • Midjourney Directory
    • All Directories

    Resources

    • Blog
    • Documentation
    • Help Center
    • Marketplace

    Legal

    • Privacy Policy
    • Terms of Service

    © 2026 Neura Market. All rights reserved.

    |

    Not affiliated with any AI platform vendors.