PARL (Parallel-Agent Reinforcement Learning) is a training paradigm that teaches models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.
# PARL: Parallel-Agent Reinforcement Learning

> **⚠️ Disclaimer**: This is an **open-source community implementation** of the PARL (Parallel-Agent Reinforcement Learning) technique based on the Kimi K2.5 technical report. This is **NOT an official implementation** from Kimi AI or any affiliated organization. This project is maintained independently by The Swarm Corporation and the open-source community.

Open-source implementation of **PARL (Parallel-Agent Reinforcement Learning)**, a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.

## Overview

PARL is a training methodology that addresses the critical challenge of **serial collapse** in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By implementing staged reward shaping and a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 sub-agents across 1,500+ coordinated steps.

### Key Features

- **Staged Reward Shaping**: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success
- **Instantiation Reward**: Incentivizes subagent creation and concurrent execution
- **Critical Steps Metric**: Latency-oriented evaluation inspired by parallel computation's critical path concept
- **Differentiable Components**: Fully compatible with gradient-based optimization
- **Orchestrator-Subagent Architecture**: Trainable coordinator with frozen execution agents

## Architecture

```
┌─────────────────────────────────────────────┐
│             Orchestrator Agent              │
│ (Trainable Central Coordin
```
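The staged reward shaping and critical steps metric listed under Key Features can be illustrated with a small sketch. The README excerpt gives no formulas, so everything here is an assumption made for illustration: the linear annealing schedule, the starting weight of 0.5, the way the two reward terms are mixed, and the critical-path-style definition (orchestrator steps plus the slowest subagent per stage). The names `annealed_weight`, `shaped_reward`, `Stage`, and `critical_steps` are hypothetical and are not taken from the project's API.

```python
from dataclasses import dataclass
from typing import List


def annealed_weight(step: int, total_steps: int, start: float = 0.5) -> float:
    """Assumed schedule: linearly decay the auxiliary weight from `start` to 0."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start * (1.0 - progress)


def shaped_reward(task_reward: float, instantiation_reward: float,
                  step: int, total_steps: int) -> float:
    """Mix the instantiation reward with task success (assumed convex mix).

    Early in training the instantiation term carries more weight; by the
    end of training only the task reward remains.
    """
    lam = annealed_weight(step, total_steps)
    return lam * instantiation_reward + (1.0 - lam) * task_reward


@dataclass
class Stage:
    """One orchestration stage (hypothetical structure)."""
    orchestrator_steps: int
    subagent_steps: List[int]  # steps taken by each subagent running in parallel


def critical_steps(stages: List[Stage]) -> int:
    """Assumed metric: per stage, orchestrator steps plus the slowest subagent."""
    return sum(
        s.orchestrator_steps + max(s.subagent_steps, default=0)
        for s in stages
    )


if __name__ == "__main__":
    stages = [Stage(1, [3, 5, 2]), Stage(2, [])]
    print(critical_steps(stages))
    print(shaped_reward(1.0, 0.0, step=100, total_steps=100))
```

Under these assumptions, the two stages in the usage example cost 8 critical steps (1 + 5 in the first stage, 2 in the second), whereas running the same subagents one after another would cost 13. That gap is what a latency-oriented metric is meant to reward.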
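HAL layered hybrid-model workflow: a strong model (Claude) handles understanding, decomposition, and acceptance review, while a low-cost model (DeepSeek) handles retrieval, extraction, and cleaning. A Hermes Agent skill.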
An LLM agent fine-tuned on DeepSeek for spaced repetition, dynamically integrating knowledge points based on the Ebbinghaus forgetting curve.
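An end-to-end AI smartwatch ecosystem built on the STM32F103. It features a self-developed "zero-relocation" native machine-code dynamic loading engine and a page-stack UI framework, and integrates a production-grade OTA rollback protection mechanism and a high-bandwidth (921600 baud) serial protocol stack. A Node.js relay provides DeepSeek AI semantic control and full-duplex voice interaction via ASRPRO, making this an embedded full-stack project that combines distributed computing, modern storage management, and an AI Agent.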
A Meta-Agent-Driven Self-Evolving Multi-Agent System for UAV Detection and Tracking
One command to run Hermes AI Agent with a browser UI. Zero prerequisites. One command, and your AI is ready.
A web application Agent with support for DeepSeek, Mimo, and other models.