A framework for converting any LLM into a reasoning agent like o1 or DeepSeek's R1.
# Agent Gym

[Discord](https://discord.gg/swarms) [YouTube](https://www.youtube.com/@kyegomez3242) [LinkedIn](https://www.linkedin.com/in/kye-g-38759a207/) [X](https://x.com/kyegomezb)
Convert any model into an R1-like reasoning agent. Agent Gym builds on TRL, Hugging Face Transformers, and other libraries. This is a work in progress; the goal is to make it easy to train any model into a reasoning agent.
- Sources:
  - [Open R1 Blog](https://huggingface.co/blog/open-r1)
  - [GRPO Trainer documentation (TRL)](https://huggingface.co/docs/trl/main/en/grpo_trainer)
  - [Hugging Face Transformers documentation](https://huggingface.co/docs/transformers/main/en/index)
## Installation
```bash
pip3 install -U agentgym
```
## Usage
```python
from agentgym.r1_pipeline import R1Pipeline, SFTConfig

# Configure the pipeline: base model, tokenizer, SFT dataset, and SFT arguments.
r1_pipeline = R1Pipeline(
    sft_model="Qwen/Qwen2-0.5B-Instruct",
    tokenizer_name="Qwen/Qwen2-0.5B-Instruct",
    sft_dataset="trl-lib/tldr",
    sft_args=SFTConfig(output_dir="/tmp"),
    only_grpo=True,
    model_name="Qwen/Qwen2-0.5B-Instruct",
)

# Run the configured training pipeline.
r1_pipeline.run()
```
## Architecture
The pipeline chains two training stages:
- SFT: Supervised Fine-Tuning
- GRPO: Group Relative Policy Optimization

Flow: `model -> SFT -> GRPO -> reasoning model`
```mermaid
graph TD;
A[model] --> B[sft]
B --> C[grpo]
C --> D[reasoning model]
```
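
To make the GRPO stage more concrete, here is a minimal sketch of the group-relative advantage idea behind it: several completions are sampled for the same prompt, and each completion's reward is normalized against the mean and standard deviation of its own group. This is an illustration only, not Agent Gym's or TRL's actual implementation. The function name `group_relative_advantages`, the `eps` value, the choice of population standard deviation, and the reward values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize one prompt's completion rewards against their own group.

    Hypothetical helper for illustration; `eps` avoids division by zero
    when every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std is an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that score above their group's average get a positive advantage and are reinforced, while below-average ones get a negative advantage. Because the baseline comes from the group itself, no separate value model is needed.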
## License
MIT