Full-stack LLM Engineering Lab. Features: Autonomous Agents (ReAct/AutoGPT) | Fine-Tuning Llama/Mistral (SFT/DPO) | Large Model Deployment (DeepSeek 671B / 2.5-bit) | Advanced RAG (Hybrid Search) | Function Calling (Stream/Text-to-SQL/External APIs) | Frameworks (LangChain, Semantic Kernel, OpenAI) | Daily SOTA Paper Tracking. From theory to 0-to-1
# LLMs-Lab

Full-stack LLM Engineering Lab. Features: Autonomous Agents (ReAct/AutoGPT) | Fine-Tuning Llama/Mistral (SFT/DPO) | Large Model Deployment (DeepSeek 671B / 2.5-bit) | Advanced RAG (Hybrid Search) | Function Calling (Stream/Text-to-SQL/External APIs) | Frameworks (LangChain, Semantic Kernel, OpenAI) | Daily SOTA Paper Tracking. From theory to 0-to-1.

## 9. 📂 [DeepSeek](DeepSeek/)

Focused on **Inference Optimization** and **Low-Bit Quantization** strategies for massive-scale MoE models (600B+ parameters).

- **DeepSeek-R1 (671B) 2.51-bit Extreme Quantization Deployment**:
  - Deployed the **2.51-bit quantized version** (via Unsloth) of the 671B MoE model, achieving a **~70% reduction** in memory footprint (from 720GB to ~212GB).
  - Analyzed official and community benchmarks for the **1.58-bit vs 2.51-bit** configurations and selected the 2.51-bit build (Q2_K_XL) for its more stable reasoning on the H20 GPU cluster.
  - 📄 **[View Hands-on Deployment Log & Benchmarks (PDF)](DeepSeek/Hands-on%20deployment%20of%20671B%20model%20inference%20(2.51-bit%20quantization).pdf)**

## 8. 📂 [Fine-Tuning](Fine-Tuning/)

This directory bridges the gap between theoretical architecture analysis and practical, memory-efficient fine-tuning of state-of-the-art open-source models. It covers the full lifecycle from understanding pre-training to post-training alignment.

### Key Modules

- **[Transformer Source Code Analysis](Fine-Tuning/AnnotatedTransformer.ipynb)**: A line-by-line analysis of a vanilla Transformer implementation, covering **Self-Attention mechanisms**, Multi-Head Attention, and Layer Normalization, the foundational building blocks of the architecture.
- **[Llama Series: QLoRA & Quantization](Fine-Tuning/Llama%20%20Fine-Tuning%20&%20Quantization%20&%20Running%20&%20Instroducntion%20&%20Deployment.pdf)**: Implementation of **[QLoRA](Fine-Tuning/Lora%20code.ipynb)** (Quantized Low-Rank Adaptation) to fine-tune L
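HAL: a hierarchical hybrid-model workflow. A strong model (Claude) handles understanding, task decomposition, and acceptance checks; a low-cost model (DeepSeek) handles retrieval, extraction, and cleaning. A Hermes Agent skill.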
An LLM agent fine-tuned on DeepSeek for spaced repetition, dynamically integrating knowledge points based on the Ebbinghaus forgetting curve.
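As a rough illustration of how a forgetting-curve-based review schedule can work, here is a minimal sketch. It assumes the common exponential form of the Ebbinghaus curve, R(t) = exp(-t / S), where S is a per-item "stability" in days. The function names, the 0.9 retention target, and the doubling/halving update rule are hypothetical choices made for this example; they are not taken from the project.

```python
import math


def retention(days_elapsed: float, stability: float) -> float:
    """Estimated recall probability under the assumed curve R = exp(-t / S)."""
    return math.exp(-days_elapsed / stability)


def next_review_in_days(stability: float, target_retention: float = 0.9) -> float:
    """Days until estimated retention decays to target_retention.

    Solves exp(-t / S) = target for t, giving t = -S * ln(target).
    """
    return -stability * math.log(target_retention)


def update_stability(stability: float, recalled: bool, growth: float = 2.0) -> float:
    """Hypothetical update rule: grow stability after a successful recall,
    shrink it (never below one day) after a lapse."""
    if recalled:
        return stability * growth
    return max(1.0, stability / growth)


if __name__ == "__main__":
    s = 1.0  # assumed initial stability of one day
    for outcome in (True, True, False, True):
        wait = next_review_in_days(s)
        print(f"stability={s:.2f} d, review in {wait:.2f} d, recalled={outcome}")
        s = update_stability(s, outcome)
```

An agent could use the computed interval to decide which knowledge points to resurface in a given session; a real system would likely fit the stability per item from review history rather than use a fixed multiplier.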
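An end-to-end AI smartwatch ecosystem built on the STM32F103. It features a self-developed "zero-relocation" dynamic loading engine for native machine code and a page-stack UI framework, and it integrates a production-grade OTA rollback-protection mechanism with a high-bandwidth (921600 baud) serial protocol stack. A Node.js relay provides DeepSeek AI semantic control and full-duplex voice interaction via ASRPRO, making this an embedded full-stack project that combines distributed computing, modern storage management, and an AI agent.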
A Meta-Agent-Driven Self-Evolving Multi-Agent System for UAV Detection and Tracking
One command to run Hermes AI Agent with a browser UI. Zero prerequisites. One command, and the AI is ready.
A web application agent that integrates models such as DeepSeek and Mimo.