Agentic DeepSeek-OCR wrapper for turning PDFs and images into Markdown with a customizable vLLM pipeline and Gradio UI.
# DeepReader

[](assets/deepreader_logo.png)

DeepReader is an agentic reading toolkit that couples DeepSeek-OCR with opinionated defaults for running single-document or batch OCR. It streamlines image/PDF ingestion, produces Markdown accompanied by figure crops and layout previews, and exposes knobs for both CLI and Gradio workflows.

## Project Layout

- `images/`: Sample page images for quick smoke-tests.
- `docs/`: Input PDFs for full-length papers.
- `outputs/`: Generated Markdown, annotated images, and layout PDFs.
- `DeepSeek-OCR-master/DeepSeek-OCR-vllm/`: vLLM-powered runtime (default entry points).

## Environment Setup

- Download the vllm-0.8.5 [whl](https://github.com/vllm-project/vllm/releases/tag/v0.8.5)

```bash
conda create -n deepreader python=3.12.9 -y
conda activate deepreader
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 \
    --index-url https://download.pytorch.org/whl/cu118
pip install vllm-0.8.5+cu118-cp38-abi3-manylinux1_x86_64.whl
pip install -r requirements.txt
```

Optional extras:

- `pip install flash-attn==2.7.3 --no-build-isolation` (faster attention if supported).

## Configuration Strategy

`DeepSeek-OCR-vllm/config.py` reads all defaults from environment variables, making it easy to swap inputs, outputs, prompts, or GPU settings without editing code.

```bash
export DEEPREADER_INPUT_PATH="$PWD/docs/paper.pdf"
export DEEPREADER_OUTPUT_PATH="$PWD/outputs/paper_run"
export DEEPREADER_PROMPT='<image> <|grounding|>Convert the document to markdown.'
export DEEPREADER_PROMPT_TEMPLATE=document
export DEEPREADER_MODE=gundam
export DEEPREADER_CUDA_VISIBLE_DEVICES=0
export DEEPREADER_GPU_MEM_UTIL=0.8
export DEEPREADER_KEEP_MODELS_LOADED=1
```

> **GPU tip**: the default vLLM config assumes ≈10 GB of free VRAM. Tune `DEEPREADER_GPU_MEM_UTIL` down if you’re memory-constrained.

## Gradio Interface

[](assets/interface.png)

Launch an inte
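HAL layered hybrid-model workflow: a strong model (Claude) handles understanding, task decomposition, and acceptance review, while a low-cost model (DeepSeek) handles retrieval, extraction, and cleaning. A Hermes Agent skill.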
An LLM agent fine-tuned on DeepSeek for spaced repetition, dynamically integrating knowledge points based on the Ebbinghaus forgetting curve.
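An end-to-end AI smartwatch ecosystem built on the STM32F103. It features a self-developed "zero-relocation" engine for dynamically loading native machine code and a page-stack UI framework, and it integrates a production-grade OTA rollback-protection mechanism and a high-bandwidth (921600 baud) serial protocol stack. A Node.js relay provides DeepSeek AI semantic control and full-duplex voice interaction through ASRPRO. It is an embedded full-stack project that combines distributed computing, modern storage management, and an AI agent.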
A Meta-Agent-Driven Self-Evolving Multi-Agent System for UAV Detection and Tracking
One command to run Hermes AI Agent with a browser UI. Zero prerequisites. One command, and the AI is ready.
Web application agent with support for DeepSeek, Mimo, and other models.