A simple MCP server for the SO-ARM100 control
# SO-ARM100 Robot Control with MCP [](https://youtu.be/EmpQQd7jRqs) A companion repository to my video about MCP server for the robot: - **MCP Server** for LLM-based AI agents (Claude Desktop, Gemini, Windsurf, etc.) to control the robot - **Direct keyboard control** for manual operation - **CLI AI Agent** can use it directly to control the robot with Gemini, Gemini or GPT model If you want to know more about MCP refer to the [official MCP documentation](https://github.com/modelcontextprotocol/python-sdk) This repository suppose to work with the SO-ARM100 / 101 robots. Refer to [lerobot SO-101 setup guide](https://huggingface.co/docs/lerobot/so101) for the detailed instructions on how to setup the robot. Update! Now it partially supports [LeKiwi](https://github.com/SIGRobotics-UIUC/LeKiwi) (only arm, the mobile base control through MCP is TBD). I also added a simple agent that uses MCP server to control the robot. It supports Gemini, Gemini and GPT models. In my experience Gemini is the best and GPT is not so good, Gemini is in between. After I released the video and this repository, LeRobot released a significant update of the library that breaks the compatibility with the original code. If you want to use the original code and exactly follow the video, please use [this release](https://github.com/IliaLarchenko/robot_MCP/tree/v0.0.1). ## Quick Start ### 1. Install Dependencies For simplicity I use simple pip instead of uv that is often recommended in MCP tutorials - it works just fine. ```bash python -m venv .venv source .venv/bin/activate # or .venv\Scripts\activate on Windows pip install -r requirements.txt ``` It may be required to install lerobot separately, just use the official instructions from the [lerobot repository](https://github.com/huggingface/lerobot) ### 2. Connect Your Robot - Connect SO-ARM100 via USB - Update `config.py` with your serial port for so-arm (e.g., `
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.