# Why Deploy MCP Servers on Kubernetes?
Hey there, Claude enthusiasts! If you're building AI agents or extending Claude's capabilities with custom tools via MCP (Model Context Protocol) servers, you've probably hit scalability walls. MCP servers let Claude call external tools dynamically—think real-time data fetches, computations, or integrations. But running them on a single VM? Not gonna cut it for production.
Enter Kubernetes (K8s): the gold standard for container orchestration. It handles scaling, self-healing, load balancing, and rolling updates effortlessly. In this guide, we'll containerize a sample MCP server, deploy it to K8s, and make it enterprise-ready. By the end, you'll have a scalable ecosystem that powers Claude's tool-calling superpowers.
Perfect for devs using Claude API, teams in engineering/marketing, or anyone evaluating Claude for enterprise.
## Prerequisites
Before we dive in, ensure you have:
- A running Kubernetes cluster (Minikube for local dev, EKS/GKE/AKS for prod).
- `kubectl` and `helm` installed.
- Docker for building images.
- Basic familiarity with YAML and containers.
- A sample MCP server. We'll use a Python FastAPI example (Claude-specific MCP protocol compliant).
**Quick MCP Primer**: MCP is Anthropic's protocol for tool servers. Claude sends JSON-RPC-like requests to your server's `/mcp` endpoint, you process and respond. Docs: [Anthropic MCP Guide](https://docs.anthropic.com/claude/docs/mcp).
## Step 1: Containerize Your MCP Server
Let's start with a simple MCP server that fetches weather data (a common tool example).
### Sample MCP Server Code
Create `app.py`:
```python
from fastapi import FastAPI, Request
from pydantic import BaseModel
import requests
app = FastAPI()
class MCPRequest(BaseModel):
jsonrpc: str = "2.0"
id: str
method: str
params: dict
@app.post("/mcp")
async def mcp_endpoint(request: Request):
body = await request.json()
if body["method"] == "tools/list":
return {
"jsonrpc": "2.0",
"id": body["id"],
"result": {
"tools": [{
"name": "get_weather",
"description": "Get current weather",
"inputSchema": {"type": "object", "properties": {"city": {"type": "string"}}}
}]
}
}
elif body["method"] == "tools/call":
city = body["params"]["arguments"]["city"]
# Mock API call
weather = "Sunny, 72°F" # Replace with real API
return {
"jsonrpc": "2.0",
"id": body["id"],
"result": {"content": [{"type": "text", "text": f"Weather in {city}: {weather}"}]}
}
return {"jsonrpc": "2.0", "id": body["id"], "error": {"code": -32601, "message": "Method not found"}}
if __name__ == "__main__":
import uvicorn
uvicorn.run(app, host="0.0.0.0", port=8000)
```
**requirements.txt**:
```
fastapi==0.104.1
uvicorn==0.24.0
pydantic==2.5.0
```
### Dockerfile
```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8000
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
Build and push:
```bash
git clone your-repo # Or create dir
# Add files above
docker build -t yourregistry/mcp-weather:1.0 .
docker push yourregistry/mcp-weather:1.0
```
Pro tip: Use multi-stage builds for slimmer images in prod.
## Step 2: Kubernetes Manifests
Time to orchestrate! We'll create Deployment, Service, and Ingress.
### deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: mcp-weather
spec:
replicas: 3 # Start with 3 pods
selector:
matchLabels:
app: mcp-weather
template:
metadata:
labels:
app: mcp-weather
spec:
containers:
- name: mcp-server
image: yourregistry/mcp-weather:1.0
ports:
- containerPort: 8000
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
livenessProbe:
httpGet:
path: /health # Add /health to app.py
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
```
*(Note: Add `@app.get("/health") def health(): return {"status": "ok"}` to app.py.)*
### service.yaml
```yaml
apiVersion: v1
kind: Service
metadata:
name: mcp-weather-service
spec:
selector:
app: mcp-weather
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: ClusterIP
```
### ingress.yaml (for external access)
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: mcp-weather-ingress
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
cert-manager.io/cluster-issuer: "letsencrypt-prod" # For HTTPS
spec:
ingressClassName: nginx
rules:
- host: mcp-weather.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: mcp-weather-service
port:
number: 80
tls:
- hosts:
- mcp-weather.yourdomain.com
secretName: mcp-weather-tls
```
Apply:
```bash
kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml
kubectl get pods,svc,ing
```
Your MCP server is now running at `http://mcp-weather.yourdomain.com/mcp`!
## Step 3: Auto-Scaling with HPA
Handle traffic spikes from Claude agents:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: mcp-weather-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: mcp-weather
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
```
```bash
kubectl apply -f hpa.yaml
```
K8s will scale pods based on CPU/memory. Monitor with `kubectl top pods`.
## Step 4: Integrating with Claude
Configure Claude to use your MCP server. In prompts or API calls:
```python
import anthropic
client = anthropic.Anthropic()
message = client.messages.create(
model="claude-3-5-sonnet-20240620",
max_tokens=1024,
tools=[{
"type": "mcp",
"mcp_servers": [{"url": "https://mcp-weather.yourdomain.com/mcp"}]
}],
messages=[{"role": "user", "content": "What's the weather in NYC?"}]
)
print(message.content)
```
Claude will auto-discover tools via `/tools/list` and call them. Scale wins here—multiple agents hit the cluster, K8s distributes load.
## Step 5: Monitoring and Best Practices
- **Helm Charts**: Package as Helm for reusability.
```bash
helm create mcp-weather-chart
# Customize templates
helm install mcp-weather ./mcp-weather-chart
```
- **Secrets**: Use K8s Secrets for API keys.
```yaml
apiVersion: v1
kind: Secret
metadata:
name: mcp-secrets
type: Opaque
data:
WEATHER_API_KEY: <base64>
```
Mount in Deployment: `envFrom: secretRef: name: mcp-secrets`.
- **Logging**: Fluentd/Prometheus. Add to app: `logging.basicConfig(level=logging.INFO)`.
- **CI/CD**: GitHub Actions to build/push images, ArgoCD for GitOps deploys.
- **Multi-MCP Ecosystem**: Deploy multiple servers (e.g., weather + stocks) in namespaces.
```yaml
metadata:
namespace: mcp-tools
```
- **Security**: NetworkPolicies, RBAC, mTLS for MCP endpoints.
Common pitfalls: Expose only `/mcp`, validate JSON-RPC strictly, handle timeouts (Claude has 60s defaults).
## Production Checklist
- [ ] Cluster autoscaler enabled.
- [ ] Persistent storage if needed (e.g., Redis for state).
- [ ] CI/CD pipeline.
- [ ] Load testing: `hey -n 10000 -c 100 https://yourdomain/mcp`.
- [ ] Cost optimization: Spot instances.
## Wrapping Up
You've now got a battle-tested, scalable MCP ecosystem on Kubernetes! This setup powers real-world Claude agents in sales (CRM tools), engineering (code analysis), or HR (data lookups). Experiment with Opus for complex reasoning + your tools.
Fork the [GitHub repo](https://github.com/example/mcp-k8s) (imagine it exists), tweak for your use case, and share in comments.
Questions? Drop 'em below. Happy deploying! 🚀
*(Word count: ~1450)*