Claude Sonnet 4 for RAG: Embedding Fine-Tuning and Retrieval Best Practices

Claude Directory January 15, 2026

Discover how to pair Claude 3.5 Sonnet with strong embedding models for RAG. This guide delivers Python and TypeScript examples for optimized retrieval and generation pipelines.

# Why Claude 3.5 Sonnet Excels in RAG Pipelines

Retrieval-Augmented Generation (RAG) combines the strengths of retrieval systems and generative AI to produce accurate, context-rich responses. Claude 3.5 Sonnet handles the generation step. Anthropic does not offer its own embedding model, so retrieval needs an external one; this guide pairs Sonnet with OpenAI's text-embedding-3 family. With up to 3072 dimensions and strong results on benchmarks such as MTEB, these embeddings suit precise retrieval in knowledge-intensive tasks.

This tutorial walks you through building an efficient RAG pipeline: embedding documents, indexing them, retrieving relevant chunks, and generating responses with Sonnet. We'll also cover optimization techniques, which is what we mean by "fine-tuning" here: tuning chunking, retrieval, and prompts rather than the weights of any model.

## Prerequisites

- [Anthropic API key](https://console.anthropic.com/)
- OpenAI API key (for the text-embedding-3 models)
- Python 3.10+ or Node.js 18+
- Familiarity with vector databases

Install dependencies:

**Python:**

```bash
pip install anthropic openai chromadb numpy sentence-transformers
```

**TypeScript:**

```bash
npm install @anthropic-ai/sdk openai chromadb
```

We'll use ChromaDB for simplicity: a lightweight, open-source vector store.

## Step 1: Generating Embeddings

OpenAI's `text-embedding-3-small` (1536 dims, cost-effective) or `text-embedding-3-large` (3072 dims, higher accuracy) powers the embeddings. Use the OpenAI SDK for embeddings and the Anthropic SDK for generation.

**Python Example:**

```python
import anthropic
import chromadb
from openai import OpenAI

# Anthropic client for generation (Step 4); OpenAI client for embeddings
client = anthropic.Anthropic(api_key="your-api-key")
openai_client = OpenAI(api_key="your-openai-key")


def get_embedding(text: str) -> list[float]:
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=text,
    )
    return response.data[0].embedding


# Sample documents
documents = [
    "Claude 3.5 Sonnet is Anthropic's most capable model for coding and reasoning.",
    "RAG improves LLM accuracy by retrieving external knowledge.",
    "Embeddings capture semantic similarity for vector search.",
]

embeddings = [get_embedding(doc) for doc in documents]
```

**TypeScript Example:**

```typescript
import Anthropic from '@anthropic-ai/sdk';
import OpenAI from 'openai';

// Anthropic client for generation (Step 4); OpenAI client for embeddings
const client = new Anthropic({ apiKey: 'your-api-key' });
const openai = new OpenAI({ apiKey: 'your-openai-key' });

async function getEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: text,
  });
  return response.data[0].embedding;
}

const documents = [
  "Claude 3.5 Sonnet is Anthropic's most capable model for coding and reasoning.",
  'RAG improves LLM accuracy by retrieving external knowledge.',
  'Embeddings capture semantic similarity for vector search.',
];

const embeddings = await Promise.all(documents.map(getEmbedding));
```

Pro Tip: Batch embeddings for efficiency. The OpenAI embeddings endpoint accepts a list of inputs per call, with each input limited to 8192 tokens.

## Step 2: Indexing Documents in ChromaDB

Store the embeddings alongside the original text; you can also attach metadata for filtering.

**Python:**

```python
# Persist the index to a local directory
chroma_client = chromadb.PersistentClient(path="./rag_index")
collection = chroma_client.get_or_create_collection(name="claude_rag")

collection.add(
    embeddings=embeddings,
    documents=documents,
    ids=[f"doc_{i}" for i in range(len(documents))],
)
```

**TypeScript (using the `chromadb` npm package):**

Note: the ChromaDB JS client connects to a running Chroma server (for example, one started with `chroma run --path ./rag_index`) rather than opening a local directory. For production, consider LanceDB or Pinecone.

```typescript
import { ChromaClient } from 'chromadb';

// Separate name so it doesn't clash with the Anthropic `client` from Step 1
const chroma = new ChromaClient({ path: 'http://localhost:8000' });

const collection = await chroma.getOrCreateCollection({ name: 'claude_rag' });

await collection.add({
  embeddings,
  documents,
  ids: documents.map((_, i) => `doc_${i}`),
});
```

## Step 3: Retrieval Best Practices

Optimize retrieval with top-k, thresholds, and chunking.

- **Chunking Strategy:** Split docs into 512-token chunks with 20% overlap. Count tokens with the embedding model's tokenizer (tiktoken's `cl100k_base` encoding for the text-embedding-3 models) for accuracy.
- **Query Embedding:** Embed the user query with the same model used for the documents.
- **Similarity Search:** Chroma's default distance is squared L2. Because OpenAI embeddings are normalized to unit length, it ranks results the same way cosine similarity does; to use cosine explicitly, pass `metadata={"hnsw:space": "cosine"}` when creating the collection.
- **Hybrid:** Combine keyword (BM25) and semantic search for noisy data.

**Retrieval Function (Python):**

```python
def retrieve(query: str, top_k: int = 5) -> list[str]:
    # Embed the query with the same model used at indexing time
    query_emb = get_embedding(query)
    results = collection.query(
        query_embeddings=[query_emb],
        n_results=top_k,
    )
    # Chroma returns one result list per query embedding; we sent one
    return results["documents"][0]


query = "What is Claude Sonnet good for?"
context = retrieve(query)
```

## Step 4: RAG Generation with Claude 3.5 Sonnet

Craft prompts that keep the answer faithful to the retrieved context. Sonnet works well with structured XML prompts; the message's `user` role already comes from the API call, so the template only needs the data tags.

**Prompt Template:**

```
<query>{query}</query>
<context>{context}</context>
Provide a concise, accurate answer based only on the context.
```

**Python Full Pipeline:**

```python
def rag_generate(query: str) -> str:
    context = retrieve(query)
    context_str = "\n".join(context)

    prompt = f"""<query>{query}</query>
<context>{context_str}</context>
Answer using only the provided context. If unsure, say so."""

    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        temperature=0,  # minimize randomness (see Common Pitfalls)
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text


print(rag_generate("What is Claude Sonnet good for?"))
```

**TypeScript:**

```typescript
async function ragGenerate(query: string): Promise<string> {
  // Embed the query and fetch the 5 nearest chunks
  const queryEmb = await getEmbedding(query);
  const results = await collection.query({
    queryEmbeddings: [queryEmb],
    nResults: 5,
  });
  const context = results.documents[0].join('\n');

  const prompt = `<query>${query}</query>
<context>${context}</context>
Answer using only the provided context.`;

  const response = await client.messages.create({
    model: 'claude-3-5-sonnet-20240620',
    max_tokens: 500,
    temperature: 0,
    messages: [{ role: 'user', content: prompt }],
  });
  // content is a union of block types; only text blocks carry .text
  const block = response.content[0];
  return block.type === 'text' ? block.text : '';
}
```

## Fine-Tuning Your RAG Pipeline (Optimization Techniques)

We don't fine-tune the embedding model here; instead, tune these pipeline settings and measure the effect on your own evaluation set:

- **Dynamic Chunk Size:** Use Claude to summarize or score chunks.

  ```python
  # Score relevance with Sonnet (reuses `client` from Step 1)
  def score_chunk(query: str, chunk: str) -> str:
      score_prompt = f"Score the relevance of this chunk to the query '{query}' from 0 to 10. Reply with the number only.\n\n{chunk}"
      return client.messages.create(model="claude-3-5-sonnet-20240620", max_tokens=5, messages=[{"role": "user", "content": score_prompt}]).content[0].text
  ```

- **Reranking:** Retrieve 20 candidates, rerank them with a cross-encoder (e.g., from Hugging Face), and keep the top 10.
- **Metadata Filtering:** Index document types and sources as metadata so queries can filter on them.
- **Multi-Query:** Generate query variants with Sonnet for better recall.
- **Evaluation:** Use RAGAS or custom metrics (faithfulness, answer relevance).

**Advanced: Parent-Child Retrieval**

Chunk hierarchically: search over small child chunks for precision, then return their larger parent sections as context.

## Scaling with Production Tools

- **Vector DBs:** Pinecone or Weaviate for millions of vectors.
- **Orchestration:** LangChain or LlamaIndex, both of which offer Claude integrations.
- **Caching:** Redis for frequent queries.
- **Monitoring:** Track latency and hallucination rates.

Example with Pinecone (Python, `pinecone` v3+ client):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("claude-rag")  # assumes an existing index with dimension 1536

ids = [f"doc_{i}" for i in range(len(documents))]
metas = [{"text": doc} for doc in documents]
index.upsert(vectors=[(vec_id, emb, meta) for vec_id, emb, meta in zip(ids, embeddings, metas)])
```

## Common Pitfalls and Fixes

| Issue | Fix |
|-------|-----|
| Poor recall | Increase top-k; use hybrid search |
| Hallucinations | Strict prompt ("Use only the context") plus `temperature=0` |
| High latency | Smaller embeddings, async batching |
| Cost | Use `text-embedding-3-small`, or shorten `text-embedding-3-large` vectors with the `dimensions` parameter; always embed queries and documents with the same model |

## Conclusion

Claude 3.5 Sonnet plus a strong embedding model delivers production-grade RAG with minimal setup. Experiment with chunk sizes and prompts; Sonnet's reasoning helps it cope with ambiguous or partially relevant context. Fork our [GitHub repo](https://github.com/example/claude-rag) for full code.

Dive deeper into the Claude API docs for MCP extensions or agents.
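The batching tip from Step 1 amounts to grouping texts into lists and sending one embeddings request per list. Below is a minimal sketch under stated assumptions: `embed_in_batches`, `openai_embed_batch`, and the `batch_size=100` default are hypothetical, and the backend is passed in as a callable so the batching logic can be tested without network access.

```python
from typing import Callable, Sequence

Embedding = list[float]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[list[str]], list[Embedding]],
    batch_size: int = 100,  # hypothetical default; tune for your rate limits
) -> list[Embedding]:
    """Embed texts with one request per batch, preserving input order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: list[Embedding] = []
    for start in range(0, len(texts), batch_size):
        batch = list(texts[start:start + batch_size])
        batch_vectors = embed_batch(batch)
        if len(batch_vectors) != len(batch):
            raise RuntimeError("embedding count does not match batch size")
        vectors.extend(batch_vectors)
    return vectors


def openai_embed_batch(batch: list[str]) -> list[Embedding]:
    # Hypothetical real backend; needs the `openai` package and OPENAI_API_KEY
    from openai import OpenAI

    response = OpenAI().embeddings.create(model="text-embedding-3-small", input=batch)
    # Sort by each item's index so vectors line up with the input order
    return [item.embedding for item in sorted(response.data, key=lambda item: item.index)]
```

In the Python pipeline, `embed_in_batches(documents, openai_embed_batch)` would replace the per-document `get_embedding` loop, cutting the number of HTTP round trips from one per document to one per batch.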
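The chunking strategy from Step 3 (512-token chunks with 20% overlap) is a sliding window over tokens. This minimal sketch assumes whitespace-separated words as a stand-in for real tokens; in practice you would tokenize with the embedding model's tokenizer (for example tiktoken's `cl100k_base` encoding) and decode each window back to text. The names `chunk_tokens` and `chunk_text` are hypothetical.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap_ratio: float = 0.2) -> list[list[str]]:
    """Split a token list into fixed-size windows that overlap by overlap_ratio."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    overlap = int(chunk_size * overlap_ratio)  # 512 * 0.2 -> 102 shared tokens
    step = chunk_size - overlap  # how far each window advances
    if step <= 0:
        raise ValueError("overlap_ratio must be below 1")
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end so the tail isn't repeated as a tiny chunk
        if start + chunk_size >= len(tokens):
            break
    return chunks


def chunk_text(text: str, chunk_size: int = 512, overlap_ratio: float = 0.2) -> list[str]:
    # Hypothetical whitespace "tokenizer" standing in for the embedding model's tokenizer
    words = text.split()
    return [" ".join(window) for window in chunk_tokens(words, chunk_size, overlap_ratio)]
```

For a 1200-token document, the windows start at tokens 0, 410, and 820, so each chunk shares its first 102 tokens with the end of the previous one. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.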
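For the hybrid bullet in Step 3, one common way to combine keyword and semantic results is reciprocal rank fusion (RRF), which merges ranked lists without calibrating BM25 scores against vector distances. This is a minimal sketch, not a production BM25: the lowercase whitespace tokenizer, the Okapi parameters `k1=1.5` and `b=0.75`, the RRF constant `k=60`, and all function names are assumptions. In the pipeline above, the semantic ranking would be the ids in `results["ids"][0]` from `collection.query`.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer; swap in a real one for production
    return text.lower().split()


def bm25_rank(query: str, docs: dict[str, str], k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Return doc ids ordered by Okapi BM25 score, highest first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / max(n_docs, 1)
    # Document frequency: number of docs containing each term
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores: dict[str, float] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists: each id earns 1 / (k + rank) from every list it appears in."""
    fused: defaultdict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)
```

Because RRF only looks at ranks, a document that places reasonably well in both the keyword and the semantic list can outrank one that tops a single list, which is usually what you want on noisy data.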
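The reranking bullet (retrieve 20, keep the best 10) reduces to rescoring each (query, candidate) pair and sorting. A minimal sketch, assuming the scorer is passed in as a callable; `rerank` and `ScoreFn` are hypothetical names.

```python
from typing import Callable, Sequence

# Takes (query, candidate) pairs and returns one relevance score per pair
ScoreFn = Callable[[list[tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: list[str], score_pairs: ScoreFn, top_n: int = 10) -> list[str]:
    """Rescore retrieved candidates jointly with the query and keep the best top_n."""
    if not candidates:
        return []
    scores = score_pairs([(query, candidate) for candidate in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [candidate for candidate, _ in ranked[:top_n]]
```

With `sentence-transformers` installed (it is in the Python install line), `score_pairs` could be `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict`, and the candidates could come from `retrieve(query, top_k=20)`. A cross-encoder reads query and chunk together, so it is slower than vector search but usually sharper, which is why it runs only on the short candidate list.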
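The Evaluation bullet mentions RAGAS and custom metrics. Faithfulness and answer relevance need an LLM judge or RAGAS, but a cheap retrieval-side check can complement them: hit rate at k and mean reciprocal rank (MRR) over a small labeled set. This is a minimal sketch; the function names and the input format (query id mapped to ranked retrieved ids, and query id mapped to a set of relevant ids) are assumptions.

```python
def hit_rate_at_k(results: dict[str, list[str]], relevant: dict[str, set[str]], k: int = 5) -> float:
    """Fraction of labeled queries whose top-k retrieved ids include a relevant id."""
    if not relevant:
        return 0.0
    hits = sum(1 for q, rel in relevant.items() if rel & set(results.get(q, [])[:k]))
    return hits / len(relevant)


def mean_reciprocal_rank(results: dict[str, list[str]], relevant: dict[str, set[str]]) -> float:
    """Average of 1/rank of the first relevant id per query (0 when none is retrieved)."""
    if not relevant:
        return 0.0
    total = 0.0
    for q, rel in relevant.items():
        for rank, doc_id in enumerate(results.get(q, []), start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(relevant)
```

Re-running these two numbers after each change to chunk size, top-k, or reranking shows whether a tweak actually helped retrieval before you spend tokens on generation-side evaluation.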
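The parent-child retrieval idea from the Advanced section can be sketched as a mapping from each small child chunk to its parent section: search over the children, then return each matched parent once, in the order its best child ranked. A minimal sketch under these assumptions: `split_into_children` and `parents_for_children` are hypothetical helpers, child ids follow a made-up `parent_id#n` convention, and the ranked child ids would come from a vector search over the child chunks.

```python
def split_into_children(parent_id: str, text: str, child_words: int = 50) -> list[tuple[str, str]]:
    """Split a parent section into small child chunks tagged with their parent id."""
    words = text.split()
    children: list[tuple[str, str]] = []
    for n, start in enumerate(range(0, len(words), child_words)):
        child_id = f"{parent_id}#{n}"  # hypothetical convention linking child to parent
        children.append((child_id, " ".join(words[start:start + child_words])))
    return children


def parents_for_children(ranked_child_ids: list[str], parents: dict[str, str], max_parents: int = 3) -> list[str]:
    """Map ranked child hits back to parent texts, deduplicated in rank order."""
    seen: list[str] = []
    for child_id in ranked_child_ids:
        parent_id = child_id.rsplit("#", 1)[0]  # strip the "#n" suffix
        if parent_id not in seen:
            seen.append(parent_id)
        if len(seen) == max_parents:
            break
    return [parents[p] for p in seen]
```

Small children keep the embedding focused on one idea, while returning the parent gives Sonnet the surrounding context it needs to answer; deduplicating stops two sibling chunks from pulling in the same section twice.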