Loading...
Loading...
Loading...
The architecture you described is the standard setup for embedding-based retrieval in RAG systems:
## Embedding-Based Retrieval Architecture
The architecture you described is the standard setup for embedding-based retrieval in RAG systems:
1. **Embedding Model**: This neural network converts text into dense vector representations. Popular models include:
- Sentence-BERT/Sentence Transformers
- OpenAI's text-embedding models
- Models like E5, BGE, or GTE
2. **Vector Database**: Stores these embeddings and provides efficient search capabilities. Examples include:
- Pinecone
- Weaviate
- Milvus
- Chroma
- FAISS (Facebook AI Similarity Search)
3. **Indexing Component**: This is critical but often overlooked. It:
- Organizes vectors for efficient retrieval
- Creates data structures that allow for fast approximate nearest neighbor search
- May use techniques like hierarchical navigable small worlds (HNSW) or inverted file index (IVF)
## Cosine Similarity Explained
Cosine similarity is the most common way to measure how similar two vectors are in embedding space:
1. **Mathematical Definition**: It measures the cosine of the angle between two vectors
- Formula: cos(θ) = (A·B)/(||A||·||B||)
- Returns values between -1 and 1 (though with typical embeddings, values range from 0 to 1)
2. **Intuitive Explanation**:
- Think of two vectors as arrows pointing in different directions from the origin
- If they point in the exact same direction (perfectly similar): cosine = 1
- If they point in perpendicular directions (unrelated): cosine = 0
- If they point in opposite directions (opposite meaning): cosine = -1
3. **Why It's Used for Embeddings**:
- It focuses on the direction of vectors, not their magnitude
- This means a document's length doesn't overly influence similarity scores
- It works well in high-dimensional spaces where Euclidean distance can be problematic
## RAG Evaluation Metrics
The two metrics you mentioned are key for evaluating retrieval quality:
1. **Context Precision**: The proportion of retrieved chunks that are actually relevant
- High precision means most retrieved information is useful
- Low precision means the system is retrieving a lot of irrelevant information
2. **Context Recall**: The proportion of all relevant chunks that were successfully retrieved
- High recall means the system found most of the important information
- Low recall means the system missed key information
These metrics help balance the tradeoff between retrieving too much information (high recall, low precision) versus missing important information (high precision, low recall).
## Chunking Strategies - The Deep Dive
Chunking is perhaps the most underappreciated aspect of RAG systems:
1. **Size Considerations**:
- Too large: Wastes context window, may dilute relevant information
- Too small: Loses important context, creates fragmented knowledge
- Optimal size often depends on your specific content and use case
2. **Advanced Chunking Strategies**:
- **Semantic Chunking**: Breaking at natural semantic boundaries rather than arbitrary character counts
- **Overlapping Chunks**: Having some overlap between chunks to avoid breaking concepts
- **Hierarchical Chunking**: Creating both small detailed chunks and larger overview chunks
- **Recursive Chunking**: Using the model itself to identify logical break points
3. **Metadata Augmentation**:
- Adding title, source, date, author, category, etc.
- Including parent-child relationships between chunks
- Preserving document structure information (section, chapter, etc.)
- This metadata becomes crucial for contextualizing retrieved information
## Retrieval Optimization Techniques
1. **Query Rewriting**:
- Using the model to expand or reformulate the original query
- Creating multiple query variations to improve retrieval coverage
- Example: User asks "How tall is the tower?" → System rewrites to "What is the height of the Eiffel Tower in Paris, France?"
2. **Reranking**:
- Two-stage retrieval process:
1. Initial broad retrieval using fast methods
2. More sophisticated reranking of the top results
- Can use cross-attention models (like cross-encoders) that look at query-document pairs together
- Often dramatically improves precision at a reasonable computational cost
These optimization techniques are becoming standard in advanced RAG systems because they significantly improve retrieval quality without requiring complete architecture changes.
This roadmap outlines planned enhancements to transform cheap-RAG from a functional document retrieval system into a production-ready, state-of-the-art RAG framework. Priorities are based on impact vs. effort analysis and alignment with mainstream RAG best practices.
See `specs/Semblance-MVP-Plan-v2.md` for full technical specification.
All notable changes to AvocadoDB will be documented in this file.
**Goal:** Stand up Toasty as a reliable service wired to BLT/GitHub events; deliver safe, useful summaries early.