**Late Chunking** is a method for preparing long documents for retrieval systems. It avoids the context loss of traditional pipelines, which split a document into chunks before embedding each chunk independently.

airageval

vishalmysore

RAG.md

Late chunking is a technique where you embed the entire document first with a long-context embedding model, then chunk the resulting contextualized representations, rather than chunking text first then embedding each chunk independently.

rageval

ever-works
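The late-chunking description above can be sketched in a few lines. This is a minimal illustration, not any particular library's implementation, and it rests on three assumptions:

- The per-token embeddings already come from a single forward pass of a long-context embedding model over the whole document.
- Chunk boundaries are given as `[start, end)` token spans.
- Each chunk vector is the mean of its tokens' embeddings. Mean pooling is a common choice but is assumed here.

`late_chunk` and `mean_pool` are hypothetical names.

```python
from typing import Sequence

Vector = list[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors component-wise."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def late_chunk(
    token_embeddings: Sequence[Vector], spans: Sequence[tuple[int, int]]
) -> list[Vector]:
    """Pool contextualized token embeddings into one vector per chunk span.

    token_embeddings: one vector per token, assumed to come from embedding the
    WHOLE document at once, so every token was encoded with full context.
    spans: [start, end) token offsets for each chunk (assumed precomputed).
    """
    if not spans:
        raise ValueError("at least one chunk span is required")
    chunk_vectors = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        # Chunking happens AFTER embedding: we only slice and pool here.
        chunk_vectors.append(mean_pool(token_embeddings[start:end]))
    return chunk_vectors


# Toy stand-in for a model's per-token output: 6 tokens, 2 dimensions.
tokens = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
print(late_chunk(tokens, [(0, 2), (2, 6)]))  # [[1.0, 2.0], [7.0, 8.0]]
```

A traditional pipeline would embed each span separately, so tokens in the second chunk could never attend to the first. Here the contextualization happens before the split, and chunking reduces to slicing and pooling.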

CHUNKING.md

Haskell ASTChunking Enhancement

This document describes enhancements to Haskell code chunking in CocoIndex, inspired by techniques from the ASTChunk library. The improvements replace the original regex-based approach with a configurable chunking system that produces richer chunk metadata and detects chunk boundaries more reliably.

aanno
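The snippet does not say how the improved boundary detection works. The sketch below is a rough illustration only and assumes nothing about CocoIndex's or ASTChunk's actual code. It uses a layout heuristic rather than a real AST:

- In Haskell, top-level declarations start in column 0, so a new chunk begins at each such line.
- A function's type signature and its equations share a name, so they stay in one chunk.
- Each chunk records its key and 1-based start line as simple metadata.

All names (`Chunk`, `boundary_key`, `chunk_haskell`) are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Declarations that start with a keyword at column 0.
KEYWORD = re.compile(r"^(module|import|data|newtype|type|class|instance)\b")
# A value or function binding (or its type signature) at column 0.
BINDING = re.compile(r"^([a-z_][A-Za-z0-9_']*)")
# Keywords that always open a new chunk; consecutive imports are grouped.
ALWAYS_SPLIT = {"module", "data", "newtype", "type", "class", "instance"}


@dataclass
class Chunk:
    key: str  # declaration keyword or bound name
    start_line: int  # 1-based line number where the chunk begins
    lines: list[str] = field(default_factory=list)


def boundary_key(line: str) -> Optional[str]:
    """Return a key if `line` starts a top-level declaration, else None."""
    if not line or line[0].isspace() or line.startswith("--"):
        return None  # blank, indented (continuation), or comment line
    match = KEYWORD.match(line) or BINDING.match(line)
    return match.group(1) if match else None


def chunk_haskell(source: str) -> list[Chunk]:
    chunks: list[Chunk] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        key = boundary_key(line)
        starts_new = key is not None and (
            not chunks or key != chunks[-1].key or key in ALWAYS_SPLIT
        )
        if starts_new:
            chunks.append(Chunk(key, lineno))
        elif not chunks:
            # Pragmas or comments before the first declaration.
            chunks.append(Chunk("preamble", lineno))
        chunks[-1].lines.append(line)
    return chunks


SOURCE = """module Main where

import Data.List
import qualified Data.Map as M

add :: Int -> Int -> Int
add x y = x + y

main :: IO ()
main = print (add 1 2)
"""
print([(c.key, c.start_line) for c in chunk_haskell(SOURCE)])
# [('module', 1), ('import', 3), ('add', 6), ('main', 9)]
```

This heuristic has two known limitations. Operator definitions and column-0 comments (including Haddock comments that precede a declaration) attach to the previous chunk. A real AST-based chunker avoids both.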

CHUNKING.md

Message Chunking for MCP stdio Transport

When using stdio pipes for MCP communication on macOS, there's a 64KB limit on the amount of data that can be written to a pipe in a single operation. This limitation can cause issues when sending large JSON-RPC messages, such as tool responses with substantial data or resource contents.

aiagentmcp

JamieScanlon
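One way to stay under the per-write limit described above is to serialize each JSON-RPC message once, then write it to the pipe in slices no larger than 64KB. The reader reassembles the stream until the terminating newline. This is a minimal sketch, not the referenced project's implementation. It assumes newline-delimited JSON framing, as used by MCP's stdio transport. The names `frame_message`, `split_for_pipe`, and `write_chunked` are hypothetical.

```python
import json
import os

# Per-write size limit; 64KB matches the macOS figure described above.
PIPE_CHUNK_LIMIT = 64 * 1024


def frame_message(message: dict) -> bytes:
    """Serialize one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes newlines inside strings, so the only "\n" is the delimiter.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def split_for_pipe(payload: bytes, limit: int = PIPE_CHUNK_LIMIT) -> list[bytes]:
    """Cut a framed payload into consecutive slices of at most `limit` bytes."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


def write_chunked(fd: int, message: dict, limit: int = PIPE_CHUNK_LIMIT) -> int:
    """Write one message to `fd` in bounded slices; return the bytes written."""
    total = 0
    for piece in split_for_pipe(frame_message(message), limit):
        view = memoryview(piece)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
            total += written
    return total
```

In a server, `fd` would typically be `sys.stdout.fileno()`. The message is framed once and the newline appears only at the end. The receiver therefore sees one logical message no matter how many writes it took.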

RAG.md

Chunking & Embeddings

This guide walks through the two building blocks of the GPT chat knowledge base:

aiagentopenai

Laisky

METADATA.md

ADR metadata/0003: 64-Byte String Chunking Ownership

- **Status**: Accepted

airag

bloxbean

RAG.md

Text Chunking - Anton (ChromaDB)

Text Chunking Strategies for RAG Applications

aillmrag

jxnl

RAG.md

Long Context Chunking

The long context chunking system automatically handles documents that exceed embedding model context limits by splitting them into manageable chunks and computing averaged embeddings.

airag

CortexReach
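The split-and-average behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the referenced system's code:

- Text is already tokenized.
- `embed` is a hypothetical callable that maps a token list to a vector.
- Chunks are fixed-size windows with optional overlap.
- The final embedding is the unweighted mean of the chunk embeddings.

```python
from typing import Callable, Sequence

Vector = list[float]
Embedder = Callable[[list[str]], Vector]  # hypothetical: tokens -> embedding


def split_tokens(
    tokens: Sequence[str], max_tokens: int, overlap: int = 0
) -> list[list[str]]:
    """Split tokens into windows of at most max_tokens, sharing `overlap` tokens."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    step = max_tokens - overlap
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def embed_long_text(
    tokens: Sequence[str], embed: Embedder, max_tokens: int, overlap: int = 0
) -> Vector:
    """Embed directly if the text fits; otherwise average the per-chunk embeddings."""
    if len(tokens) <= max_tokens:
        return embed(list(tokens))
    vectors = [embed(chunk) for chunk in split_tokens(tokens, max_tokens, overlap)]
    return [sum(column) / len(vectors) for column in zip(*vectors)]
```

An unweighted mean treats a short final chunk the same as a full one. Common variants weight each chunk by its length or L2-normalize the result. The snippet does not say which one the referenced system uses.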

CHUNKING.md

Making automatic speech recognition work on large files with Wav2Vec2 in 🤗 Transformers

huggingface

CHUNKING.md

THIS EXAMPLE IS STALE. NEEDS REVAMP!

Text chunking example

aieval

HazyResearch

CHUNKING.md

Better File Chunking

Within the IPFS stack/ecosystem, just as within computing as a whole, **an

airageval

ipfs

RETRIEVAL.md

Smart Hybrid Retrieval - Implementation Summary

The Smart Hybrid Retrieval system is a **4-phase intelligent knowledge retrieval algorithm** that combines semantic search, graph expansion, completeness verification, and multi-factor ranking to provide comprehensive and accurate results.

llmrageval

zrg-team

RAG.md

Search and Retrieval

This document describes the search and retrieval capabilities of RAG Modulo, including the 6-stage pipeline architecture, Chain of Thought reasoning, and advanced retrieval techniques.

aillmrag

manavgup

CHUNKING.md

chunking

Chunking is the process that decides which modules are placed into which bundles, and the relationship between these bundles.

airag

vercel
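Here is an illustration of the idea in the snippet above. It is not the bundler's actual algorithm. One simple strategy assigns each module to a chunk keyed by the set of entry points that can reach it:

- Modules used by only one entry go into that entry's chunk.
- Modules shared by several entries go into a shared chunk that each of those entries loads.

The module graph, entry names, and functions below are hypothetical.

```python
from collections import defaultdict

Graph = dict[str, list[str]]  # module -> modules it imports


def reachable(graph: Graph, entry: str) -> set[str]:
    """All modules reachable from `entry` via imports (iterative DFS)."""
    seen: set[str] = set()
    stack = [entry]
    while stack:
        module = stack.pop()
        if module in seen:
            continue
        seen.add(module)
        stack.extend(graph.get(module, []))
    return seen


def chunk_modules(graph: Graph, entries: list[str]) -> dict[frozenset[str], set[str]]:
    """Group modules by the set of entry points that reach them."""
    owners: dict[str, set[str]] = defaultdict(set)
    for entry in entries:
        for module in reachable(graph, entry):
            owners[module].add(entry)
    chunks: dict[frozenset[str], set[str]] = defaultdict(set)
    for module, entry_set in owners.items():
        chunks[frozenset(entry_set)].add(module)
    return dict(chunks)


def chunks_for_entry(chunks: dict[frozenset[str], set[str]], entry: str) -> list[set[str]]:
    """The chunks an entry point must load: every chunk its entry set includes."""
    return [modules for key, modules in chunks.items() if entry in key]


GRAPH = {"a": ["shared", "a_only"], "b": ["shared"], "shared": ["util"], "a_only": [], "util": []}
for entry_set, modules in chunk_modules(GRAPH, ["a", "b"]).items():
    print(sorted(entry_set), sorted(modules))
```

In this sketch, the "relationship between these bundles" is recovered by `chunks_for_entry`. Each entry depends on its own chunk plus every shared chunk whose key includes it.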

RETRIEVAL.md

prompt-context-retrieval

Just-in-Time Context Retrieval

aiagentprompt

goldk3y

CHUNKING.md

Property chunking

Property chunking enables connectors to handle APIs with limitations on the number of properties that you can fetch per request. This feature breaks down large property lists into smaller, manageable chunks and merges the results back into complete records. Some connectors require this capability to work with APIs that have property limits.

airbytehq
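The split-then-merge flow described above can be sketched as follows. This is a minimal illustration, not Airbyte's implementation. `fetch` stands in for a hypothetical API call that returns records containing only the requested properties. Records from different chunks are assumed to be merged on a shared primary key (`id` here).

```python
from typing import Callable, Hashable, Iterable, Sequence

Record = dict[str, object]
Fetch = Callable[[list[str]], Iterable[Record]]  # hypothetical API call


def chunk_properties(properties: Sequence[str], max_per_request: int) -> list[list[str]]:
    """Split the property list so each request stays under the API's limit."""
    if max_per_request <= 0:
        raise ValueError("max_per_request must be positive")
    return [
        list(properties[i:i + max_per_request])
        for i in range(0, len(properties), max_per_request)
    ]


def fetch_with_property_chunking(
    properties: Sequence[str], max_per_request: int, fetch: Fetch, primary_key: str = "id"
) -> list[Record]:
    """Fetch each property chunk, then merge partial records by primary key."""
    merged: dict[Hashable, Record] = {}
    for chunk in chunk_properties(properties, max_per_request):
        for record in fetch(chunk):
            # Partial records for the same key are combined into one complete record.
            merged.setdefault(record[primary_key], {}).update(record)
    return list(merged.values())
```

This design requires every partial record to carry the primary key. Otherwise the pieces returned by different requests cannot be matched up again.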

ARCHITECTURE.md

MetaAST-Enhanced Retrieval

MetaAST-enhanced retrieval leverages semantic metadata from the Metastatic analyzer to improve code search accuracy and relevance. This system combines:

airageval

Oeditus

RETRIEVAL.md

Static Hosting Performance

🤔 After trying this, it quickly becomes evident that the speed is not satisfactory. Of course, we could conclude that it needs to be hosted in an assets worker, but that would make it way less scalable. There are several other ways to improve speed, though, so let's do that.

airageval

janwilmake

RETRIEVAL.md