Chunker Server

Overview

The Chunker MCP Server splits text into chunks using one of several strategies, each with configurable options. It supports recursive, semantic, sentence-based, fixed-size, and markdown-aware chunking. Two implementations are available: the original MCP server and a FastMCP server, which adds stronger type safety and automatic parameter validation.

Key Features

Multiple Chunking Strategies: Recursive, semantic, sentence-based, fixed-size, markdown-aware
Markdown Support: Intelligent markdown chunking respecting header structure
Configurable Parameters: Chunk size, overlap, separators, and more
Text Analysis: Analyzes text and recommends a suitable chunking strategy
Library Integration: Supports LangChain text splitters, NLTK, and spaCy
FastMCP Implementation: Modern decorator-based tool definitions with automatic validation

Quick Start

Installation

# Basic installation with core functionality
make install

# With NLP libraries (NLTK and spaCy)
make install-nlp

# With LangChain support
make install-langchain

# Full installation (recommended - includes all features)
make install-full

Running the Server

# FastMCP server (recommended)
make dev-fastmcp

# Original MCP server
make dev

# HTTP bridge for REST API access
make serve-http-fastmcp  # FastMCP version
make serve-http          # Original version

Available Tools

chunk_text

Universal text chunking with multiple strategies.

Parameters:

text (required): Text to chunk
chunk_size: Maximum chunk size (default: 1000, range: 100-100000)
chunk_overlap: Overlap between chunks (default: 200)
chunking_strategy: "recursive", "semantic", "sentence", or "fixed_size"
separators: Custom separators for splitting
preserve_structure: Preserve document structure when possible

chunk_markdown

Markdown-aware chunking that respects header structure.

Parameters:

text (required): Markdown text to chunk
headers_to_split_on: Headers to use as boundaries (default: ["#", "##", "###"])
chunk_size: Maximum chunk size (default: 1000)
chunk_overlap: Overlap between chunks (default: 100)
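
As an illustration of what splitting on header boundaries means, here is a minimal sketch in plain Python. It is not the server's implementation: the function name `split_markdown_by_headers` and the returned dictionary shape are assumptions made for this example, and it does not apply `chunk_size` or `chunk_overlap`. It also does not skip `#` lines inside fenced code blocks, which a real splitter would need to handle.

```python
import re

def split_markdown_by_headers(text, headers_to_split_on=("#", "##", "###")):
    """Split markdown into sections that start at the given header levels."""
    # "#" -> level 1, "##" -> level 2, and so on.
    levels = {len(h) for h in headers_to_split_on}
    sections = []
    current = {"header": None, "lines": []}

    for line in text.splitlines():
        match = re.match(r"^(#{1,6})\s+(.*)$", line)
        if match and len(match.group(1)) in levels:
            # Close the section in progress if it holds any content.
            if current["header"] is not None or any(l.strip() for l in current["lines"]):
                sections.append(current)
            current = {"header": match.group(2).strip(), "lines": [line]}
        else:
            # Deeper headers and body text stay inside the current section.
            current["lines"].append(line)

    if current["header"] is not None or any(l.strip() for l in current["lines"]):
        sections.append(current)

    return [
        {"header": s["header"], "text": "\n".join(s["lines"]).strip()}
        for s in sections
    ]

doc = "# API\n\nIntro\n\n## Auth\n\nUse tokens.\n\n### Details\n\nMore."
sections = split_markdown_by_headers(doc, headers_to_split_on=("#", "##"))
for section in sections:
    print(section["header"], len(section["text"]))
```

Because `###` is not in the split list here, the "Details" subsection stays inside the "Auth" section rather than becoming its own chunk.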

semantic_chunk

Content-aware chunking based on semantic boundaries.

Parameters:

text (required): Text to chunk
min_chunk_size: Minimum chunk size (default: 200)
max_chunk_size: Maximum chunk size (default: 2000)
similarity_threshold: Threshold for semantic grouping (default: 0.8)
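
To make the three parameters concrete, the sketch below groups adjacent paragraphs greedily. It is an assumption-laden stand-in, not the server's algorithm: similarity is measured here as word-set overlap (Jaccard) between the chunk so far and the next paragraph, whereas the server's actual measure is not described in this document. The function names are hypothetical.

```python
import re

def jaccard(a, b):
    """Word-set overlap between two strings, from 0.0 to 1.0."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def semantic_chunk_sketch(text, min_chunk_size=200, max_chunk_size=2000,
                          similarity_threshold=0.8):
    """Merge adjacent paragraphs that look related, within size limits."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks = []
    current = ""
    for para in paragraphs:
        if not current:
            current = para
            continue
        fits = len(current) + 2 + len(para) <= max_chunk_size
        related = jaccard(current, para) >= similarity_threshold
        too_small = len(current) < min_chunk_size
        # Merge when the result fits and the paragraph is either related
        # or needed to bring the chunk up to the minimum size.
        if fits and (related or too_small):
            current = current + "\n\n" + para
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks

sample = (
    "cats purr and cats sleep\n\n"
    "cats sleep and cats purr\n\n"
    "stock markets fell sharply today"
)
result = semantic_chunk_sketch(sample, min_chunk_size=0, similarity_threshold=0.5)
print(len(result))
```

In this sketch a lower `similarity_threshold` merges more aggressively, and a single paragraph longer than `max_chunk_size` is left unsplit.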

sentence_chunk

Sentence-based chunking with configurable grouping.

Parameters:

text (required): Text to chunk
sentences_per_chunk: Sentences per chunk (default: 5, range: 1-50)
overlap_sentences: Overlapping sentences (default: 1, range: 0-10)
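
The interaction between `sentences_per_chunk` and `overlap_sentences` can be sketched as a sliding window over sentences. This is a simplified illustration, not the server's code: it uses a naive regex splitter instead of NLTK or spaCy, the function names are hypothetical, and rejecting an overlap that is not smaller than the window is an assumption of this example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_chunk_sketch(text, sentences_per_chunk=5, overlap_sentences=1):
    """Group sentences into windows that share `overlap_sentences` sentences."""
    if sentences_per_chunk < 1:
        raise ValueError("sentences_per_chunk must be at least 1")
    if not 0 <= overlap_sentences < sentences_per_chunk:
        raise ValueError("overlap_sentences must be smaller than sentences_per_chunk")

    sentences = split_sentences(text)
    step = sentences_per_chunk - overlap_sentences
    chunks = []
    for i in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[i:i + sentences_per_chunk]))
        # Stop once a window has reached the last sentence.
        if i + sentences_per_chunk >= len(sentences):
            break
    return chunks

text = "One. Two. Three. Four. Five."
print(sentence_chunk_sketch(text, sentences_per_chunk=2, overlap_sentences=1))
```

Each window advances by `sentences_per_chunk - overlap_sentences` sentences, so a larger overlap produces more chunks for the same input.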

fixed_size_chunk

Fixed-size chunking with word boundary preservation.

Parameters:

text (required): Text to chunk
chunk_size: Fixed chunk size (default: 1000)
overlap: Overlap between chunks (default: 0)
split_on_word_boundary: Avoid breaking words (default: true)
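
A minimal sketch of how these three parameters could interact is shown below. It assumes sizes are measured in characters and that a chunk which would end mid-word is shortened to the last space inside the window; the server's exact behavior may differ, and the function name is hypothetical.

```python
def fixed_size_chunk_sketch(text, chunk_size=1000, overlap=0,
                            split_on_word_boundary=True):
    """Cut text into windows of at most `chunk_size` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # If the cut would land inside a word, back up to the last space.
        if split_on_word_boundary and end < len(text) and not text[end].isspace():
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end == len(text):
            break
        # Keep `overlap` characters of context, but always move forward.
        start = max(end - overlap, start + 1)
    return chunks

print(fixed_size_chunk_sketch("alpha beta gamma delta", chunk_size=10))
print(fixed_size_chunk_sketch("abcdefghij", chunk_size=4, overlap=1,
                              split_on_word_boundary=False))
```

With word-boundary splitting enabled, chunks can be shorter than `chunk_size`, so the "fixed" size is an upper bound rather than an exact length.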

analyze_text

Analyze text characteristics and get chunking recommendations.

Parameters:

text (required): Text to analyze

Returns:

Text statistics (length, word count, paragraph count)
Structure detection (markdown headers, lists, etc.)
Recommended chunking strategies with parameters
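
The kind of analysis described above can be approximated with a few regular expressions. The sketch below is illustrative only: the field names, the thresholds, and the recommendation rule are arbitrary assumptions for this example and are not taken from the server.

```python
import re

def analyze_text_sketch(text):
    """Collect basic statistics and suggest a strategy with simple heuristics."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    header_count = len(re.findall(r"^#{1,6}\s+\S", text, flags=re.MULTILINE))
    list_count = len(
        re.findall(r"^[ \t]*(?:[-*+]|\d+\.)\s+\S", text, flags=re.MULTILINE)
    )

    stats = {
        "length": len(text),
        "word_count": len(text.split()),
        "paragraph_count": len(paragraphs),
        "markdown_headers": header_count,
        "list_items": list_count,
    }

    # Assumed rule of thumb: headers suggest markdown chunking, several
    # paragraphs suggest semantic chunking, anything else is recursive.
    if header_count >= 2:
        stats["recommended_strategy"] = "markdown"
    elif len(paragraphs) >= 3:
        stats["recommended_strategy"] = "semantic"
    else:
        stats["recommended_strategy"] = "recursive"
    return stats

report = analyze_text_sketch("# A\n\ntext\n\n## B\n\n- one\n- two")
print(report)
```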

get_strategies

Get information about available chunking strategies and libraries.

Returns:

Available strategies and their descriptions
Best use cases for each strategy
Library availability status
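
A library availability check of this kind can be done without importing the libraries. The sketch below uses `importlib.util.find_spec` from the standard library; the module names passed in are assumptions about how the optional dependencies are imported, and the function name is hypothetical.

```python
from importlib.util import find_spec

def library_status(modules=("langchain_text_splitters", "nltk", "spacy")):
    """Report which optional top-level modules can be imported."""
    return {name: find_spec(name) is not None for name in modules}

print(library_status())
```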

Configuration

MCP Client Configuration

FastMCP Server (Recommended)

{
  "mcpServers": {
    "chunker": {
      "command": "python",
      "args": ["-m", "chunker_server.server_fastmcp"]
    }
  }
}

Original Server

{
  "mcpServers": {
    "chunker": {
      "command": "python",
      "args": ["-m", "chunker_server.server"]
    }
  }
}

Examples

Basic Text Chunking

{
  "text": "Your long text here...",
  "chunk_size": 1000,
  "chunk_overlap": 200,
  "chunking_strategy": "recursive"
}

Markdown Documentation Processing

{
  "text": "# API Reference\n\n## Authentication\n\n...",
  "headers_to_split_on": ["#", "##"],
  "chunk_size": 2000,
  "chunk_overlap": 100
}

Semantic Chunking for Articles

{
  "text": "Article content with multiple paragraphs...",
  "min_chunk_size": 500,
  "max_chunk_size": 3000,
  "similarity_threshold": 0.7
}

Preparing Text for Embeddings

{
  "text": "Text to be embedded...",
  "chunk_size": 512,
  "chunk_overlap": 50,
  "chunking_strategy": "recursive"
}

Integration

With ContextForge

To integrate with ContextForge, expose the server over HTTP:

# Start the chunker server via HTTP
make serve-http-fastmcp

# Register with ContextForge
curl -X POST http://localhost:8000/gateways \
  -H "Content-Type: application/json" \
  -d '{
    "name": "chunker-server",
    "url": "http://localhost:9000",
    "description": "Text chunking server"
  }'
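
The same registration can be issued from Python. The following sketch builds the request with the standard library, using the URL, path, and payload from the curl command above. The function name is hypothetical, and whether the gateway requires authentication headers is not covered here. Sending the request needs a running gateway, so that step is left commented out.

```python
import json
import urllib.request

def build_registration_request(gateway_url="http://localhost:8000",
                               server_url="http://localhost:9000"):
    """Build the POST request that registers the chunker server."""
    payload = {
        "name": "chunker-server",
        "url": server_url,
        "description": "Text chunking server",
    }
    return urllib.request.Request(
        f"{gateway_url}/gateways",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_registration_request()
print(request.get_method(), request.full_url)

# To send it (requires a running gateway):
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode())
```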

Programmatic Usage

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def chunk_text():
    server_params = StdioServerParameters(
        command="python",
        args=["-m", "chunker_server.server_fastmcp"]
    )

    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize the client
            await session.initialize()

            # List available tools
            tools = await session.list_tools()

            # Call chunk_text tool
            result = await session.call_tool("chunk_text", {
                "text": "Your text here...",
                "chunk_size": 1000,
                "chunking_strategy": "recursive"
            })

            print(result.content[0].text)

asyncio.run(chunk_text())

Response Format

The chunking tools return a JSON response with:

success: Boolean indicating success/failure
strategy: The chunking strategy used
chunks: Array of text chunks
chunk_count: Number of chunks created
Additional metadata specific to each strategy

Example Response:

{
  "success": true,
  "strategy": "recursive",
  "chunks": [
    "First chunk of text...",
    "Second chunk of text..."
  ],
  "chunk_count": 2,
  "total_length": 2000,
  "average_chunk_size": 1000
}
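
A client will typically check `success` before using `chunks`. The sketch below parses a response of the shape shown above. The helper name is hypothetical, and the `error` key read on failure is an assumption, since the failure payload is not documented here.

```python
import json

def parse_chunk_response(raw):
    """Return the list of chunks from a JSON response, or raise on failure."""
    data = json.loads(raw)
    if not data.get("success"):
        # "error" is an assumed field name for the failure message.
        raise RuntimeError(f"chunking failed: {data.get('error', 'unknown error')}")
    chunks = data.get("chunks", [])
    if data.get("chunk_count") != len(chunks):
        raise ValueError("chunk_count does not match the number of chunks")
    return chunks

raw = """
{
  "success": true,
  "strategy": "recursive",
  "chunks": ["First chunk of text...", "Second chunk of text..."],
  "chunk_count": 2,
  "total_length": 2000,
  "average_chunk_size": 1000
}
"""
for index, chunk in enumerate(parse_chunk_response(raw)):
    print(index, chunk)
```

With the programmatic client shown earlier, `raw` would be the string in `result.content[0].text`.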

Chunking Strategies Guide

Recursive Chunking

Best for: General text, mixed content
How it works: Hierarchically splits using multiple separators
Use cases: Books, articles, documentation
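
The hierarchical splitting described above can be sketched as follows: try the coarsest separator first, pack the resulting pieces into chunks, and fall back to finer separators only for pieces that are still too large. This is a simplified illustration under assumed default separators, without overlap handling, and is not the server's or LangChain's implementation.

```python
def recursive_chunk_sketch(text, chunk_size=1000,
                           separators=("\n\n", "\n", ". ", " ")):
    """Split text hierarchically so that no chunk exceeds `chunk_size`."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    pieces = text.split(sep)
    if len(pieces) == 1:
        # This separator does not occur, so try the next finer one.
        return recursive_chunk_sketch(text, chunk_size, finer)

    chunks = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current.strip():
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: split it with finer separators.
            chunks.extend(recursive_chunk_sketch(piece, chunk_size, finer))
            current = ""
    if current.strip():
        chunks.append(current)
    return chunks

text = "para one is here\n\npara two is here\n\npara three"
print(recursive_chunk_sketch(text, chunk_size=20))
```

Paragraph breaks are preferred over line breaks, sentences, and words, which is why this strategy tends to keep related text together in mixed content.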

Markdown Chunking

Best for: Markdown documents, structured content
How it works: Splits on markdown headers, preserves structure
Use cases: Technical documentation, READMEs, wiki pages

Semantic Chunking

Best for: Articles, essays, narrative text
How it works: Groups content by semantic boundaries
Use cases: Research papers, blog posts, news articles

Sentence Chunking

Best for: Precise sentence-level processing
How it works: Groups sentences with optional overlap
Use cases: Translation, summarization, sentence analysis

Fixed-Size Chunking

Best for: Uniform chunk sizes, simple splitting
How it works: Splits at fixed character counts
Use cases: Token limits, consistent processing windows
