---
title: "Giving AI agents knowledge they were never trained on"
published: true
tags: mcp, ai, typescript, llm
---
I love coding my own stuff, and my clients typically have lots of internal specifications and libraries to use.
But since LLMs haven't been trained on any of that, it's hard to get them to code accurately against those specs, libraries, or frameworks.
You can, of course, let the agent parse everything, but that wastes tokens and your patience :)
The same goes for well-known libraries when you are stuck on a specific version: you don't want the agent to guess the API.
`docs-mcpserver` exists to deal with both.
## What it is
It is an MCP server that provides an agent with accurate knowledge of a framework or specification using documentation as the medium. It reads three kinds of docs:
- **Markdown docs** — your `*.md` files.
- **API reference** — C# XML documentation, or TypeDoc JSON.
- **Schema** — JSON Schema, OpenAPI 3.x, Swagger 2.0.
What the agent gets out of it is the same in every case: the real names, the real
signatures, the real shapes. Sources can come from a local folder or straight from
a GitHub URL. A single server instance can host several libraries side by side, for
instance your in-house framework, a client's framework, and a specific version of
some public library. The agent picks which one to query.
I have personally used it to code against DATEX (a HUGE specification for road traffic information), against my own [SPA library](https://github.com/relax-js/core), and against sound format specifications for a sound app I'm building.
## Why not just give the agent the files
You could point the agent to the folders and let it read them. The MCP server does a few things that raw file access does not:
- **It is sandboxed.** Each source is scoped, with path-traversal protection. The
agent reads what you exposed, nothing else on the disk.
- **It reads in pieces.** Instead of loading a 4000-line reference file, the agent
asks for the table of contents, then pulls the one chapter it needs.
- **It searches properly.** Dedicated search tools with regex and glob support,
instead of the agent improvising its own grep.
- **It is self-describing.** With several libraries configured, the agent calls
one tool to discover what is available. You do not have to spell out every path.
- **GitHub works without cloning.** Give it a repo URL and it handles the rest.
The multi-library part is the point. Instead of running several MCP servers for
documentation, you get one with a small toolset. No token waste.
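To make the sandboxing bullet above concrete, here is a minimal sketch of a path-traversal guard. It is not the server's actual code; `resolveInsideRoot` is a hypothetical helper that only illustrates the general technique of resolving a requested path and refusing anything that ends up outside the source root.

```typescript
import * as path from "node:path";

/**
 * Resolve a path requested by the agent against a source root and reject
 * anything that would land outside that root.
 * Hypothetical helper for illustration, not part of docs-mcpserver.
 */
function resolveInsideRoot(root: string, requested: string): string {
  const absoluteRoot = path.resolve(root);
  const candidate = path.resolve(absoluteRoot, requested);
  const relative = path.relative(absoluteRoot, candidate);

  // A path that climbs out of the root is either ".." itself or starts with
  // "../". On Windows, a path on another drive comes back as absolute.
  const escapes =
    relative === ".." ||
    relative.startsWith(".." + path.sep) ||
    path.isAbsolute(relative);

  if (escapes) {
    throw new Error(`Path escapes the source root: ${requested}`);
  }
  return candidate;
}
```

With a root of `/srv/docs`, a request for `guide/intro.md` resolves normally, while `../secrets.txt` or `/etc/passwd` throws. This sketch works on path strings only and does not follow symlinks, so a real implementation would likely need more than this.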
## Setting it up
Install and build from a clone of the repository:
```bash
npm install
npm run build
```
The quick way, a single folder:
```bash
docs-mcpserver ./docs --name "My Docs"
```
For the real use case — several libraries — use a config file. Here is an
in-house framework served from disk, next to a pinned version of a public
library pulled from GitHub:
```json
{
"name": "dev-docs",
"description": "Frameworks the model has not been trained on",
"cacheDir": "./cache",
"libraries": [
{
"name": "acme-core",
"description": "Our internal application framework",
"sources": [
{ "type": "disk", "origin": "./frameworks/acme-core/docs", "kind": "docs" },
{ "type": "disk", "origin": "./frameworks/acme-core/api", "kind": "api" }
]
},
{
"name": "somelib-3.2",
"description": "SomeLib, pinned to v3.2.0",
"sources": [
{
"type": "github",
"origin": "https://github.com/someorg/somelib/tree/v3.2.0/docs",
"kind": "docs"
}
]
}
]
}
```
Start it with the config:
```bash
docs-mcpserver --config dev-docs.json
```
And register it with Claude Code:
```bash
claude mcp add mydocs -- node /path/to/markdown-docs-mcp/dist/index.js --config /path/to/dev-docs.json
```
For private GitHub repos, set `GITHUB_TOKEN` in the environment.
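If you generate or edit the config from a script, typing it can catch mistakes early. The types below are inferred only from the example config above, so treat them as a sketch: which fields are optional is an assumption, and `checkConfig` is a hypothetical helper, not something the server provides.

```typescript
// Shape inferred from the example config; not an official schema.
type SourceType = "disk" | "github";
type SourceKind = "docs" | "api" | "schema";

interface Source {
  type: SourceType;
  origin: string;
  kind: SourceKind;
}

interface Library {
  name: string;
  description?: string; // assumed optional
  sources: Source[];
}

interface ServerConfig {
  name: string;
  description?: string; // assumed optional
  cacheDir?: string; // assumed optional
  libraries: Library[];
}

/** Hypothetical sanity check: returns a list of human-readable problems. */
function checkConfig(config: ServerConfig): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();

  for (const library of config.libraries) {
    if (seen.has(library.name)) {
      problems.push(`Duplicate library name: ${library.name}`);
    }
    seen.add(library.name);

    if (library.sources.length === 0) {
      problems.push(`Library ${library.name} has no sources`);
    }
    for (const source of library.sources) {
      const looksLikeGitHub = source.origin.startsWith("https://github.com/");
      if (source.type === "github" && !looksLikeGitHub) {
        problems.push(`Library ${library.name}: ${source.origin} is not a GitHub URL`);
      }
    }
  }
  return problems;
}
```

An empty array means none of these checks found a problem. It says nothing about whether the server itself would accept the file.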
## What the agent actually sees
Each library exposes tools based on the `kind` of its sources:
- **docs** — `get_doc_index`, `get_sub_index`, `read_doc_file`, `get_file_toc`,
`get_chapters`, `search_docs`.
- **api** — `get_api_index`, `get_api_type`, `get_api_member`, `search_api`.
- **schema** — `list_schemas`, `list_definitions`, `get_definition`,
`search_definitions`, `search_all_schemas`.
A typical run looks like this. The agent calls `list_libraries` and sees
`acme-core` and `somelib-3.2`. It needs to know how `acme-core` handles
configuration, so it calls `search_docs` with `library: "acme-core"`, finds the
right file, asks for its table of contents with `get_file_toc`, then pulls the
one relevant section with `get_chapters`. It answers the question without ever
loading the whole file.
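The "table of contents first, then one section" idea can be sketched in a few lines. This is not how `get_file_toc` or `get_chapters` are implemented; `getToc` and `getChapter` are hypothetical functions that show the general technique on a Markdown string, assuming ATX-style (`#`) headings.

```typescript
interface TocEntry {
  level: number; // 1 for "#", 2 for "##", and so on
  title: string;
  line: number; // zero-based line index of the heading
}

/** Collect the headings of a Markdown document, skipping fenced code blocks. */
function getToc(markdown: string): TocEntry[] {
  const toc: TocEntry[] = [];
  let inFence = false;

  markdown.split(/\r?\n/).forEach((text, index) => {
    // Simplistic fence tracking: any line opening with ``` or ~~~ toggles it.
    if (/^\s*(`{3,}|~{3,})/.test(text)) {
      inFence = !inFence;
      return;
    }
    if (inFence) return;

    const match = /^(#{1,6})\s+(.+?)\s*$/.exec(text);
    if (match) {
      toc.push({ level: match[1].length, title: match[2], line: index });
    }
  });
  return toc;
}

/** Return one section, from its heading up to the next heading of the same or a higher level. */
function getChapter(markdown: string, title: string): string | undefined {
  const lines = markdown.split(/\r?\n/);
  const toc = getToc(markdown);
  const start = toc.findIndex((entry) => entry.title === title);
  if (start === -1) return undefined;

  const { level, line } = toc[start];
  const next = toc.slice(start + 1).find((entry) => entry.level <= level);
  return lines.slice(line, next ? next.line : lines.length).join("\n");
}
```

The agent-facing payoff is the same as described above: the table of contents is small, and only the requested section ever enters the context.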
When multiple libraries are configured, every tool takes a `library` parameter.
When there is only one, the parameter disappears, and the tools behave like a
plain single-library server.
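One way to picture the appearing and disappearing `library` parameter is as a conditional addition to each tool's input schema. The sketch below is an assumption about how such behavior could be built, not the server's code; `buildInputSchema` and the `query` parameter used in the note afterwards are hypothetical names.

```typescript
interface ParamSpec {
  type: "string";
  description: string;
  enum?: string[];
}

interface ToolInputSchema {
  type: "object";
  properties: Record<string, ParamSpec>;
  required: string[];
}

/**
 * Build a tool's input schema. The `library` parameter is added only when
 * more than one library is configured. Hypothetical helper for illustration.
 */
function buildInputSchema(
  ownParams: Record<string, ParamSpec>,
  libraryNames: string[],
): ToolInputSchema {
  const properties: Record<string, ParamSpec> = { ...ownParams };
  const required = Object.keys(ownParams);

  if (libraryNames.length > 1) {
    properties.library = {
      type: "string",
      description: "Which configured library to query",
      enum: libraryNames,
    };
    required.push("library");
  }
  return { type: "object", properties, required };
}
```

Called with a single `query` parameter and the names `["acme-core", "somelib-3.2"]`, the result requires both `query` and `library`. Called with just `["acme-core"]`, only `query` remains.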
The same applies to schema sources. For an OpenAPI spec, path operations show up
as definitions named like `GET /pets`, so the agent can ask for one endpoint
without reading the whole document. Useful when you want the agent to call your
API correctly rather than guess at the shape of it.
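As a rough illustration of where names like `GET /pets` can come from, the sketch below walks the `paths` object of an OpenAPI document and emits one name per operation. `listOperationNames` is a hypothetical function, and the server's real naming logic may differ in details.

```typescript
// HTTP method keys that an OpenAPI 3.x Path Item Object may contain.
const HTTP_METHODS = [
  "get", "put", "post", "delete", "options", "head", "patch", "trace",
] as const;

interface OpenApiLike {
  paths?: Record<string, Record<string, unknown>>;
}

/** Flatten path operations into names such as "GET /pets". */
function listOperationNames(spec: OpenApiLike): string[] {
  const names: string[] = [];
  for (const [route, pathItem] of Object.entries(spec.paths ?? {})) {
    for (const method of HTTP_METHODS) {
      // Non-method keys such as "parameters" or "summary" are never checked,
      // so they are skipped automatically.
      if (pathItem[method] !== undefined) {
        names.push(`${method.toUpperCase()} ${route}`);
      }
    }
  }
  return names;
}
```

Each name then acts as a handle for a single operation, which is what lets the agent fetch one endpoint instead of the whole spec.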
## Generating the API input
One thing worth knowing up front: the `api` pipeline does not read source code.
It consumes a generated documentation file.
- **TypeScript / JavaScript** — use TypeDoc's JSON serializer:
`typedoc --json api.json src/index.ts`. Point the source at that `.json` file.
The markdown output from `typedoc-plugin-markdown` is not supported — it has to
be the JSON serializer output.
- **C#** — enable `<GenerateDocumentationFile>true</GenerateDocumentationFile>`
and point the source at the generated `*.xml` file, or the build output folder
that contains it.
## What it does not do
It does not read source code. If you want API reference, you generate the doc
file first, as above.
## Try it
The code is on [GitHub](https://github.com/jgauffin/dev-docs-mcp), or on npm as `docs-mcpserver`.
Feel free to leave feedback, or check out my other MCP servers on [GitHub](https://github.com/jgauffin).