Knowledge Context Protocol (KCP) Specification

# Knowledge Context Protocol (KCP) Specification **Version:** 0.14 **Status:** Draft **Date:** 2026-03-25 **Repository:** github.com/cantara/knowledge-context-protocol --- ## Abstract The Knowledge Context Protocol (KCP) defines a file format for structured knowledge manifests. A KCP manifest (`knowledge.yaml`) describes the knowledge units in a project — their intent, dependencies, freshness, and audience — in a way that AI agents can navigate without loading everything at once. KCP is a format specification, not a runtime protocol. It requires no server, no database, and no running process. A static site or a git repository can be fully KCP-compliant. --- ## Status of This Document This is a draft specification. The format is intentionally minimal and subject to revision based on implementation feedback. Implementations SHOULD declare which version of this specification they conform to using the `kcp_version` field. --- ## Conformance Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119). --- ## 1. File Discovery ### 1.1 Canonical Location The canonical KCP manifest file is named `knowledge.yaml` and SHOULD be placed at the root of the project or documentation site — the same level as `README.md`, `llms.txt`, or equivalent root files. ### 1.2 Alternative Location via llms.txt If the manifest is not at the root, its location MAY be declared in `llms.txt` using a `knowledge:` metadata line: ``` > knowledge: /docs/knowledge.yaml ``` This line MUST appear in the header section of `llms.txt` (before the first `##` section heading). The value is a path relative to the site or repository root, beginning with `/`. Parsers encountering a `knowledge:` declaration in `llms.txt` SHOULD use that path instead of the default root location. ### 1.3 Multiple Manifests A project MAY contain multiple manifests (e.g. one per subdirectory). Each manifest is independent and MUST NOT reference units from other manifests by path. Cross-manifest federation is supported via the `manifests` block — see §3.6. ### 1.4 Discovery via `/.well-known/kcp.json` An origin server MAY expose a well-known discovery document at `/.well-known/kcp.json` as defined by [RFC 8615](https://datatracker.ietf.org/doc/html/rfc8615). This enables agents and crawlers to locate KCP manifests on any HTTP origin without prior knowledge of the manifest path. A GET request to `/.well-known/kcp.json` SHOULD return a JSON document with `Content-Type: application/json`. The document MUST include: | Field | Type | Description | |-------|------|-------------| | `kcp_version` | string | The KCP specification version this manifest conforms to. | | `manifest` | string | Absolute URL or root-relative path to the `knowledge.yaml` manifest. | The document MAY include: | Field | Type | Description | |-------|------|-------------| | `title` | string | Human-readable name of the project or knowledge base. | | `description` | string | Brief summary of the knowledge available. | | `spec` | string | URL of the KCP specification document. | | `network` | object | Network topology hint. Fields: `role` (`hub`\|`leaf`\|`standalone`), `entry_point`, `registry_label`. See §3.7. | Example: ```json { "kcp_version": "0.12", "manifest": "/knowledge.yaml", "title": "My Project Knowledge Base", "description": "Architecture decisions, API reference, and onboarding guides.", "spec": "https://github.com/Cantara/knowledge-context-protocol" } ``` Agents encountering `/.well-known/kcp.json` SHOULD fetch the `manifest` URL to retrieve the full KCP manifest. Agents that successfully retrieve a manifest via this mechanism MUST NOT require the manifest to also be present at the repository root. This discovery path complements §1.1 (root placement) and §1.2 (llms.txt declaration). An origin MAY support all three; agents SHOULD prefer `/.well-known/kcp.json` when performing HTTP-based discovery on a live site. ### 1.5 Distribution via Catalog A `catalog.yaml` file MAY be used to declare, version, and distribute collections of KCP manifests as static artifacts — independently of any running service. The catalog format is defined in the companion specification: [CATALOG-SPEC.md](./CATALOG-SPEC.md). A catalog entry's `source` field references a `knowledge.yaml` file. Every manifest referenced by a catalog entry MUST conform to this specification. Implementations that support catalog-based distribution SHOULD resolve catalog entries before federation (§3.6): **distribution is the write path; federation is the read path.** A manifest installed via a catalog MAY also declare `manifests[]` federation links — the two mechanisms compose without conflict. Well-known catalog locations, in order of preference: 1. `.kcp/catalog.yaml` — project-local, co-located with source code 2. `~/.kcp/catalog.yaml` — user-global, for practitioner rigs and tooling 3. Any path passed explicitly via tooling configuration --- ## 2. File Format KCP manifests MUST be valid YAML 1.2. The file MUST be UTF-8 encoded without a BOM. Parsers MUST silently ignore fields they do not recognise. This ensures forward compatibility: a manifest valid for a future version of the spec remains parseable by implementations of this version. --- ## 3. Root Manifest Structure ```yaml kcp_version: "0.12" # RECOMMENDED project: <string> # REQUIRED version: <semver string> # RECOMMENDED updated: "<ISO date>" # RECOMMENDED; quote the value (see §4.1.1) language: <BCP 47 tag> # OPTIONAL; default language for all units (see §4.4c) license: <string or object> # OPTIONAL; default license for all units (see §4.6a) indexing: <string or object> # OPTIONAL; default indexing permissions (see §4.6c) hints: <object> # OPTIONAL; manifest-level aggregate hints (see §4.10) trust: <object> # OPTIONAL; publisher provenance and audit requirements (see §3.2) auth: <object> # OPTIONAL; authentication methods for this knowledge source (see §3.3) delegation: <object> # OPTIONAL; delegation chain constraints for multi-agent access (see §3.4) compliance: <object> # OPTIONAL; compliance classification and processing restrictions (see §3.5) payment: <object> # OPTIONAL; default monetisation tier for all units (see §4.14) manifests: <list> # OPTIONAL; federation declarations (see §3.6) external_relationships: <list> # OPTIONAL; cross-manifest relationship declarations (see §3.6) visibility: <object> # OPTIONAL; manifest-wide visibility default for all units (see §3.8) authority: <object> # OPTIONAL; manifest-wide authority default for all units (see §3.8) discovery: <object> # OPTIONAL; manifest-wide discovery defaults for all units (see §3.9) units: # REQUIRED; list of knowledge units - ... relationships: # OPTIONAL; list of cross-unit relationship declarations - ... ``` ### 3.1 Root Fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `kcp_version` | RECOMMENDED | string | Version of this specification. MUST be `"0.12"` for conformance with this document. | | `project` | REQUIRED | string | Human-readable name of the project or documentation site. | | `version` | RECOMMENDED | string | Semver version of this manifest. Increment when units are added or removed. | | `updated` | RECOMMENDED | string | ISO 8601 date (`YYYY-MM-DD`) when this manifest was last modified. | | `language` | OPTIONAL | string | BCP 47 language tag as default for all units. See §4.4c. | | `license` | OPTIONAL | string or object | Default license for all units. See §4.6a. | | `indexing` | OPTIONAL | string or object | Default indexing permissions for all units. See §4.6c. | | `hints` | OPTIONAL | object | Manifest-level aggregate context hints. See §4.10. | | `trust` | OPTIONAL | object | Publisher provenance and audit requirements for this manifest. See §3.2. | | `auth` | OPTIONAL | object | Authentication methods for this knowledge source. See §3.3. | | `delegation` | OPTIONAL | object | Delegation chain constraints for multi-agent access. See §3.4. | | `compliance` | OPTIONAL | object | Compliance classification, data residency, and processing restrictions. See §3.5. | | `payment` | OPTIONAL | object | Default monetisation tier for all units. See §4.14. | | `manifests` | OPTIONAL | list | Federation declarations — sub-manifests this manifest has a relationship with. See §3.6. | | `external_relationships` | OPTIONAL | list | Cross-manifest relationship declarations. See §3.6. | | `freshness_policy` | OPTIONAL | object | Default staleness policy for all units. Unit-level declarations override. See §3.7. | | `visibility` | OPTIONAL | object | Manifest-wide visibility default. Units without their own `visibility` block inherit this. See §3.8. | | `authority` | OPTIONAL | object | Manifest-wide authority default. Units without their own `authority` block inherit this. See §3.8. | | `discovery` | OPTIONAL | object | Manifest-wide discovery defaults. Units inherit fields not declared at unit level. See §3.9. | | `units` | REQUIRED | list | Ordered list of knowledge unit declarations. MUST contain at least one unit. | | `relationships` | OPTIONAL | list | Explicit cross-unit relationship declarations. See §5. | ### 3.2 `trust` The root-level `trust` block declares the provenance of this manifest — who published it and how to contact them — and what audit behaviour is expected from agents that access it. It is advisory metadata: it carries no cryptographic weight unless combined with external signing infrastructure (see §14.1). ```yaml trust: provenance: publisher: "Acme Corp" publisher_url: "https://acme.com" contact: "[email protected]" audit: agent_must_log: true require_trace_context: true ``` #### `trust.provenance` sub-fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `publisher` | OPTIONAL | string | Human-readable name of the publishing organisation or individual. | | `publisher_url` | OPTIONAL | string | URL of the publisher's web presence. MUST use HTTPS if present. | | `contact` | OPTIONAL | string | Email address or URL for questions about this manifest's content. | #### `trust.audit` sub-fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `agent_must_log` | OPTIONAL | boolean | Advisory: agents SHOULD record access to this manifest's units in their own audit trail. Default: `false`. | | `require_trace_context` | OPTIONAL | boolean | If `true`: agents MUST include [W3C Trace Context](https://www.w3.org/TR/trace-context/) headers (`traceparent`, `tracestate`) when fetching content via HTTP, so the full access chain can be reconstructed. Compatible with [OpenTelemetry](https://opentelemetry.io/). Default: `false`. | **`require_trace_context` and local file access:** KCP manifests are frequently consumed from local file systems or git repositories where no HTTP request occurs. When content is accessed locally rather than via HTTP, the `traceparent`/`tracestate` header semantics do not apply. In this case, agents SHOULD generate a `traceparent` value conforming to the W3C Trace Context specification and record it alongside the access event in their own audit trail. The intent is the same — a reconstructable trace of which agent accessed which unit and when — regardless of whether the access occurred over HTTP or via the local file system. All sub-fields of `trust` are OPTIONAL. An empty `trust` block (no sub-fields) is valid and SHOULD be silently accepted. Unknown sub-fields MUST be silently ignored. Cryptographic content integrity, access receipts, and agent attestation requirements are defined in [RFC-0004](./RFC-0004-Trust-and-Compliance.md) and may be promoted to the core spec in a future version. ### 3.3 `auth` The root-level `auth` block describes how to authenticate to the knowledge source. It is relevant when the manifest is served via a KCP-aware MCP server, an HTTP endpoint, or a federated discovery registry. For manifests served from public git repositories or static sites with no access control, the `auth` block MAY be omitted entirely. ```yaml auth: methods: - type: oauth2 issuer: "https://auth.example.com" scopes: ["read:knowledge"] - type: api_key header: "X-API-Key" registration_url: "https://example.com/register" - type: none ``` #### How agents use the `auth` block 1. Agent discovers a manifest and inspects units for `access` values (see §4.11). 2. Any unit with `access: authenticated` or `access: restricted` triggers credential acquisition. 3. Agent reads `auth.methods` to determine how to acquire credentials. 4. Agent selects the first method it supports and proceeds. 5. If no supported method exists, the agent SHOULD surface this to its operator rather than silently failing. #### `auth.methods` The `methods` list declares one or more authentication schemes supported by this knowledge source, in preference order. Agents try methods in list order until one succeeds. This enables graceful degradation across environments (e.g. prefer OAuth 2.1 but accept API key). Each entry MUST have a `type` field. The following types are defined in this version of the specification: | Type | Description | Sub-fields | |------|-------------|------------| | `none` | No credentials required. | None. | | `oauth2` | OAuth 2.1 authentication. | `issuer`, `scopes`, `registration_url` | | `api_key` | API key passed in a named HTTP header. | `header`, `registration_url` | Unknown `type` values MUST be silently ignored by parsers. This enables forward compatibility with additional auth types defined in [RFC-0002](./RFC-0002-Auth-and-Delegation.md) (e.g. `spiffe`, `did`, `bearer_token`, `http_signature`) without requiring core spec changes. ##### `type: none` Declares that the knowledge source accepts unauthenticated access. When `none` appears in a list alongside other methods, it serves as a fallback — agents that cannot satisfy any other method may access content without credentials. **Interaction with `access`:** Units with `access: public` are implicitly accessible via `type: none`. Units with `access: authenticated` or `access: restricted` require a non-`none` method; the presence of `type: none` in the methods list does not satisfy their access requirement. In practice, manifests with mixed access levels (some public, some restricted) will typically list `type: none` as the last method to signal that the public units are freely accessible. ##### `type: oauth2` Declares that the knowledge source supports [OAuth 2.1](https://datatracker.ietf.org/doc/html/draft-ietf-oauth-v2-1-12) authentication. This is the baseline auth mechanism — consistent with [MCP's authorization specification](https://modelcontextprotocol.io/specification/draft/basic/authorization), which also requires OAuth 2.1. | Sub-field | Required | Description | |-----------|----------|-------------| | `issuer` | REQUIRED | OAuth 2.1 issuer URL. Agents can discover endpoints via `{issuer}/.well-known/oauth-authorization-server` ([RFC 8414](https://datatracker.ietf.org/doc/html/rfc8414)). MUST use HTTPS. | | `scopes` | OPTIONAL | List of OAuth scopes to request. | | `registration_url` | OPTIONAL | URL where agents or operators can register for credentials. | ##### `type: api_key` Declares that the knowledge source accepts an API key passed in a named HTTP header. This is a simpler alternative to OAuth for knowledge sources that do not require delegated authorization. | Sub-field | Required | Description | |-----------|----------|-------------| | `header` | REQUIRED | Name of the HTTP header that carries the API key (e.g. `"X-API-Key"`, `"Authorization"`). | | `registration_url` | OPTIONAL | URL where agents or operators can register for an API key. | #### `auth` block conformance - `auth` is OPTIONAL. Omitting it means no authentication metadata is declared. - Parsers MUST NOT reject a manifest because `auth` is absent, even if units declare `access: authenticated` or `access: restricted`. - When `auth` is present but no method in `auth.methods` is recognised by the parser, the parser SHOULD emit a warning and MUST NOT reject the manifest. - Unknown sub-fields within any method entry MUST be silently ignored. --- ### 3.4 `delegation` The root-level `delegation` block declares constraints on how knowledge units in this manifest may be accessed through agent delegation chains. Multi-agent systems commonly involve a human authorising an orchestrator, which delegates to sub-agents, which may further delegate. This block limits how far that chain may extend and what conditions it must satisfy. ```yaml delegation: max_depth: 3 # OPTIONAL; maximum hops from resource owner to accessing agent require_capability_attenuation: true # OPTIONAL; each hop MUST narrow permissions, not expand them require_delegation_proof: false # OPTIONAL; agent MUST present a verifiable chain audit_chain: true # OPTIONAL; W3C Trace Context required on all access requests human_in_the_loop: # OPTIONAL; root-level human approval default required: false approval_mechanism: oauth_consent # uma | oauth_consent | custom docs_url: "https://..." # required when approval_mechanism is "custom" ``` #### `delegation` field reference | Field | Required | Type | Description | |-------|----------|------|-------------| | `max_depth` | OPTIONAL | integer | Maximum delegation chain depth. **The resource owner operates at depth 0.** The first agent to which access is delegated operates at depth 1. `max_depth: 0` means no delegation is permitted — only the resource owner may access the unit directly. Omit for no constraint. | | `require_capability_attenuation` | OPTIONAL | boolean | If `true`, each hop in the delegation chain MUST present narrower permissions than the delegating agent held. Agents SHOULD reject delegation tokens that grant equal or greater scope than the parent token. | | `audit_chain` | OPTIONAL | boolean | If `true`, agents MUST include W3C Trace Context `traceparent`/`tracestate` headers on all access requests, enabling full delegation chain reconstruction from access logs. Compatible with OpenTelemetry. | | `human_in_the_loop` | OPTIONAL | object | Root-level human approval requirement. May be overridden at unit level. | #### `delegation.human_in_the_loop` sub-fields | Sub-field | Required | Description | |-----------|----------|-------------| | `required` | OPTIONAL | If `true`, a human MUST approve agent access before the unit may be loaded. Default: `false`. | | `approval_mechanism` | OPTIONAL | How human approval is obtained: `oauth_consent` (OAuth 2.1 authorization code flow with user present), `uma` (UMA 2.0 asynchronous resource owner policy), or `custom` (described at `docs_url`). | | `docs_url` | OPTIONAL | URL describing the approval flow. REQUIRED when `approval_mechanism` is `custom`. | #### Per-unit `delegation` overrides Individual units MAY declare a `delegation` block to tighten constraints beyond the root defaults. A per-unit `delegation` block MUST NOT relax root constraints (e.g. `max_depth` at unit level MUST NOT exceed `max_depth` at root level). ```yaml units: - id: patient-records path: data/patients.md access: restricted auth_scope: clinical-staff delegation: max_depth: 1 human_in_the_loop: required: true approval_mechanism: oauth_consent ``` #### Security properties | Attack class | Mitigation | |--------------|-----------| | Agent session smuggling | `max_depth` limits the delegation chain length | | Cross-agent privilege escalation | `require_capability_attenuation` | | Unauthorised autonomous access | `human_in_the_loop.required: true` | | Audit evasion | `audit_chain` + W3C Trace Context | #### Known limitations (v0.7) **Capability attenuation is declarative.** `require_capability_attenuation: true` declares that scope narrowing MUST occur but does not define a scope comparison function. Implementations SHOULD treat OAuth scope strings as a hierarchy where a more specific scope (e.g. `read:case:external-summary`) is a narrowing of its prefix (e.g. `read:case`). A formal scope comparison specification is planned for a future version. **Delegation chain integrity requires signed tokens.** Without signed lineage tokens, a misconfigured or malicious agent may claim any delegation depth. A `require_delegation_proof` field (reserved for future use) is specified in RFC-0002 for implementations that issue signed delegation credentials; the token format and normative field definition are deferred to v0.9+. #### `delegation` block conformance - `delegation` is OPTIONAL at both root and unit level. - Parsers MUST NOT reject a manifest because `delegation` is absent. - Parsers MUST silently ignore unrecognised sub-fields. - When a per-unit `max_depth` is present, it MUST be ≤ the root `max_depth` if a root value is set. - `kcp_version: "0.7"` manifests MUST NOT fail validation if `delegation` is present; parsers that do not recognise `delegation` MUST silently ignore it. --- ### 3.5 `compliance` The root-level `compliance` block declares the regulatory context, data residency requirements, sensitivity classification, and processing restrictions that apply to this manifest. It is advisory metadata: compliance enforcement depends on the consuming agent and its runtime environment. Agents that cannot satisfy the compliance requirements declared for a unit SHOULD NOT access that unit and SHOULD surface the constraint to their operator. ```yaml compliance: data_residency: [EEA] # permitted geographic regions (ISO 3166-1 alpha-2 or region identifiers) regulations: [GDPR, NIS2] # applicable regulations (see regulation vocabulary) sensitivity: internal # public | internal | confidential | restricted restrictions: # processing restrictions (see restriction vocabulary) - no-external-llm - audit-required ``` #### `compliance` field reference | Field | Required | Type | Description | |-------|----------|------|-------------| | `data_residency` | OPTIONAL | list | Permitted geographic regions for data storage and processing. ISO 3166-1 alpha-2 codes or region identifiers (e.g. `EEA`, `US`, `CH`). See region vocabulary. | | `regulations` | OPTIONAL | list | Named regulations that apply to this knowledge. Unknown values MUST be silently ignored. | | `sensitivity` | OPTIONAL | string | Information sensitivity level. Overrides `sensitivity` declared at the unit level (§4.11) when used in a root compliance block; at unit level they are equivalent. | | `restrictions` | OPTIONAL | list | Processing restrictions. Unknown values MUST be silently ignored. | #### Sensitivity levels Aligned with ISO 27001 and common national information classification frameworks: | Value | Meaning | |-------|---------| | `public` | No restrictions. Freely shareable. | | `internal` | For internal use only. Not for external parties without authorisation. | | `confidential` | Restricted within the organisation. Need-to-know basis. | | `restricted` | Highest sensitivity. Strict access controls required. | #### Regulation vocabulary Implementations SHOULD recognise the following named values: | Value | Regulation | |-------|-----------| | `GDPR` | EU General Data Protection Regulation | | `ePrivacy` | EU ePrivacy Directive / Regulation | | `NIS2` | EU Network and Information Security Directive 2 | | `HIPAA` | US Health Insurance Portability and Accountability Act | | `HITECH` | US Health Information Technology for Economic and Clinical Health Act | | `CCPA` | California Consumer Privacy Act | | `MiFID2` | EU Markets in Financial Instruments Directive II | | `DORA` | EU Digital Operational Resilience Act (financial sector) | | `AML5D` | EU 5th Anti-Money Laundering Directive | | `FATF` | Financial Action Task Force recommendations | | `ITAR` | US International Traffic in Arms Regulations | | `eIDAS` | EU Electronic Identification, Authentication and Trust Services | Unknown values MUST be silently ignored. #### Restriction vocabulary | Value | Meaning | |-------|---------| | `no-external-llm` | Knowledge MUST NOT be sent to an externally-hosted language model. | | `no-logging` | Knowledge MUST NOT be written to persistent logs. | | `no-cross-border` | Knowledge MUST NOT leave the declared `data_residency` regions. | | `no-ai-training` | Knowledge MUST NOT be used to train or fine-tune AI models. | | `audit-required` | All access MUST be logged with sufficient detail for compliance audit. | | `human-approval-required` | A human MUST approve access before the unit is loaded. Connects to `delegation.human_in_the_loop`. | Unknown values MUST be silently ignored. #### Per-unit `compliance` overrides Units MAY declare their own `compliance` block to tighten or loosen the root defaults for that unit specifically: ```yaml units: - id: customer-data path: data/customers.md compliance: data_residency: [EEA] regulations: [GDPR, ePrivacy] sensitivity: confidential restrictions: - no-external-llm - audit-required - id: public-overview path: README.md compliance: sensitivity: public # override: less restrictive than root default ``` #### Known limitations (v0.7) **Restriction enforcement is application-defined.** The restriction vocabulary declares intent. There is no standard technical mechanism to verify that a `no-ai-training` restriction has been honoured by the consuming agent. Agents SHOULD treat restrictions as contractual commitments and record them in audit logs as evidence of acknowledged obligation. **Data residency evaluation requires runtime context.** A parser cannot determine the geographic location of the consuming agent at parse time. Agents SHOULD evaluate `data_residency` against their deployment context and decline to access units they cannot process in compliance. #### `compliance` block conformance - `compliance` is OPTIONAL at both root and unit level. - Parsers MUST NOT reject a manifest because `compliance` is absent. - Parsers MUST silently ignore unrecognised sub-fields and unknown list values. - Per-unit `compliance` values override root `compliance` for that unit only. - `kcp_version: "0.7"` manifests MUST NOT fail validation if `compliance` is present; parsers that do not recognise `compliance` MUST silently ignore it. ### 3.6 Federation The optional `manifests` block declares external KCP manifests that this manifest has a relationship with. This enables cross-manifest dependency tracking and federated knowledge graphs. #### `manifests` block ```yaml manifests: - id: platform url: "https://platform-team.example.com/knowledge.yaml" label: "Platform Engineering" relationship: foundation update_frequency: weekly local_mirror: "./mirrors/platform-knowledge.yaml" ``` #### `manifests` entry fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `id` | REQUIRED | string | Local identifier. MUST match `^[a-z0-9.\-]+$`. MUST be unique within this manifest's `manifests` block. | | `url` | REQUIRED | string | HTTPS URL of the remote `knowledge.yaml`. MUST use HTTPS. MUST NOT resolve to private address ranges (§14.3). | | `label` | RECOMMENDED | string | Human-readable description. | | `relationship` | RECOMMENDED | string | How this sub-manifest relates to the declaring manifest. See values below. | | `auth` | OPTIONAL | object | Auth block (per §3.3) for fetching this specific manifest. Overrides root `auth` block for this fetch. | | `update_frequency` | OPTIONAL | string | How often this remote manifest typically changes. Uses the §4.6b vocabulary: `daily`, `weekly`, `monthly`, `rarely`, `never`. Agents MAY use this for cache freshness decisions. | | `local_mirror` | OPTIONAL | string | Relative path (forward slashes, relative to this manifest) to a local copy of the remote manifest. When present and the file exists, parsers MUST load from that path instead of fetching `url`. | | `version_pin` | OPTIONAL | string | Semver version to pin this sub-manifest to. When present, validators SHOULD compare against the remote manifest's `version` field. See version pinning below. | | `version_policy` | OPTIONAL | string | How to interpret `version_pin`. Values: `exact` (versions must be equal), `minimum` (remote version >= pin), `compatible` (same major version, default). Unknown values MUST be treated as `compatible`. | #### `manifests[].relationship` values | Value | Meaning | |-------|---------| | `child` | Sub-manifest depends on the declaring manifest's context. | | `foundation` | Sub-manifest provides foundational knowledge the declaring manifest builds on. | | `governs` | Sub-manifest contains authoritative policies that govern the declaring manifest. | | `peer` | Sub-manifest is at the same level; the relationship is symmetric. | | `archive` | Sub-manifest is historical. Agents MAY skip unless specifically requested. | Unknown `relationship` values MUST be silently ignored. #### Transitive resolution A manifest declared in a `manifests` block MAY itself contain a `manifests` block. Parsers MUST resolve the full transitive graph of `manifests` declarations. **Topology:** DAG with local authority. Each manifest is authoritative only over the sub-manifests it directly declares. Trust does not propagate transitively. **Cycle detection:** Parsers MUST maintain a visited set of resolved manifest URLs across the entire resolution session. A manifest URL already in the visited set MUST NOT be fetched again. This handles both cycles (A -> B -> A) and diamonds (A -> B, A -> C, B -> D, C -> D) correctly — D is fetched once. **Fetch limits:** - Parsers MUST enforce a maximum of unique manifests per session. RECOMMENDED default: 50. - Parsers SHOULD emit a warning when this limit is reached. - Remote manifests larger than 1 MB SHOULD be rejected with a warning. - Remote manifests containing more than 10,000 units SHOULD be rejected with a warning. **Fetch timeout:** Parsers MUST enforce a timeout on remote manifest fetches. RECOMMENDED default: 10 seconds. A timed-out fetch is treated as a network error. **Local mirror resolution order:** 1. If `local_mirror` is present and the referenced file exists, the parser MUST load from that path. `url` is NOT fetched. 2. If `local_mirror` is absent or the file does not exist, the parser SHOULD fetch `url`. 3. If the URL fetch fails, apply `on_failure` behaviour from `external_depends_on` entries that reference this manifest. #### `external_depends_on` (unit-level) A unit may declare cross-manifest dependencies: ```yaml units: - id: data-handling path: compliance/data.md intent: "How does this service handle personal data?" external_depends_on: - manifest: security # references manifests[].id unit: gdpr-policy # unit id in the remote manifest on_failure: degrade # skip | warn | degrade (default: skip) ``` #### `external_depends_on` entry fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `manifest` | REQUIRED | string | The `id` of an entry in this manifest's `manifests` block. Unknown IDs MUST produce a validation warning. | | `unit` | REQUIRED | string | The `id` of a unit in the referenced manifest. Advisory at parse time — existence cannot be verified without fetching. | | `on_failure` | OPTIONAL | string | Agent behaviour when the external unit cannot be resolved. `skip` (silently ignore, default), `warn` (emit a warning to the operator), `degrade` (agent MUST indicate output is operating with incomplete dependencies). Unknown values MUST be treated as `skip`. | #### `external_relationships` (root-level) Explicit typed relationships between units across manifest boundaries: ```yaml external_relationships: - from_manifest: security # OPTIONAL — omit = this manifest from_unit: gdpr-policy # REQUIRED to_unit: data-handling # REQUIRED type: governs # same vocabulary as §5 ``` #### `external_relationships` entry fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `from_manifest` | OPTIONAL | string | Source manifest `id`. Omit = this manifest. | | `from_unit` | REQUIRED | string | Source unit `id`. | | `to_manifest` | OPTIONAL | string | Target manifest `id`. Omit = this manifest. | | `to_unit` | REQUIRED | string | Target unit `id`. | | `type` | REQUIRED | string | Relationship type. Same vocabulary as §5. Unknown types MUST be silently ignored. | At least one of `from_manifest` or `to_manifest` SHOULD be present (otherwise the relationship belongs in `relationships`, not `external_relationships`). #### Authority model Each manifest is authoritative only over the sub-manifests it directly declares. A manifest does NOT inherit authority over sub-manifests declared by its transitive dependencies. Trust propagation across transitive boundaries is an agent-level policy decision, outside the scope of this specification. #### Self-contained manifests A manifest with a `manifests` block MUST remain valid and parseable when loaded in isolation without fetching any remote manifests. The `manifests` block is metadata for federation-capable tools; tools that do not support federation MUST silently ignore it (per §2, forward compatibility). #### Version pinning A `manifests` entry MAY declare a `version_pin` to express a version expectation for the remote manifest. When `version_pin` is present, validators SHOULD compare the pinned version against the remote manifest's `version` field using the `version_policy` semantics. **`version_policy` values:** | Policy | Meaning | |--------|---------| | `exact` | Remote `version` MUST equal `version_pin`. | | `minimum` | Remote `version` MUST be >= `version_pin` (semver comparison). | | `compatible` | Remote major version must match `version_pin`'s major version. E.g., pin `2.1.0` is compatible with `2.3.0` but not `3.0.0`. This is the default. | **Behaviour:** - Version pin mismatch produces a WARNING, never an error. KCP is advisory — version pins are signals, not enforcement mechanisms. - When `local_mirror` is present and the file exists, `local_mirror` takes precedence over version checking. The pinned version is still checked against the local mirror's `version` field. - Validators that do not fetch remote manifests MAY skip version pin checking and SHOULD emit an informational message that version pins were not verified. ```yaml manifests: - id: platform url: "https://platform-team.example.com/knowledge.yaml" relationship: foundation version_pin: "2.1.0" version_policy: compatible # 2.x matches, 3.x does not ``` #### Known limitations (0.10.0) - **Peer-to-peer without a declaring manifest**: Any manifest can declare sub-manifests, but a manifest cannot reference another without one of them declaring the relationship. Arbitrary undeclared cross-referencing is not supported. --- ### 3.7 Agent Readiness (v0.11) v0.11 adds two fields that give agents pre-invocation signals about freshness and capabilities. #### `freshness_policy` block MAY appear at root level (default for all units) or unit level (override). Fields: | Field | Required | Type | Description | |-------|----------|------|-------------| | `max_age_days` | OPTIONAL | integer | Days since `validated` after which the unit is stale. | | `on_stale` | OPTIONAL | string | `warn` (default) \| `degrade` \| `block`. Advisory action when stale. | | `review_contact` | OPTIONAL | string | Email or URL for requesting re-validation. | Unit-level `freshness_policy` fully replaces the root default — no field-level merge. If `validated` is absent, agents MUST NOT treat the unit as stale. #### `requires_capabilities` (unit-level) OPTIONAL list of strings naming capabilities the consuming agent SHOULD possess. Recommended prefix convention: `tool:kubectl`, `permission:deploy-prod`, `role:security-reviewer`. Bare strings are also valid. Parsers MUST NOT reject manifests for unknown capability values. #### `network` in `/.well-known/kcp.json` The discovery document (§1.4) MAY include a `network` object: | Field | Description | |-------|-------------| | `role` | `hub` \| `leaf` \| `standalone` (default). | | `entry_point` | Root-relative path or URL to the hub manifest (meaningful when `role: leaf`). | | `registry_label` | Human-readable name for the network as a whole. | `kcp init` generates `/.well-known/kcp.json` with `network.role: standalone` and prints a suggested `llms.txt` snippet (`> knowledge: /knowledge.yaml`) for the operator to add. --- ### 3.8 Governance Defaults (v0.12) v0.12 adds two optional root-level blocks that declare manifest-wide defaults for access governance (`visibility`) and action permissions (`authority`). Both blocks may also be declared on individual units to override the root default for that unit. #### Root-level `visibility` A root-level `visibility` block sets the manifest-wide default sensitivity level. All units that do not declare their own `visibility` block inherit this default. ```yaml kcp_version: "0.12" project: enterprise-platform visibility: default: internal # all units are internal unless overridden at unit level ``` Unit-level `visibility` blocks fully replace the root default for that unit — no field-level merge. See §4.16 for the complete `visibility` field reference, condition evaluation semantics, and examples. #### Root-level `authority` A root-level `authority` block declares manifest-wide action permission defaults. All units that do not declare their own `authority` block inherit this default. ```yaml kcp_version: "0.12" project: regulated-platform authority: read: initiative summarize: initiative modify: requires_approval # all modifications require approval by default share_externally: denied execute: denied ``` Unit-level `authority` blocks fully replace the root default — no action-level merge across root and unit. See §4.17 for the complete `authority` field reference and safe defaults. --- ### 3.9 Discovery Defaults (v0.12) A root-level `discovery` block declares manifest-wide defaults for the `source` and `verification_status` fields. Unit-level `discovery` blocks inherit root fields they do not override. ```yaml kcp_version: "0.12" project: acme-hr-portal-discovered discovery: source: web_traversal verification_status: observed observed_at: "2026-03-10T14:00:00Z" ``` See §4.18 for the complete `discovery` field reference, verification vocabulary, and examples. Unit-level `discovery` fields override root defaults field-by-field. --- ## 4. Knowledge Units Each entry in `units` describes a self-contained piece of knowledge. ### 4.1 Unit Fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `id` | REQUIRED | string | Unique identifier within this manifest. See §4.2. | | `path` | REQUIRED | string | Relative path to the content file. See §4.3. | | `kind` | OPTIONAL | string | Type of artifact. One of: `knowledge`, `schema`, `service`, `policy`, `executable`. See §4.3a. Default: `knowledge`. | | `intent` | REQUIRED | string | One sentence: what question does this unit answer? See §4.4. | | `format` | OPTIONAL | string | Content format of the referenced file. See §4.4a. | | `content_type` | OPTIONAL | string | MIME type for precise format identification. See §4.4b. | | `language` | OPTIONAL | string | BCP 47 language tag. See §4.4c. | | `scope` | REQUIRED | string | Breadth of applicability. One of: `global`, `project`, `module`. | | `audience` | REQUIRED | list of strings | Who this unit is for. See §4.6. | | `license` | OPTIONAL | string or object | SPDX identifier or structured license metadata. See §4.6a. | | `validated` | RECOMMENDED | string | ISO 8601 date (quoted) when a human last confirmed the content was accurate. See §4.1.1. | | `update_frequency` | OPTIONAL | string | How often this content typically changes. See §4.6b. | | `indexing` | OPTIONAL | string or object | AI crawling and indexing permissions. See §4.6c. | | `depends_on` | OPTIONAL | list of strings | IDs of units that SHOULD be loaded before this one. See §4.7. | | `supersedes` | OPTIONAL | string | ID of the unit this replaces. See §4.8. | | `triggers` | OPTIONAL | list of strings | Keywords or task contexts that make this unit relevant. See §4.9. | | `hints` | OPTIONAL | object | Advisory context window hints: size, loading strategy, and summary relationships. See §4.10. | | `access` | OPTIONAL | string | Who can fetch this unit's content. One of: `public`, `authenticated`, `restricted`. Default: `public`. See §4.11. | | `auth_scope` | OPTIONAL | string | Named scope, role, or group required when `access` is `restricted`. See §4.11. | | `sensitivity` | OPTIONAL | string | Information classification level. One of: `public`, `internal`, `confidential`, `restricted`. See §4.12. | | `deprecated` | OPTIONAL | boolean | If `true`, this unit is present but should not be used for new development. See §4.13. | | `payment` | OPTIONAL | object | Monetisation tier for this unit. Overrides root-level `payment` default. See §4.14. | | `requires_capabilities` | OPTIONAL | list of strings | Capabilities the consuming agent SHOULD possess to act on this unit. See §3.7. | | `freshness_policy` | OPTIONAL | object | Staleness policy for this unit. Overrides root-level `freshness_policy` default. See §3.7. | | `visibility` | OPTIONAL | object | Conditional access by environment or agent role. Overrides root-level `visibility` default. See §4.16. | | `authority` | OPTIONAL | object | Action permission declarations for this unit. Overrides root-level `authority` default. See §4.17. | | `discovery` | OPTIONAL | object | Provenance of how this capability was discovered and how confidently. Inherits root-level `discovery` defaults. See §4.18. | #### 4.1.1 Date Fields The `validated` (unit) and `updated` (root) fields MUST contain an ISO 8601 date string in `YYYY-MM-DD` format. **YAML encoding:** Date values SHOULD be quoted in YAML to prevent YAML 1.1 parsers from coercing them to native date objects: ```yaml validated: "2026-02-25" # correct — stays a string validated: 2026-02-25 # may be parsed as a date object by some YAML libraries ``` Parsers SHOULD coerce native date objects to their ISO 8601 string representation when reading `validated` and `updated` fields, and MUST NOT reject a manifest solely because a date field was parsed as a native date type. ### 4.2 `id` The `id` MUST be unique within the manifest. It MUST contain only lowercase ASCII letters (`a-z`), digits (`0-9`), hyphens (`-`), and dots (`.`). It MUST NOT be empty. ```yaml id: api-authentication-guide ``` ### 4.3 `path` The `path` MUST be a relative path from the manifest file's location to the content file. Forward slashes MUST be used as path separators regardless of operating system. ```yaml path: docs/authentication/guide.md ``` Parsers SHOULD warn if a declared `path` does not exist, but MUST NOT treat a missing file as a parse error. The manifest may describe knowledge that has not yet been created. ### 4.3a `kind` The `kind` field declares what type of artifact a knowledge unit represents. It provides a machine-readable dispatch signal that tells agents how to interact with the unit: load and embed (knowledge), parse as a structured definition (schema), invoke via protocol (service), evaluate as a gate (policy), or run on demand (executable). | Value | Meaning | Agent behaviour | |-------|---------|-----------------| | `knowledge` | Documentation, guides, explanations, prose | Load and embed in context (default) | | `schema` | Machine-readable definitions: OpenAPI, AsyncAPI, gRPC proto, JSON Schema | Parse as structured definition | | `service` | A running or callable endpoint: API, MCP server, webhook | Invoke via protocol | | `policy` | Rules, constraints, compliance documents | Evaluate as authoritative gate | | `executable` | Runnable artifacts: scripts, notebooks, workflow definitions | Invoke on demand | If `kind` is omitted, parsers MUST treat the unit as `kind: knowledge`. This ensures full backward compatibility with v0.2 manifests. Unknown `kind` values MUST be silently ignored by parsers. ```yaml kind: schema ``` ### 4.4 `intent` The `intent` MUST be a single sentence describing the question this unit answers or the task it enables. It SHOULD be written in the form of a question or task description rather than a title. ```yaml intent: "How do I authenticate API requests using OAuth 2.0?" ``` The intent is the primary signal for agent task routing. Implementations that do not generate useful intents SHOULD omit the field rather than populate it with the file name or path. ### 4.4a `format` The `format` field declares the content format of the file referenced by `path`. It enables agents to make loading decisions before fetching the content — for example, skipping PDF units when only Markdown processing is available, or prioritising OpenAPI specs for API integration tasks. | Value | Description | |-------|-------------| | `markdown` | Markdown document (default if omitted for `.md` files) | | `pdf` | PDF document | | `openapi` | OpenAPI / Swagger specification | | `json-schema` | JSON Schema document | | `jupyter` | Jupyter notebook (.ipynb) | | `html` | HTML document | | `asciidoc` | AsciiDoc document | | `rst` | reStructuredText | | `vtt` | WebVTT subtitle/transcript | | `yaml` | Generic YAML (not OpenAPI or KCP) | | `json` | Generic JSON | | `csv` | Tabular data | | `text` | Plain text | If `format` is omitted, parsers MAY infer the format from the file extension but MUST NOT treat inference failures as errors. Unknown `format` values MUST be silently ignored by parsers. ```yaml format: openapi ``` ### 4.4b `content_type` The `content_type` field provides a full MIME type for cases where `format` alone is insufficient. When both `format` and `content_type` are present, `content_type` takes precedence for format identification. ```yaml content_type: "application/vnd.oai.openapi+yaml;version=3.1" ``` `content_type` is OPTIONAL. If present, it MUST be a valid MIME type string. ### 4.4c `language` The `language` field declares the human language of the unit's content using a BCP 47 language tag ([RFC 5646](https://www.rfc-editor.org/rfc/rfc5646)). ```yaml language: en ``` A root-level `language` field MAY be declared as a default for all units in the manifest. Unit-level `language` values override the root default. Common values: `en`, `en-GB`, `en-US`, `no`, `nb`, `nn`, `de`, `fr`, `es`, `ja`, `zh`. If `language` is omitted at both root and unit level, the language is undeclared. Agents SHOULD treat this as unknown rather than assuming a default language. Unknown `language` values MUST be silently ignored by parsers. ### 4.5 `scope` | Value | Meaning | |-------|---------| | `global` | Relevant to the entire project or system | | `project` | Relevant to a specific project, service, or repository within a larger system | | `module` | Relevant to a specific module, component, or subsystem | ### 4.6 `audience` The `audience` field is a list of one or more values indicating who this unit is intended for. Recognised values: | Value | Intended reader | |-------|----------------| | `human` | Human readers (documentation, guides) | | `agent` | AI agents (machine-navigable context) | | `developer` | Software developers | | `architect` | System architects | | `operator` | Operations / DevOps / SRE | | `devops` | Equivalent to `operator` | A unit MAY have multiple audience values. A unit intended for both humans and agents would declare `audience: [human, agent]`. Unknown audience values MUST be silently ignored by parsers. ### 4.6a `license` The `license` field declares what an agent is permitted to do with a knowledge unit's content after loading it. **Shorthand form** — an SPDX license identifier string: ```yaml license: "Apache-2.0" ``` **Structured form** — an object with explicit fields: ```yaml license: spdx: "CC-BY-4.0" url: "https://creativecommons.org/licenses/by/4.0/" attribution_required: true ``` | Subfield | Type | Description | |----------|------|-------------| | `spdx` | string | An [SPDX license identifier](https://spdx.org/licenses/). Use `LicenseRef-Proprietary` for custom or proprietary licenses. | | `url` | string | URL to the full license text. | | `attribution_required` | boolean | Whether the agent must cite the source when reproducing content. Default: `false`. | A root-level `license` field MAY be declared as a default for all units in the manifest. Unit-level `license` values override the root default. If `license` is omitted, no machine-readable usage terms are declared. Agents SHOULD treat this as unknown and apply conservative defaults. Unknown subfields within `license` MUST be silently ignored. Parsers MUST NOT reject a manifest for an unrecognised `spdx` value. ### 4.6b `update_frequency` The `update_frequency` field is an advisory hint declaring how often the content of a unit typically changes. It helps agents decide how long to cache a unit's content and when to re-fetch. | Value | Meaning | |-------|---------| | `hourly` | Changes multiple times per day | | `daily` | Changes roughly once per day | | `weekly` | Changes roughly once per week | | `monthly` | Changes roughly once per month | | `rarely` | Changes less than once per month | | `never` | Content is immutable (e.g. versioned release notes) | ```yaml update_frequency: weekly ``` `update_frequency` is OPTIONAL. Omitting it means no caching guidance is declared. Agents apply their own defaults. `update_frequency` complements the `validated` field: `validated` answers "when did a human last confirm this was accurate?", while `update_frequency` answers "how often does this content typically change?" Unknown `update_frequency` values MUST be silently ignored by parsers. ### 4.6c `indexing` The `indexing` field declares whether AI agents and crawlers may index, cache, train on, or reproduce the content of a knowledge unit. **Shorthand form** — a string keyword: | Value | Meaning | |-------|---------| | `open` | All operations permitted: read, index, reproduce, train | | `read-only` | Read permitted; no indexing, training, or reproduction | | `no-train` | All operations except model training are permitted | | `none` | No AI access whatsoever | ```yaml indexing: no-train ``` **Structured form** — an object with explicit allow/deny lists: ```yaml indexing: allow: [read, index, reproduce-in-response] deny: [train] attribution_required: true ``` | Permission | Meaning | |------------|---------| | `read` | Agent may load and use content for reasoning | | `index` | Agent/crawler may add to a searchable index | | `reproduce-in-response` | Agent may quote or reproduce in its output | | `train` | Content may be used for model training | | `cache-permanently` | Content may be cached beyond the current session | | `share-externally` | Content may be sent to external systems/APIs | | `summarise` | Agent may generate and share summaries | A root-level `indexing` field MAY be declared as a default for all units in the manifest. Unit-level `indexing` values override the root default. If `indexing` is omitted, no machine-readable crawling permissions are declared. Agents SHOULD apply conservative defaults. Unknown permission values MUST be silently ignored. ### 4.7 `depends_on` The `depends_on` field declares units that SHOULD be loaded and understood before this unit. This allows agents to respect prerequisite ordering when building context. ```yaml depends_on: [project-overview, authentication-concepts] ``` Values MUST be `id` strings of units declared in the same manifest. References to unknown IDs SHOULD produce a validation warning and MUST be silently ignored at runtime. **Circular dependencies:** Parsers MUST detect cycles in `depends_on` relationships and MUST silently ignore the edge that would close the cycle. A graph with cycles does not indicate invalid knowledge — the cycle may be semantically meaningful — but parsers cannot traverse it without cycle detection. No error or warning is required. ### 4.8 `supersedes` Declares that this unit replaces a previous unit. The referenced unit-id MAY be a unit that no longer exists in the manifest (representing a deleted unit). Agents SHOULD prefer this unit over the one it supersedes. ```yaml supersedes: api-authentication-v1 ``` ### 4.9 `triggers` Keywords or short phrases that indicate when this unit is relevant. Used by agents and tools for task-based retrieval. ```yaml triggers: [oauth2, authentication, bearer-token, jwt] ``` **Constraints:** - Each trigger MUST NOT exceed 60 characters. - A unit MUST NOT declare more than 20 triggers. - Trigger values are case-insensitive for matching purposes. - Parsers SHOULD truncate triggers exceeding 60 characters rather than rejecting the manifest. - Parsers SHOULD silently ignore triggers beyond the 20th. ### 4.10 `hints` The `hints` block contains advisory metadata about the size, cost, and loading characteristics of a knowledge unit. All fields are optional. Agents that do not support `hints` MUST silently ignore the block — it carries no normative weight and does not change whether a manifest is valid. ```yaml hints: token_estimate: 42000 token_estimate_method: measured size_bytes: 168000 load_strategy: lazy priority: supplementary density: dense summary_available: true summary_unit: spec-summary partial_load_supported: false ``` #### Hint fields | Field | Type | Description | |-------|------|-------------| | `token_estimate` | integer | Approximate token count for a typical LLM tokenizer. Advisory — actual count varies by model. | | `token_estimate_method` | enum | How the estimate was produced: `measured` (actual tokenizer run) or `estimated` (rough heuristic). Default: `estimated`. | | `size_bytes` | integer | Raw file size in bytes. Model-agnostic; useful for non-text assets (PDFs, images, audio). | | `load_strategy` | enum | When to load this unit relative to manifest processing. See [Load strategy](#load-strategy) below. Default: `lazy`. | | `priority` | enum | Importance when the agent must evict units due to context budget pressure. See [Priority](#priority) below. Default: `supplementary`. | | `density` | enum | Information-to-token ratio of the content. See [Density](#density) below. Default: `standard`. | | `summary_available` | boolean | A shorter summary of this unit exists in this manifest. When `true`, `summary_unit` SHOULD be declared. | | `summary_unit` | string | Unit `id` of the shorter summary. Required when `summary_available: true`. | | `summary_of` | string | Unit `id` that this unit summarises. Declared on the summary side of the relationship. | | `partial_load_supported` | boolean | The content source supports partial or range requests (e.g. HTTP Range, PDF page ranges). Default: `false`. | | `chunked` | boolean | This unit has been split into sequential chunks loadable independently. `chunk_count` SHOULD be declared alongside. | | `chunk_count` | integer | Total number of chunks. Declared on the parent unit alongside `chunked: true`. | | `chunk_of` | string | Unit `id` of the parent this chunk belongs to. | | `chunk_index` | integer | 1-based position of this chunk within the parent sequence. | | `total_chunks` | integer | Total chunks in the parent. Mirrors `chunk_count` on the parent for local access. | | `chunk_topic` | string | Short phrase describing what this chunk covers. Helps agents select the right chunk without loading all of them. | All hint fields are OPTIONAL. Parsers MUST silently ignore unknown hint fields. This ensures forward compatibility as new hints are introduced in future spec versions. #### Load strategy `load_strategy` advises the agent on when to load a unit relative to manifest initialisation. | Value | Meaning | |-------|---------| | `eager` | Load immediately when the manifest is processed. This unit is nearly always needed. | | `lazy` | Load on demand when the agent determines the unit is relevant to the current task. Default. | | `never` | Do not load proactively. Load only if explicitly requested by id. Appropriate for large archives, raw data files, or units where the agent should read a summary instead. | #### Priority `priority` advises the agent on which units to retain when context must be cleared or truncated. | Value | Meaning | |-------|---------| | `critical` | Evict last. This unit contains essential facts the agent must retain to function correctly. | | `supplementary` | Standard priority. Load and use normally; may be evicted if budget is tight. Default. | | `reference` | Evict first. Reference material (API specs, full changelogs, raw data) used for spot lookups rather than sustained reasoning. | An agent managing its own context window SHOULD evict `reference` units before `supplementary` units, and `supplementary` before `critical`. #### Density `density` describes the information-to-token ratio of the content, helping agents decide whether to compress or summarise before placing in context. | Value | Meaning | |-------|---------| | `dense` | Nearly every sentence is load-bearing. Compression risks information loss. Full text is preferred. | | `standard` | Normal prose. Some compression acceptable. Default. | | `verbose` | High token count relative to information content (tutorials, narrative explanations). Summarisation before loading is likely worthwhile. | #### Summary relationships When a short summary of a large unit exists in the same manifest, both sides of the relationship SHOULD be declared: ```yaml units: - id: full-specification path: SPEC.md intent: "What are the normative rules for a knowledge.yaml manifest?" hints: token_estimate: 42000 load_strategy: lazy summary_available: true summary_unit: spec-summary # → points to the summary - id: spec-summary path: SPEC-tldr.md intent: "What are the key points of the spec in 500 words?" hints: token_estimate: 600 load_strategy: eager priority: critical summary_of: full-specification # ← points back to the full unit ``` `summary_unit` on the full unit and `summary_of` on the summary unit are complementary. A validator SHOULD warn when `summary_available: true` has no matching `summary_unit`, or when `summary_of` references a unit that does not declare `summary_available: true`. #### Chunked units Large documents may be split into sequential chunks that can be loaded independently. This allows an agent to load only the relevant section of a large document: ```yaml units: - id: api-reference path: api/reference.md intent: "What endpoints does the API expose?" hints: token_estimate: 62000 load_strategy: never chunked: true chunk_count: 5 - id: api-ref-auth path: api/reference-auth.md intent: "What are the authentication endpoints and token schemas?" hints: token_estimate: 9400 load_strategy: lazy chunk_of: api-reference chunk_index: 1 total_chunks: 5 chunk_topic: "Authentication and token management" ``` `chunk_of` MUST reference a unit in the same manifest. A validator SHOULD warn when `chunk_of` references a non-existent unit, or when `chunk_index` exceeds `total_chunks`. #### Root-level `hints` block The root manifest MAY contain a `hints` block with aggregate information about the entire manifest. This allows an agent to assess total context cost before loading any unit. ```yaml hints: total_token_estimate: 840000 unit_count: 94 recommended_entry_point: overview has_summaries: true has_chunks: false ``` | Field | Type | Description | |-------|------|-------------| | `total_token_estimate` | integer | Advisory sum of all unit `token_estimate` values. | | `unit_count` | integer | Total number of units declared in this manifest. | | `recommended_entry_point` | string | Unit `id` the publisher recommends loading first. Typically an overview, index, or getting-started unit. | | `has_summaries` | boolean | At least one unit in this manifest has `summary_available: true`. | | `has_chunks` | boolean | At least one unit in this manifest has `chunked: true`. | `total_token_estimate` SHOULD be recomputed by tooling when unit estimates change. Publishers SHOULD NOT maintain it by hand; a stale value is worse than an absent value. ### 4.11 `access` and `auth_scope` The `access` field is a lightweight advisory signal indicating what kind of credential, if any, is required to fetch the content of this unit. | Value | Meaning | |-------|---------| | `public` | No credential required. Freely accessible. Default when `access` is omitted. | | `authenticated` | Any valid credential for this knowledge source is sufficient. The root-level `auth` block (§3.3) describes how to acquire one. | | `restricted` | A specific scope or role is required. Declare it in `auth_scope`. The root-level `auth` block (§3.3) describes how to authenticate. | The `auth_scope` field is an optional companion to `access: restricted`. It is a free-form string naming the scope, role, or group required to access the unit. Examples: `"ops-team"`, `"read:internal"`, `"cn=data-analysts"`. Parsers treat it as an opaque advisory string — no validation of the value itself is performed. When both `auth_scope` and `auth.methods` (§3.3) are present, the scope value in `auth_scope` SHOULD correspond to a scope or role supported by the declared auth method. For OAuth 2.1, the `auth_scope` value typically matches one of the `scopes` declared in the `auth.methods` entry. This correspondence is advisory; parsers MUST NOT validate it. ```yaml units: - id: public-overview path: README.md intent: "What is this project?" scope: global audience: [human, agent] # access omitted = public - id: internal-runbook path: ops/runbook.md intent: "How do I handle a production incident?" scope: project audience: [operator, agent] access: authenticated - id: executive-report path: reports/exec-summary.md intent: "What are the key business metrics this quarter?" scope: module audience: [agent] access: restricted auth_scope: executive-team ``` `access` is an advisory declaration. It does not constitute an access control mechanism — a manifest declaring `access: restricted` does not prevent an agent from loading the content if no enforcement layer exists at the transport or storage level. See §14.1. Unknown `access` values MUST be silently ignored by parsers. `auth_scope` is OPTIONAL. Parsers MUST NOT reject a manifest because `auth_scope` is absent on a unit with `access: restricted`. A validator SHOULD warn when `auth_scope` is present on a unit whose `access` is not `restricted`, as the scope has no effect without the access gate. ### 4.12 `sensitivity` The `sensitivity` field classifies the information content of a unit using standard information security levels. It is an advisory signal to agents and orchestration layers about how carefully this content should be handled. | Value | Meaning | |-------|---------| | `public` | No restrictions. Freely shareable. | | `internal` | For internal use only. Not for external parties. | | `confidential` | Restricted to a need-to-know subset. | | `restricted` | Highest sensitivity. Strict handling required. | These levels align with common information classification frameworks (ISO 27001, many national standards). The gradient `public → internal → confidential → restricted` is intentional and ordered. ```yaml units: - id: security-runbook path: ops/security-runbook.md intent: "How do we respond to a security incident?" scope: project audience: [operator, agent] sensitivity: confidential access: restricted ``` `sensitivity` is omitted by default. When omitted, the sensitivity of the unit is undeclared — agents SHOULD treat this as unknown rather than assuming `public`. Unknown `sensitivity` values MUST be silently ignored by parsers. ### 4.13 `deprecated` The `deprecated` field signals that a unit is still present in the manifest but should not be used for new development. It is equivalent to a deprecation annotation in code. ```yaml units: - id: old-api-guide path: docs/api-v1.md intent: "How do I use the v1 API?" scope: module audience: [developer, agent] deprecated: true ``` When `deprecated: true`: - Agents SHOULD prefer non-deprecated alternatives when they exist. - If a `supersedes` field on another unit references this unit, that unit is the preferred replacement. - A validator SHOULD warn when `deprecated: true` is declared but no other unit declares `supersedes: <this-unit-id>`. `deprecated` is OPTIONAL. Default: `false`. When omitted or `false`, no deprecation is implied. ### 4.14 `payment` The `payment` field declares the monetisation model for this unit. It is an advisory signal that allows agents to assess whether access will incur a cost before attempting to load content. In this version, only the `default_tier` sub-field is defined. Additional sub-fields (payment methods, x402 micropayment details, rate limits) are specified in RFC-0005 and may be promoted in a future version. **Unit-level form:** ```yaml units: - id: premium-research path: reports/market-analysis.md intent: "What does the market analysis show?" scope: module audience: [agent] payment: default_tier: metered ``` **Root-level default (overridable per unit):** ```yaml payment: default_tier: free # applies to all units unless overridden ``` #### `payment` sub-fields | Field | Required | Type | Description | |-------|----------|------|-------------| | `default_tier` | OPTIONAL | string | Monetisation tier. One of: `free`, `metered`, `subscription`. Default: `free`. | | Tier | Meaning | |------|---------| | `free` | No cost to access. Default when `payment` is omitted. | | `metered` | Per-request or per-token billing. Agent should check its budget before loading. | | `subscription` | Access requires an active subscription plan. | `payment` is OPTIONAL at both root and unit level. When omitted entirely, all units are assumed to be `free`. Unit-level `payment` overrides the root-level `payment` default for that unit. Unknown `payment` sub-fields MUST be silently ignored. --- ### 4.15 `rate_limits` The `rate_limits` block declares the maximum request rate an agent should observe when accessing this manifest's knowledge units. It is advisory metadata that allows agents to self-throttle without waiting for a `429 Too Many Requests` response. **Root-level (applies to all units unless overridden):** ```yaml rate_limits: default: requests_per_minute: 60 requests_per_day: 1000 ``` **Unit-level override:** ```yaml units: - id: high-volume-index path: index/full.md rate_limits: default: requests_per_minute: 10 requests_per_day: 100 ``` #### `rate_limits` field reference | Field | Required | Type | Description | |-------|----------|------|-------------| | `rate_limits` | OPTIONAL | object | Rate limit declarations. Root-level applies as default; unit-level overrides root. | | `rate_limits.default` | OPTIONAL | object | Default rate limit applied when no tier-specific limit is declared. | | `rate_limits.default.requests_per_minute` | OPTIONAL | integer | Maximum requests per 60-second rolling window. Omit for no limit. | | `rate_limits.default.requests_per_day` | OPTIONAL | integer | Maximum requests per calendar day (UTC). Omit for no limit. | Both `requests_per_minute` and `requests_per_day` are OPTIONAL. Declaring one without the other is valid. When both are declared, agents MUST respect the more restrictive limit in any given window. `rate_limits` is OPTIONAL at both root and unit level. Omitting it entirely means no rate limit is declared — agents SHOULD NOT infer a specific limit. Unknown sub-fields within `rate_limits` MUST be silently ignored. Additional rate limit sub-fields (per-tier, token-based, header mapping) are specified in RFC-0005 and may be promoted in a future version. --- ### 4.16 `visibility` The `visibility` block on a unit declares when and to whom the unit is accessible. It replaces the flat `sensitivity` + `access` combination for units where the answer depends on the agent's environment or role. ```yaml units: - id: production-db-schema path: docs/schema/production.md intent: "What is the production database schema?" scope: project audience: [developer, operator, agent] visibility: default: confidential conditions: - when: environment: [development, local] then: sensitivity: internal requires_auth: false - when: environment: [production] agent_role: [platform_admin, security_auditor] then: sensitivity: confidential requires_auth: true - when: environment: [production] then: sensitivity: restricted requires_auth: true ``` #### `visibility` field reference | Field | Required | Type | Description | |-------|----------|------|-------------| | `visibility.default` | OPTIONAL | string | Baseline sensitivity when no condition matches. Uses the §4.12 vocabulary: `public` \| `internal` \| `confidential` \| `restricted`. | | `visibility.conditions` | OPTIONAL | list | Ordered list of condition objects. First match wins. | | `visibility.conditions[].when` | REQUIRED | object | Matching criteria. Keys: `environment`, `agent_role`. | | `visibility.conditions[].when.environment` | OPTIONAL | string or list | Match when the agent's environment equals this value or appears in this list. | | `visibility.conditions[].when.agent_role` | OPTIONAL | string or list | Match when the agent's declared role equals this value or appears in this list. | | `visibility.conditions[].then` | REQUIRED | object | Overrides to apply when this condition matches. | | `visibility.conditions[].then.sensitivity` | OPTIONAL | string | Sensitivity override. Same vocabulary as §4.12. | | `visibility.conditions[].then.requires_auth` | OPTIONAL | boolean | Whether authentication is required under this condition. | | `visibility.conditions[].then.authority` | OPTIONAL | object | Authority override within this condition. See §4.17. | #### Evaluation semantics - Conditions are evaluated in declaration order. **First matching condition wins.** - A condition matches when ALL `when` keys match. Within a key, list values use OR semantics. - If no condition matches, `visibility.default` applies. - If `visibility` is absent, the flat `sensitivity` (§4.12) and `access` (§4.11) fields apply. - If an agent cannot determine its environment or role, it MUST treat `visibility.default` as the effective visibility and MUST NOT assume the most permissive condition. #### Precedence ``` visibility.conditions[].then.sensitivity (highest — first match) ↓ visibility.default ↓ unit-level sensitivity (§4.12) ↓ root compliance.sensitivity (§3.5) (lowest) ``` When both `visibility` and `sensitivity` are declared on a unit, `visibility` takes precedence. The flat `sensitivity` is the fallback if `visibility` is absent or no condition matches and `visibility.default` is unset. KCP does not define how an agent determines its environment or role. Common approaches: environment variables (`KCP_ENVIRONMENT`, `NODE_ENV`), agent framework configuration, or OAuth token claims (`kcp_role`). #### Suggested role vocabulary The following role names are RECOMMENDED. Manifest authors MAY use any string. | Role | Meaning | |------|---------| | `developer` | Software development tasks | | `operator` | Infrastructure and operations | | `security_auditor` | Security review and compliance assessment | | `dpo` | Data Protection Officer | | `platform_admin` | Broad administrative access | | `data_analyst` | Data access and analysis | | `ci_cd` | Automated build and deployment pipelines | | `external_agent` | Agents from outside the organisation's trust boundary | #### Suggested environment vocabulary | Value | Meaning | |-------|---------| | `local` | Developer's local machine | | `development` | Shared development environment | | `staging` | Pre-production, production-like configuration | | `production` | Live production environment | | `dr` | Disaster recovery environment | --- ### 4.17 `authority` The `authority` block on a unit declares what actions an agent may take with the unit's content. For each action, the author declares whether the agent acts on its own initiative, must first obtain human approval, or is denied entirely. ```yaml units: - id: gdpr-processing-records path: compliance/gdpr-processing.md intent: "What personal data does the organisation process and under what legal basis?" scope: project audience: [developer, operator, agent, security_auditor] authority: read: initiative # agent reads without asking summarize: initiative # agent may summarize without asking modify: denied # no agent may modify this content share_externally: requires_approval # must ask a human before sharing execute: denied # no executable instructions in this unit ``` #### Action vocabulary | Action | Default | Meaning | |--------|---------|---------| | `read` | `initiative` | Load and reason over this unit's content. | | `summarize` | `initiative` | Produce summaries or extracts of this unit for context. | | `modify` | `denied` | Modify files, records, or code described by or in this unit. | | `share_externally` | `denied` | Include this content in external communications or outputs. | | `execute` | `denied` | Run commands, scripts, or instructions contained in this unit. | #### Permission values | Value | Meaning | |-------|---------| | `initiative` | The agent may take this action without human approval. | | `requires_approval` | The agent SHOULD request human approval before taking this action. The `delegation.human_in_the_loop` block (§3.4) provides the approval mechanism. | | `denied` | The agent MUST NOT take this action. If the task requires the action, the agent SHOULD surface this constraint to its operator. | #### Safe defaults When the `authority` block is absent, these defaults apply: ```yaml authority: read: initiative summarize: initiative modify: denied share_externally: denied execute: denied ``` These defaults are intentionally conservative. Agents may read and summarize freely; all write, share, and execute actions require explicit grant. #### Extensibility Manifest authors MAY declare additional custom actions: ```yaml authority: read: initiative export_to_pdf: requires_approval # custom action submit_to_regulator: denied # domain-specific action ``` Parsers MUST NOT reject manifests for unknown action names. Agents that do not recognise a custom action SHOULD treat it as `denied` (safe default). --- ### 4.18 `discovery` The `discovery` block records how, when, and how confidently a capability was identified. It enables automated tooling (web traversal, OpenAPI introspection, LLM inference) to express the epistemic state of each declared unit. The `discovery` block answers: *"How do we know this unit is real, and how sure are we?"* ```yaml units: - id: submit-expense-report path: capabilities/expense-submit.md intent: "How do I submit an expense report via the HR portal?" scope: project audience: [agent] discovery: verification_status: observed source: web_traversal observed_at: "2026-03-10T14:22:00Z" verified_at: null confidence: 0.72 contradicted_by: null ``` #### `discovery` field reference | Field | Required | Type | Default | Description | |-------|----------|------|---------|-------------| | `discovery.verification_status` | OPTIONAL | enum | `verified` | The current verification state. See vocabulary below. | | `discovery.source` | OPTIONAL | enum | `manual` | How this unit was discovered. See vocabulary below. | | `discovery.observed_at` | OPTIONAL | ISO 8601 datetime | null | When this capability was first observed or inferred. | | `discovery.verified_at` | OPTIONAL | ISO 8601 datetime | null | When this capability was independently confirmed. MUST be null if `verification_status` is `rumored` or `observed`. | | `discovery.confidence` | OPTIONAL | float 0.0–1.0 | 1.0 | Confidence in this capability declaration. | | `discovery.contradicted_by` | OPTIONAL | string | null | The `id` of another unit in this manifest that provides a conflicting description of the same capability. | #### `verification_status` vocabulary | Value | Meaning | |-------|---------| | `rumored` | Capability mentioned in an indirect source (marketing copy, third-party description, LLM inference). Not yet observed directly. | | `observed` | Capability found via direct technical observation (web traversal, API call, screenshot) but not yet cross-confirmed against a canonical source. | | `verified` | Capability confirmed against a canonical source (OpenAPI spec, official documentation, successful live API call). | | `deprecated` | Capability was previously `verified` or `observed` but is no longer present or functional. Retained for audit. | **Normative rules:** - A unit with `verification_status: rumored` MUST declare `confidence < 0.5`. - A unit with `verification_status: verified` SHOULD declare `confidence >= 0.8`. - A unit with `verification_status: deprecated` SHOULD NOT be loaded by agents for live operation. It MAY be loaded by audit tooling. - If `verification_status` is absent, agents MUST treat the unit as `verified` — this preserves the semantics of all existing hand-authored manifests. #### `source` vocabulary | Value | Meaning | |-------|---------| | `manual` | Declared by a human author with direct knowledge of the capability. Default. | | `web_traversal` | Discovered by automated web or UI traversal (Playwright, headless browser). | | `openapi` | Derived from an OpenAPI, AsyncAPI, or equivalent machine-readable API description. | | `llm_inference` | Inferred by an LLM from natural-language content (documentation, marketing pages). | Unknown `source` values MUST be silently ignored. #### `contradicted_by` When two observations of the same capability disagree, both units SHOULD be preserved with `contradicted_by` pointing at each other. The serving layer SHOULD surface a `discovery_conflict` warning alongside both units and MUST NOT silently discard either. ```yaml units: - id: expense-api-v2 intent: "Submit expense report via REST API at /api/v2/expenses" discovery: verification_status: observed confidence: 0.71 contradicted_by: expense-api-v1 - id: expense-api-v1 intent: "Submit expense report via REST API at /api/v1/expenses" discovery: verification_status: observed source: openapi confidence: 0.85 contradicted_by: expense-api-v2 ``` #### Relationship to RFC-0010 (Bi-Temporal Unit Validity) | Field | Question answered | |-------|-------------------| | `temporal.recorded_at` (RFC-0010) | When did the manifest author write this version down? | | `discovery.observed_at` (RFC-0012) | When was the capability first encountered by the discovery tool? | Both blocks may be declared together on the same unit. --- ## 5. Relationships The optional `relationships` section declares explicit directional relationships between units. This is separate from the inline `depends_on` field and supports richer graph semantics. ```yaml relationships: - from: <unit-id> to: <unit-id> type: enables | context | supersedes | contradicts | depends_on | governs ``` | Type | Meaning | Agent navigation implication | |------|---------|------------------------------| | `enables` | `from` enables or unlocks `to` | Load `from` first when user is not yet ready for `to` | | `context` | `from` provides useful background for `to` (advisory) | Load `from` for deeper understanding; `to` works alone | | `supersedes` | `from` replaces `to` | Prefer `from`; skip `to` unless historical content requested | | `contradicts` | `from` conflicts with `to` | Surface both as known conflict; do not treat as simultaneously authoritative | | `depends_on` | `from` depends on `to` | Load `to` before `from`; `from` may be incomplete without `to` | | `governs` | `from` declares authoritative policy/standards that `to` must comply with | When compliance/standards questions arise about `to`, load `from` as authoritative source; `from` takes precedence on conflict | The relationship type vocabulary is shared between `relationships` (§5) and `external_relationships` (§3.6). All types are valid in both sections. Both `from` and `to` MUST be `id` values of units declared in the same manifest. Relationships referencing unknown IDs SHOULD produce a validation warning and MUST be silently ignored. Unknown relationship types MUST be silently ignored. --- ## 6. Versioning ### 6.1 Spec Version (`kcp_version`) `kcp_version` identifies which version of this specification the manifest conforms to. Current valid value: `"0.12"`. The values `"0.1"` through `"0.11"` refer to prior drafts (January–March 2026); parsers SHOULD treat these manifests as conformant with this version, as v0.12 is a strict superset of v0.11 (new fields only, no removals or breaking changes). Parsers encountering an unknown `kcp_version` SHOULD process the manifest using the closest known version and SHOULD emit a warning. ### 6.2 Manifest Version (`version`) The `version` field is the manifest author's own version of the knowledge index, independent of the spec version. It follows [Semantic Versioning 2.0.0](https://semver.org/). Authors SHOULD increment this value when units are added, removed, or materially changed. --- ## 7. Validation A conformant parser MUST accept any manifest that satisfies the REQUIRED fields in §3 and §4. The following conditions SHOULD produce warnings but MUST NOT cause the parser to reject the manifest: - A `path` value that does not resolve to an existing file - A `depends_on` or `relationships` reference to an unknown `id` - A `triggers` entry exceeding 60 characters (truncate and warn) - More than 20 `triggers` entries on a single unit (ignore excess and warn) - An unknown `audience` value - An unknown `kind` value - An unknown `format` value - An unknown `update_frequency` value - An unknown relationship `type` - A `kcp_version` the parser does not recognise - Duplicate `id` values (parsers SHOULD use the first occurrence) - `auth_scope` present on a unit whose `access` is not `restricted` - `auth.methods` containing no recognised `type` values - `discovery.verification_status: rumored` with `confidence >= 0.5` (violates normative rule §4.18) - `discovery.verification_status: verified` with `confidence < 0.8` (advisory) - `discovery.verified_at` set when `verification_status` is `rumored` or `observed` - `discovery.contradicted_by` referencing an unknown unit `id` - A `visibility.conditions[]` entry missing a `when` or `then` key - An `authority` action value not in `{initiative, requires_approval, denied}` (unknown values MUST be silently ignored at runtime but SHOULD warn at validation time) The following conditions MUST cause the parser to reject the manifest: - The file is not valid YAML - The file is not UTF-8 encoded - The `project` field is absent or empty - The `units` field is absent or empty - A unit is missing its `id`, `path`, `intent`, `scope`, or `audience` field --- ### 7.1 JSON Schema A JSON Schema (draft-07) for `knowledge.yaml` is available at [`schema/knowledge-schema.json`](./schema/knowledge-schema.json). It covers all fields defined in this specification: root fields (`kcp_version`, `project`, `version`, `updated`, `language`, `license`, `indexing`, `hints`, `trust`, `auth`, `delegation`, `compliance`, `payment`, `rate_limits`, `manifests`, `external_relationships`), unit fields (`id`, `path`, `kind`, `intent`, `format`, `content_type`, `language`, `scope`, `audience`, `license`, `validated`, `update_frequency`, `indexing`, `depends_on`, `supersedes`, `triggers`, `hints`, `access`, `auth_scope`, `sensitivity`, `deprecated`, `delegation`, `compliance`, `payment`, `rate_limits`, `external_depends_on`), and relationship fields (`from`, `to`, `type`). The schema enforces required fields, value constraints (e.g. `id` pattern, `kind` enum, `format` enum, trigger `maxLength` and `maxItems`), and structural rules. It can be used with any JSON Schema validator to check manifest correctness before parsing, and by editors for autocompletion and inline validation. --- ## 8. Conformance Levels Implementations are encouraged to adopt KCP incrementally. Three levels are defined: **Level 1 — Minimal** The manifest contains `project`, `units`, and for each unit: `id`, `path`, `intent`, `scope`, and `audience`. A Level 1 manifest answers the question: "what knowledge exists, what does each piece answer, and who is it for?" Parsers SHOULD supply default values when `scope` or `audience` are absent (`scope` defaults to `global`; `audience` defaults to an empty list). **Level 2 — Structured** Extends Level 1 with `validated`, `depends_on`, `kind`, `format`, `language`, the core `hints` fields (`token_estimate`, `load_strategy`, `summary_available`, `summary_unit`, `summary_of`), the access and classification fields (`access`, `auth_scope`, `sensitivity`, `deprecated`, `payment`), the root-level `trust.provenance` block, the `authority` block (§4.17) on units and at root level (manifest-wide action permission defaults), and the `discovery` block (§4.18) on units with `verification_status`, `source`, `confidence`, and `contradicted_by`. A Level 2 manifest supports freshness-aware retrieval, dependency-ordered loading, artifact type classification, content format awareness, multilingual navigation, basic context-budget planning, access-tier routing, action governance (agents know which operations require human approval or are denied), and discovery provenance (consuming agents can filter by confidence and verification state). **Level 3 — Full** Extends Level 2 with `triggers`, `supersedes`, `license`, `update_frequency`, `indexing`, advanced `hints` (`priority`, `density`, chunking fields), root-level `hints`, a `relationships` section, the root-level `auth` block (§3.3) with authentication method descriptions, the `trust.audit` sub-block (§3.2) with access logging and trace context requirements, the `manifests` block (§3.6) with federation declarations, `external_depends_on` (unit-level cross-manifest dependencies), `external_relationships` (root-level cross-manifest relationship declarations), `local_mirror` for air-gapped federation, and the `visibility` block (§4.16) with environment and role-based conditional access. A Level 3 manifest supports task-based routing, knowledge graph navigation, drift detection, usage rights declaration, cache management, AI crawling permissions, context eviction ordering, large-document chunked access, authentication discovery, auditable knowledge access, federated multi-manifest knowledge graphs, and environment-aware conditional access control. All three levels are valid KCP. A tool MUST NOT reject a manifest for being below the level it was designed for — graceful degradation is required. --- ## 9. Relationship to llms.txt KCP and llms.txt address adjacent problems and are designed to coexist. `llms.txt` answers the question: "what does this site contain?" It is a flat, human-readable index optimised for simple cases. KCP answers the question: "how is this knowledge structured, how fresh is it, and how should an agent navigate it?" It is not a replacement for `llms.txt`. A project MAY maintain both files. When both are present, a KCP-aware agent SHOULD prefer `knowledge.yaml` for navigation and treat `llms.txt` as a fallback for tools that do not support KCP. --- ## 10. Relationship to MCP The Model Context Protocol (MCP) defines how AI agents connect to tools and data sources. KCP defines how the knowledge those tools serve is structured. The two protocols are complementary: - MCP provides the transport and tool invocation layer - KCP provides the knowledge structure and metadata layer A knowledge server MAY expose KCP-structured content via MCP. The knowledge units declared in `knowledge.yaml` correspond naturally to the resources a KCP-aware MCP server would expose. --- ## 11. Relationship to HATEOAS HATEOAS (Hypermedia As The Engine Of Application State) is a REST architectural constraint in which server responses include typed links describing the transitions available to the client from its current state. KCP shares the same foundational insight — that typed, directional relationships between resources are necessary for navigation — but applies it in a different domain with a different execution model. **Shared insight:** Both reject the "flat list of resources" model. A consumer navigating an information space needs to know not just what exists, but how resources relate, what transitions are valid, and what each resource is for. KCP's `depends_on`, `supersedes`, and `relationships` fields are the same idea as HATEOAS link relations: typed directed edges that tell the consumer how to move through the space. **Key difference — static vs dynamic:** HATEOAS links are generated per-response, reflecting the current state of the resource. A HATEOAS server may offer a `cancel` link on a pending order and omit it on a fulfilled one. KCP is a committed file. The manifest declares topology at authoring time; it does not adapt to the agent's current task or the state of the knowledge base at query time. This is a deliberate design choice that enables KCP to work without a server, but it means KCP cannot express conditional navigation ("this unit is relevant only if you have already loaded unit X"). **Where KCP goes beyond HATEOAS:** HATEOAS link relations express what action a client can take next. KCP's `intent` field expresses what question a unit answers — a different kind of metadata that enables semantic routing without loading content. KCP's `validated` field distinguishes human confirmation of accuracy from file modification time, which has no equivalent in HTTP caching semantics. KCP's `audience` field and `triggers` list address relevance filtering concerns that REST never needed because its consumers are not context-window-constrained agents navigating heterogeneous corpora. **Where HATEOAS goes beyond KCP:** Runtime, state-dependent navigation. A HATEOAS server adjusts its link set based on live resource state. KCP cannot model this. If future implementations add a KCP-aware query server (such as a KCP-over-MCP bridge), this gap narrows for the dynamic case. For the static file format, it is a fundamental constraint. Implementations that are familiar with HATEOAS may find the `relationships` section the most natural entry point into KCP. The conceptual vocabulary — typed links, link relations, navigation graph — transfers directly. The intent, freshness, and audience fields represent concerns that arise specifically when the consumer is an AI agent rather than an API client. --- ## 12. Relationship to A2A Google's [Agent-to-Agent (A2A) protocol](https://google.github.io/A2A/) defines how AI agents discover and invoke each other. An A2A Agent Card, published at `/.well-known/agent.json`, describes an agent's identity, skills, capabilities, and how to authenticate when calling it. KCP and A2A address adjacent but distinct concerns and are designed to compose: - **A2A** answers: *"Who is this agent, what can it do, and how do I call it?"* - **KCP** answers: *"What knowledge does this agent have access to, and what does each piece require to read?"* Neither protocol can express what the other expresses. ### The two-layer model | Concern | A2A Agent Card | KCP Manifest | |---------|---------------|--------------| | **Describes** | The agent (service) | The knowledge (content units) | | **Published at** | `/.well-known/agent.json` | `knowledge.yaml` (project root) | | **Format** | JSON | YAML | | **Auth granularity** | Per-agent (transport layer) | Per-knowledge-unit (access layer) | | **Sensitivity labels** | Not in scope | `public`, `internal`, `confidential`, `restricted` | | **Delegation controls** | Not in scope | `max_depth`, capability attenuation, audit chain (§3.4) | | **Human-in-the-loop** | Not in scope | Per-unit, with approval mechanism (§3.4) | | **Compliance metadata** | Not in scope | `regulations`, `data_residency`, `restrictions` (§3.5) | | **Discovery** | Agent skills, I/O modes, capabilities | Knowledge units, intents, triggers, freshness | ### Authentication — two layers, not competing Both protocols reference OAuth2. This is where a superficial reading might suggest redundancy. It is not. **A2A auth** operates at the transport layer. It defines how a calling agent authenticates *to* the target agent — the question is: "Are you allowed to talk to this agent at all?" **KCP auth** (§3.3) operates at the knowledge-access layer. It defines what credentials are needed to access a *specific knowledge unit* within the agent. The question is: "Now that you are connected, are you allowed to read this particular file?" An agent that holds a valid A2A transport token may still be denied access to a specific KCP unit because it lacks the required `auth_scope`, has exceeded the unit's `max_depth` delegation limit, or because the unit requires human-in-the-loop approval before an agent may read it. ### How they compose An A2A Agent Card MAY reference a KCP manifest using the `knowledgeManifest` field (a proposed convention — not part of the A2A specification): ```json { "name": "Research Agent", "url": "https://research.example.com/agent", "knowledgeManifest": "/.well-known/kcp.json" } ``` This enables a calling agent to read both documents before deciding how to interact: 1. **A2A Card** — discover endpoint, authenticate (transport), identify skills 2. **KCP manifest** — inspect knowledge units, evaluate per-unit access requirements, determine which units to load and which require additional credentials or human approval A2A without KCP gives you a well-described front door with no access control inside. KCP without A2A gives you fine-grained knowledge access control with no standard way for agents to find or call the agent hosting it. Together, they form a complete two-layer stack. ### Working example [`examples/a2a-agent-card/`](./examples/a2a-agent-card/) contains a complete worked example: an A2A Agent Card and KCP manifest for the same clinical research agent, with a runnable Java simulator (`examples/a2a-agent-card/simulator/`) that executes all four composition phases and verifies each access decision with 30 tests. --- ## 13. Extension Fields Implementations MAY add custom fields to the root manifest or to individual units. Custom fields SHOULD use a namespaced prefix to avoid collisions with future spec fields (e.g. `x-myorg-priority: high`). Parsers MUST silently ignore fields they do not recognise, including extension fields from other implementations. This is required for forward compatibility. --- ## 14. Security Considerations **Path traversal:** Parsers MUST NOT resolve `path` values that traverse outside the manifest's root directory (i.e. paths containing `..` that escape the root). Such paths SHOULD be rejected with an error. **Denial of service:** Parsers operating in untrusted environments SHOULD impose limits on manifest size, unit count, and string field lengths to guard against resource exhaustion. **Trust:** A `knowledge.yaml` is as trustworthy as its source. Agents consuming KCP manifests from untrusted sources SHOULD treat the content as untrusted input. ### 14.1 Trust Model KCP is a declarative format. A manifest describes properties of knowledge units — their intended audience, freshness, access requirements, compliance scope, and dependencies. It does not enforce any of these properties. Enforcement is the responsibility of the systems and agents that consume the manifest. **All KCP metadata is advisory.** The presence of a field in a manifest is a declaration by the publisher, not a guarantee of correctness or a technical control. Agents and tools MUST NOT treat KCP metadata as a substitute for independent verification where verification is required. The following specific cases apply: - **Freshness (`validated`):** A `validated` date declares when a human last confirmed the content. It does not prove the content is accurate at the time of consumption. Agents that require freshness guarantees MUST verify content independently. - **Compliance scope (e.g. `regulations`, `data_residency`):** A manifest declaring compliance with a regulation (e.g. GDPR, HIPAA) asserts the publisher's stated intent, not verified compliance status. Agents and operators MUST NOT rely on compliance metadata as a legal basis for processing decisions. Compliance verification remains the responsibility of the data controller. - **Access requirements (e.g. `access`, `auth`):** Auth metadata declares the publisher's intended access controls. It does not constitute an access control mechanism. A manifest declaring `access: restricted` does not prevent an agent from loading the referenced content if no enforcement layer exists at the transport or storage level. - **Processing restrictions (e.g. `no-external-llm`, `no-training`):** Restrictions declared in a manifest are signals to the agent's orchestration layer. They MUST be evaluated before content is loaded into an agent context. An orchestration layer that loads content and then checks restrictions has already violated them. - **Publisher identity:** Free-text publisher fields (e.g. `publisher: "Example Corp"`) carry no trust value. Only cryptographically verified identifiers (e.g. a DID resolved and verified at consumption time) provide publisher identity assurance. **Agents SHOULD surface trust limitations to operators** when acting on metadata that has security or legal implications. An agent that silently treats advisory compliance metadata as enforced is creating hidden liability for its operator. ### 14.2 YAML Safety Parsers MUST use a safe YAML constructor that disables arbitrary type instantiation. YAML documents containing type tags that instantiate non-primitive types (e.g. `!!javax.script.ScriptEngineManager` in Java) MUST be rejected. Parsers MUST NOT use YAML loaders that execute code embedded in the document. This requirement applies to all YAML content, including content fetched from remote sources. ### 14.3 Remote Content Parsers and agents that fetch remote manifests (e.g. via federation or external references) MUST apply the following constraints: - Remote manifest URLs MUST use HTTPS. Plain HTTP URLs MUST be rejected. - Parsers MUST NOT resolve URLs that target private address ranges (RFC 1918: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16), link-local addresses (169.254.0.0/16), or loopback addresses (127.0.0.0/8, ::1). This check MUST be performed after DNS resolution to guard against DNS rebinding attacks. - Parsers MUST detect and halt on cycles in remote manifest references. A visited-set of resolved manifest URLs MUST be maintained across the fetch chain. A manifest URL that appears in its own transitive fetch chain MUST be silently ignored. - Parsers SHOULD enforce a maximum remote fetch depth. The RECOMMENDED default is 5. - The YAML safety requirements of §14.2 apply to all remotely fetched manifests. **Federation-specific constraints (§3.6):** - Parsers MUST enforce a maximum of 50 unique manifests per federation resolution session (RECOMMENDED default). Exceeding this limit SHOULD produce a warning. - Remote manifests larger than 1 MB SHOULD be rejected with a warning. - Remote manifests containing more than 10,000 units SHOULD be rejected with a warning. - Fetch timeout: RECOMMENDED default of 10 seconds per remote manifest. - Trust does NOT propagate transitively across federation boundaries. A manifest is authoritative only over the sub-manifests it directly declares. Trust escalation through transitive chains MUST NOT be assumed. --- ## 15. Query Vocabulary ### 15.1 Purpose KCP manifests are designed for agents to navigate knowledge without loading everything at once. But navigation without a query standard requires each consumer to implement its own filtering logic — and requires agents to parse entire manifests before they know which units are relevant. This section defines a normative query vocabulary that lets tools, orchestrators, and agents ask: *"Which units match my current task, fit my context budget, and require capabilities I have?"* — and receive scored, budget-aware results without loading any unit content. The query vocabulary is a **consumer-side convention**, not a manifest-side addition. No changes to `knowledge.yaml` are required. All fields in this section are evaluated against existing manifest fields defined in §3 and §4. --- ### 15.2 Query Request Shape A query is a structured object. All fields are OPTIONAL — an empty query matches all units. ```yaml # Full query example terms: ["authentication", "oauth2"] audience: agent scope: module sensitivity_max: internal max_token_budget: 8000 include_summaries: true exclude_deprecated: true has_capabilities: [tool:kubectl, permission:deploy-prod] exclude_stale: true federation_scope: declared ``` #### Request field reference | Field | Type | Default | Description | |-------|------|---------|-------------| | `terms` | `list[string]` | `[]` | Free-text search terms. Matched against `triggers`, `intent`, `id`, and `path`. | | `audience` | `string` | absent | If set, only return units whose `audience` list includes this value. | | `scope` | `string` | absent | If set, only return units whose `scope` equals this value. | | `sensitivity_max` | `string` | absent | Maximum sensitivity ceiling. Ordering: `public` < `internal` < `confidential` < `restricted`. Units above the ceiling are excluded. | | `max_token_budget` | `integer` | absent | Maximum total `hints.token_estimate` across all returned results. | | `include_summaries` | `boolean` | `true` | When `true` and `max_token_budget` is set, substitute `summary_unit` for units that exceed the remaining budget. | | `exclude_deprecated` | `boolean` | `true` | When `true`, units with `deprecated: true` are excluded. | | `has_capabilities` | `list[string]` | absent | Agent-declared capability set. Units whose `requires_capabilities` contains values not in this list are excluded. See §15.5. | | `exclude_stale` | `boolean` | `false` | When `true`, exclude units that compute as stale per their `freshness_policy`. See §15.6. | | `federation_scope` | `string` | `local` | Controls cross-manifest query range. See §15.7. | --- ### 15.3 Query Response Shape ```yaml results: - unit_id: auth-guide score: 13 path: docs/api/authentication.md token_estimate: 4200 summary_unit: auth-guide-tldr match_reason: [trigger, intent] source_manifest: null - unit_id: sso-integration-guide score: 8 path: docs/sso.md token_estimate: 3100 match_reason: [intent] source_manifest: identity-service ``` #### Response field reference | Field | Type | Description | |-------|------|-------------| | `unit_id` | `string` | The `id` of the matching unit. | | `score` | `integer` | Relevance score. Higher is more relevant. | | `path` | `string` | The unit's `path` field (convenience — avoids a second lookup). | | `token_estimate` | `integer` or `null` | The unit's `hints.token_estimate`, if declared. | | `summary_unit` | `string` or `null` | The unit's `hints.summary_unit`, if declared. | | `match_reason` | `list[string]` | Scoring rules that contributed. Values: `trigger`, `intent`, `id`, `path`. | | `source_manifest` | `string` or `null` | `null` for units in the local manifest. When `federation_scope: declared`, the `manifests[].id` value of the sub-manifest the unit came from. Agents MUST resolve the unit path relative to the sub-manifest's base URL when `source_manifest` is non-null. | --- ### 15.4 Scoring Algorithm Implementations SHOULD score units using the following rules. Alternative scoring algorithms are permitted but the default SHOULD produce equivalent ordering for identical inputs. | Rule | Points | Condition | |------|--------|-----------| | Trigger match | 5 | Per search term that matches any entry in `triggers` (§4.9), case-insensitive substring. | | Intent match | 3 | Per search term that appears as a substring in `intent` (§4.5), case-insensitive. | | ID/path match | 1 | Per search term that appears as a substring in `id` or `path`, case-insensitive. | **Sorting:** Results are sorted descending by `score`. Ties are broken by declaration order in the manifest (earlier units first). **Top-N:** Implementations SHOULD return the top 5 results by default. The limit MAY be configurable by the caller. --- ### 15.5 Budget-Constrained Selection When `max_token_budget` is set, the scorer MUST apply the following algorithm after scoring: 1. Sort candidates by score (descending). 2. Walk the sorted list. For each candidate unit: a. If `hints.token_estimate` fits in the remaining budget, include it and decrement the budget. b. If `include_summaries: true` and the unit has `hints.summary_unit`, check whether the summary unit's `token_estimate` fits. If so, include the summary unit (not the original) and decrement the budget accordingly. c. Otherwise, skip the unit. 3. Stop when the budget is exhausted or all candidates have been evaluated. Units without `hints.token_estimate` are treated as having an estimate of 0 — they are always included, since their cost is unknown and presumed small. --- ### 15.6 Capability Filter (`has_capabilities`) When `has_capabilities` is set, units whose `requires_capabilities` (§4) contains values not present in the `has_capabilities` list are excluded from results. **Prefix normalization:** Implementations SHOULD treat bare values and `tool:`-prefixed values as equivalent (e.g. `kubectl` and `tool:kubectl` are the same). Unknown prefixes are treated as opaque strings and matched literally. **Absent `requires_capabilities`:** Units with no `requires_capabilities` declaration are always included regardless of the `has_capabilities` filter. --- ### 15.7 Freshness Filter (`exclude_stale`) When `exclude_stale: true`, units that compute as stale are excluded from results. A unit is **stale** when all of the following hold: - `validated` is declared on the unit. - `freshness_policy.max_age_days` is declared on the unit or root. - `(today − validated_date) > max_age_days` (using semver date comparison). Units without `validated` or without `freshness_policy.max_age_days` are **not** excluded — absence of a freshness declaration is not the same as stale. --- ### 15.8 Federation Scope The `federation_scope` field controls whether the query is evaluated against the local manifest only or also against federated sub-manifests declared in `manifests[]` (§3.6). | Value | Meaning | |-------|---------| | `local` | Query the local manifest only. **Default.** | | `declared` | Query the local manifest and all manifests listed in its `manifests[]` block (one hop). Results include `source_manifest` to identify origin. | `recursive` scope (transitively following sub-manifests across multiple hops) is deferred. Performance and SSRF amplification characteristics require empirical validation against real federation graphs before standardising. `declared` covers the vast majority of hub+leaf topologies. **Normative rules for `declared` scope:** - Implementations MUST fetch all `manifests[]` entries before scoring. - Implementations MUST apply all active filters (`audience`, `sensitivity_max`, `has_capabilities`, `exclude_stale`, etc.) to sub-manifest units identically to local units. - Implementations MUST set `source_manifest` to the `manifests[].id` value of the origin sub-manifest on each result that comes from a sub-manifest. - Implementations MUST apply the federation security constraints of §14.3 to all fetches triggered by `federation_scope: declared`. - Implementations that do not support `federation_scope: declared` MUST treat it as `local` and MUST NOT return an error. --- ### 15.9 Security Considerations for Queries - **`sensitivity_max` is advisory**, not access control. An agent MUST NOT treat query results as a security boundary. The filter is for navigation efficiency, not authorization. See §14.1. - **`federation_scope: declared`** triggers outbound HTTPS fetches. All constraints of §14.3 (SSRF prevention, private address blocking, cycle detection, depth limits) apply. - **`source_manifest`** in responses discloses which sub-manifests were queried. This is equivalent information to what is already declared in `manifests[]` and does not constitute additional information disclosure. - **`has_capabilities`** reveals the querying agent's capability set to the query processor. Implementations that proxy queries to third-party servers SHOULD document this disclosure. ### 15.10 Worked Examples #### 15.10.1 End-to-end query and response (budget + staleness + audience) An HR-bot agent queries for expense-reporting guidance for EU employees. Token budget: 4000. Stale units are excluded. The third result exceeds the remaining budget, so its summary unit is substituted. ```yaml # Query request terms: ["expense report", "reimbursement"] audience: agent sensitivity_max: internal max_token_budget: 4000 include_summaries: true exclude_deprecated: true exclude_stale: true # Query response results: - unit_id: submit-expense-report score: 10 # trigger("expense report")=5 + trigger("reimbursement")=5 path: capabilities/expense-submit.md token_estimate: 1800 summary_unit: null match_reason: [trigger] source_manifest: null - unit_id: expense-policy-eu score: 5 # trigger("expense report")=5 path: policies/expense-eu.md token_estimate: 1200 summary_unit: null match_reason: [trigger] source_manifest: null - unit_id: expense-approval-workflow-tldr # summary substituted — original (6000 tokens) score: 3 # intent("expense")=3 exceeded remaining budget (1000) path: workflows/approval-tldr.md token_estimate: 600 summary_unit: null match_reason: [intent] source_manifest: null # Budget consumed: 1800 + 1200 + 600 = 3600 of 4000. Remaining candidates skipped. ``` #### 15.10.2 Capability filter (`has_capabilities`) The agent has `tool:kubectl` but not `permission:deploy-prod`. The deployment runbook unit requires both — it is excluded. The documentation unit has no capability requirements and is returned. ```yaml # Manifest units (excerpt) units: - id: deploy-to-prod path: runbooks/deploy-prod.md intent: "How do I deploy the service to the production Kubernetes cluster?" triggers: [deploy, production, kubernetes, rollout] requires_capabilities: [tool:kubectl, permission:deploy-prod] hints: token_estimate: 3200 - id: deploy-overview path: docs/deploy-overview.md intent: "What is the deployment architecture and release process?" triggers: [deploy, release, architecture] hints: token_estimate: 1400 # Query request terms: ["deploy"] has_capabilities: [tool:kubectl] # agent lacks permission:deploy-prod # Query response results: - unit_id: deploy-overview # included: no requires_capabilities score: 6 # trigger("deploy")=5 + path("deploy")=1 path: docs/deploy-overview.md token_estimate: 1400 match_reason: [trigger, path] source_manifest: null # deploy-to-prod excluded: requires permission:deploy-prod, not in has_capabilities ``` #### 15.10.3 Federation scope (`federation_scope: declared` + `source_manifest`) A hub manifest federates two sub-manifests. A query with `federation_scope: declared` returns results from all three; each result carries `source_manifest` so the agent knows which base URL to resolve paths against. ```yaml # Hub manifest (excerpt) manifests: - id: payments-service url: "https://payments.acme.internal/knowledge.yaml" relationship: component - id: identity-service url: "https://identity.acme.internal/knowledge.yaml" relationship: component # Query request terms: ["api-key", "authentication"] federation_scope: declared # Query response results: - unit_id: api-key-rotation score: 6 # trigger("api-key")=5 + id("api-key")=1 path: docs/api-keys.md # resolve relative to payments-service base URL token_estimate: 2100 match_reason: [trigger, id] source_manifest: payments-service # from sub-manifest - unit_id: oauth2-token-exchange score: 5 # trigger("authentication")=5 path: docs/oauth2.md # resolve relative to identity-service base URL token_estimate: 3800 match_reason: [trigger] source_manifest: identity-service # from sub-manifest - unit_id: portal-getting-started score: 1 # path("api-key")=1 substring match path: docs/getting-started.md token_estimate: 900 match_reason: [path] source_manifest: null # local unit — resolve relative to hub manifest ``` --- ## Appendix A: Minimal Example ```yaml kcp_version: "0.12" project: my-project version: 1.0.0 units: - id: overview path: README.md intent: "What is this project and how do I get started?" scope: global audience: [human, agent] ``` --- ## Appendix B: Full Example ```yaml kcp_version: "0.12" project: wiki.example.org version: 2.3.0 updated: "2026-03-07" language: en license: "Apache-2.0" indexing: open hints: total_token_estimate: 84000 unit_count: 9 recommended_entry_point: about has_summaries: true trust: provenance: publisher: "Example Corp" publisher_url: "https://wiki.example.org" contact: "[email protected]" audit: agent_must_log: true require_trace_context: false auth: methods: - type: oauth2 issuer: "https://auth.example.org" scopes: ["read:docs"] - type: none payment: default_tier: free manifests: - id: platform-core url: "https://platform.example.org/knowledge.yaml" label: "Core Platform Services" relationship: foundation update_frequency: weekly - id: security-policies url: "https://security.example.org/knowledge.yaml" label: "Security & Compliance Policies" relationship: governs update_frequency: daily units: - id: about path: about.md intent: "Who maintains this project and what is it for?" scope: global audience: [human, agent] validated: "2026-02-24" update_frequency: monthly hints: token_estimate: 800 load_strategy: eager priority: critical - id: architecture-overview path: architecture/overview.md intent: "What is the high-level architecture and which components exist?" format: markdown scope: global audience: [developer, architect, agent] validated: "2026-01-15" update_frequency: monthly depends_on: [about] triggers: [architecture, components, system-design, overview] hints: token_estimate: 18000 load_strategy: lazy priority: supplementary density: dense summary_available: true summary_unit: architecture-tldr - id: architecture-tldr path: architecture/overview-tldr.md intent: "What is the high-level architecture in 400 words?" format: markdown scope: global audience: [developer, architect, agent] depends_on: [about] hints: token_estimate: 500 load_strategy: eager priority: critical summary_of: architecture-overview - id: api-spec path: api/openapi.yaml kind: schema intent: "What endpoints does the API expose and what do they accept?" format: openapi content_type: "application/vnd.oai.openapi+yaml;version=3.1" scope: module audience: [developer, agent] validated: "2026-02-25" update_frequency: weekly depends_on: [architecture-overview] triggers: [api, endpoints, openapi, rest] hints: token_estimate: 12000 load_strategy: lazy priority: reference density: dense - id: deployment-guide path: ops/deployment.md intent: "How do I deploy version 3.x to production?" scope: project audience: [operator, developer, agent] validated: "2026-02-20" update_frequency: rarely depends_on: [architecture-overview] supersedes: deployment-guide-v2 triggers: [deployment, production, release, kubernetes, docker] external_depends_on: - manifest: platform-core unit: infra-config on_failure: warn - manifest: security-policies unit: deployment-policy on_failure: degrade hints: token_estimate: 8000 load_strategy: lazy priority: supplementary - id: authentication-api path: api/authentication.md intent: "How do I authenticate API requests using OAuth 2.0?" scope: module audience: [developer, agent] validated: "2026-02-18" depends_on: [architecture-overview] triggers: [oauth2, authentication, bearer-token, jwt, api-security] hints: token_estimate: 4000 load_strategy: lazy - id: pre-commit-gate kind: policy path: .husky/pre-commit intent: "What checks run automatically before every commit?" format: text scope: project audience: [developer, agent] validated: "2026-02-10" access: restricted auth_scope: dev-team sensitivity: internal indexing: read-only hints: token_estimate: 200 load_strategy: lazy - id: methodology-no path: docs/methodology-no.md intent: "Hvilken utviklingsmetodologi brukes?" language: "no" scope: global audience: [human] validated: "2026-01-15" license: spdx: "CC-BY-4.0" url: "https://creativecommons.org/licenses/by/4.0/" attribution_required: true hints: token_estimate: 6000 load_strategy: lazy - id: deployment-guide-v2 path: archive/deployment-v2.md intent: "Legacy deployment procedure for version 2.x (superseded)." scope: project audience: [agent] validated: "2025-09-01" deprecated: true hints: load_strategy: never priority: reference relationships: - from: architecture-overview to: deployment-guide type: enables - from: architecture-overview to: authentication-api type: enables - from: about to: architecture-overview type: context - from: deployment-guide to: deployment-guide-v2 type: supersedes ``` --- ## Appendix C: llms.txt Integration To declare a KCP manifest that is not at the root, add a `knowledge:` line to the `llms.txt` header: ``` # My Project > A project that does useful things. > knowledge: /docs/knowledge.yaml ## Docs - /docs/overview.md: Project overview ``` --- ## Appendix D: Reference Implementations The following open-source projects implement KCP concepts and serve as reference implementations. ### Synthesis [github.com/exoreaction/synthesis](https://github.com/exoreaction/synthesis) A KCP-native knowledge infrastructure server. Synthesis indexes workspaces (code, documentation, configuration, PDFs) and exposes them via MCP with sub-second retrieval. **Bi-directional KCP integration:** - **Consumes** `knowledge.yaml`: during `synthesis scan`, any `knowledge.yaml` found at a repo root is parsed to understand unit boundaries, intent, and relationships — no separate import step needed. Repos without a manifest fall back to heuristic detection. - **Produces** `knowledge.yaml`: `synthesis export --format kcp` generates a manifest from any indexed workspace, making it immediately compatible with KCP-aware tooling. The `synthesis dispatch` command composes skill matching, file retrieval, and team-conflict detection into a single call that returns a ready-to-use agent configuration — a KCP-native pattern for supervisor/orchestrator agent architectures. ### kcp-commands [github.com/cantara/kcp-commands](https://github.com/cantara/kcp-commands) A Claude Code hook that applies KCP at the Bash tool boundary. Each manifest is a `knowledge.yaml`-compatible description of a CLI command. Three phases: - **Phase A** (PreToolUse): injects concise syntax context before execution — eliminates `--help` calls - **Phase B** (PostToolUse, via daemon `/filter/{key}`): strips noise patterns and truncates large outputs before they reach the model's context window - **Phase C** (v0.9.0, EventLogger): appends a JSON event to `~/.kcp/events.jsonl` on every Bash hook call — `{"ts":"...","session_id":"...","project_dir":"...","tool":"Bash","command":"...","manifest_key":"..."}` — consumed by kcp-memory for tool-level episodic memory Ships with 284 bundled manifests covering Git, Linux/macOS, Docker, Kubernetes, cloud CLIs, build tools, database clients, and more. Daemon on `localhost:7734` (virtual threads, ~12ms warm latency). Falls back to Node.js CLI if daemon not running. Measured impact: **67,352 tokens saved per session — 33.7% of a 200K context window recovered.** ### opencode-kcp-plugin [npmjs.com/package/opencode-kcp-plugin](https://www.npmjs.com/package/opencode-kcp-plugin) · [source](https://github.com/Cantara/knowledge-context-protocol/tree/main/plugins/opencode) A plugin for [OpenCode](https://github.com/anomalyco/opencode) that reduces explore-agent tool calls by 73–80% using a project's `knowledge.yaml` manifest. Uses two hooks: `experimental.chat.system.transform` to inject the full knowledge map into every session's system prompt, and `tool.execute.after` to annotate glob/grep results with KCP intent strings. Install: `npm install opencode-kcp-plugin`. Configure: `"plugin": ["opencode-kcp-plugin"]` in `opencode.json`. Zero overhead when no `knowledge.yaml` is present. ### kcp-memory [github.com/cantara/kcp-memory](https://github.com/Cantara/kcp-memory) The episodic memory layer for Claude Code. Indexes `~/.claude/projects/**/*.jsonl` session transcripts and `~/.kcp/events.jsonl` tool-call events (written by kcp-commands Phase C) into a local SQLite+FTS5 database, making past sessions and individual tool invocations searchable in milliseconds. Three-layer model this completes: | Layer | What it holds | Provided by | |-------|--------------|-------------| | **Working** | Current context window | Claude Code | | **Episodic** | What happened in past sessions | kcp-memory | | **Semantic** | What the workspace means | Synthesis | Available as a CLI, HTTP daemon (`localhost:7735`), and MCP server (6 tools): | MCP Tool | What it answers | |----------|----------------| | `kcp_memory_search` | FTS5 search over session transcripts | | `kcp_memory_events_search` | FTS5 search over tool-call events | | `kcp_memory_list` | Recent sessions, optionally by project | | `kcp_memory_stats` | Aggregate statistics | | `kcp_memory_session_detail` | Full content of a specific session | | `kcp_memory_project_context` | Auto-detect project from `PWD`, return last 5 sessions + 20 events | `kcp_memory_project_context` is designed for session-start invocation: it reads the current working directory from the process environment and surfaces recent history with no query required — closing the blank-slate problem structurally. Install: `curl -fsSL https://raw.githubusercontent.com/Cantara/kcp-memory/main/bin/install.sh | bash` --- *Knowledge Context Protocol — proposed by [eXOReaction AS](https://www.exoreaction.com), Oslo, Norway.* *Spec repository: [github.com/cantara/knowledge-context-protocol](https://github.com/cantara/knowledge-context-protocol)*

Related Documents

Autonomous SaaS Development Agent

Shadcn UI Rules

commit

AGENTS.md