DB Growth Guardrails & Maintenance Runbook

This runbook defines when to intervene on Cortex DB growth and how to do it safely.

hurttlocker

May 2, 2026

Why this exists

As memory and fact volume grows, command output and operator workflows can degrade before correctness fails. This guide defines intervention thresholds and a repeatable maintenance path.

Daily / Weekly Checks

Daily (quick)

cortex stats

Review:

- storage_bytes
- 24h growth in memories/facts
- alerts (if present)

Weekly (operator)

cortex stats --json
cortex stats --growth-report --json
cortex stats --growth-report --top-source-files 20
cortex stale 7
cortex optimize --check-only

Confirm whether the growth is expected (imports, captures) or is noise churn.

Growth Forensics Loop (report-first)

Use this before any maintenance pass so attribution is explicit and comparable:

# Before snapshot
cortex stats --growth-report --json > /tmp/growth-before.json

# Optional maintenance (only if recommendation indicates)
cortex optimize

# After snapshot
cortex stats --growth-report --json > /tmp/growth-after.json

Record in the ops note:

- recommendation (no-op or maintenance-pass)
- top source contributors (before vs after)
- fact-type mix shifts (before vs after)
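
The before/after comparison can be sketched as a small helper. This is a minimal sketch, not part of Cortex: `compare_snapshots` is a hypothetical name, and it assumes nothing about the report's schema. When `jq` happens to be installed it is used only to sort keys, so the diff reflects value changes, not key order.

```shell
# Compare two growth-report snapshots and print a unified diff.
# Returns 0 when the snapshots match, 1 when they differ.
# Hypothetical helper; not shipped with Cortex.
compare_snapshots() {
  local before="$1" after="$2"
  if command -v jq >/dev/null 2>&1; then
    # Normalize key order so only real value changes show up.
    local tmp_before tmp_after status
    tmp_before="$(mktemp)"
    tmp_after="$(mktemp)"
    jq -S . "$before" > "$tmp_before"
    jq -S . "$after" > "$tmp_after"
    diff -u "$tmp_before" "$tmp_after"
    status=$?
    rm -f "$tmp_before" "$tmp_after"
    return "$status"
  fi
  # Fallback: plain textual diff of the raw files.
  diff -u "$before" "$after"
}

# Example: compare_snapshots /tmp/growth-before.json /tmp/growth-after.json
```

An empty diff supports a no-op recommendation; a non-empty diff is the raw material for the ops note.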

Thresholds

Treat these as intervention triggers:

- DB size notice: storage_bytes > 1.0 GB
  - Action: schedule a weekly review and confirm the expected growth source.
- DB size warning: storage_bytes > 1.5 GB
  - Action: run the maintenance window (below) and verify post-maintenance deltas.
- Fact growth spike alert (24h)
  - Action: inspect recent imports/capture sources and conflict/stale outputs for noise.
- Memory growth spike alert (24h)
  - Action: validate capture hygiene and source dedupe behavior.
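
The two size thresholds can be expressed as a simple classifier. This is a sketch under stated assumptions: `classify_db_size` is a hypothetical helper, "GB" is taken to mean 10^9 bytes (adjust if the project means GiB), and the example uses the on-disk file size as a proxy, which may differ from the storage_bytes value that cortex stats reports.

```shell
# Classify a byte count against the runbook's DB size thresholds.
# Assumption: 1 GB = 1,000,000,000 bytes.
# Hypothetical helper; not shipped with Cortex.
classify_db_size() {
  local bytes="$1"
  local notice=1000000000    # 1.0 GB notice threshold
  local warning=1500000000   # 1.5 GB warning threshold
  if [ "$bytes" -gt "$warning" ]; then
    echo "warning"
  elif [ "$bytes" -gt "$notice" ]; then
    echo "notice"
  else
    echo "ok"
  fi
}

# Example, using file size as a proxy for storage_bytes:
#   classify_db_size "$(wc -c < ~/.cortex/cortex.db | tr -d ' ')"
```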

Maintenance Window (Safe)

Run during low-traffic windows.

Back up the DB file:

cp ~/.cortex/cortex.db ~/.cortex/cortex.db.backup.$(date +%Y%m%d%H%M%S)

Run built-in maintenance (full path):

cortex optimize

Optional targeted modes:

cortex optimize --check-only
cortex optimize --vacuum-only
cortex optimize --analyze-only

Verify post-state:

cortex stats --json

Compare size and growth metrics to the pre-maintenance snapshot.
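
The backup step can be hardened by verifying the copy before maintenance starts. The following is a minimal sketch: `backup_db` is a hypothetical helper, and it assumes nothing is writing to the DB during the copy, which is why the low-traffic window matters.

```shell
# Copy a DB file to a timestamped backup and verify it byte-for-byte.
# Prints the backup path on success; returns non-zero on any failure.
# Hypothetical helper; assumes no writer is active during the copy.
backup_db() {
  local src="$1"
  local dest="${src}.backup.$(date +%Y%m%d%H%M%S)"
  if [ ! -f "$src" ]; then
    echo "missing DB file: $src" >&2
    return 1
  fi
  cp "$src" "$dest" || return 1
  # cmp -s is silent and exits non-zero if the files differ.
  if ! cmp -s "$src" "$dest"; then
    echo "backup differs from source: $dest" >&2
    return 1
  fi
  echo "$dest"
}

# Example: backup_db ~/.cortex/cortex.db && cortex optimize
```

Chaining with `&&` keeps the maintenance command from running when the backup fails or does not match.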

Output Scaling Guidance

For large conflict sets:

- Default to compact output:
  ```
  cortex conflicts
  ```
- Use --verbose only when deep triage is required:
  ```
  cortex conflicts --verbose
  ```
- For machine workflows, prefer JSON + downstream filtering:
  ```
  cortex conflicts --json
  ```

SLO Checkpoints (Operator)

Track these checkpoints during growth reviews:

- cortex stats: completes under 3s on current production-scale DBs.
- cortex search "<common query>" --mode hybrid --limit 10: under 5s baseline on warmed local DB.
- cortex conflicts (default compact mode): returns summary output without terminal spam or hangs.

If checkpoints regress materially, file/track under #64 and attach command output + DB size context.

Automated SLO Snapshot Report

Use the helper script to capture checkpoint timing + status in one artifact:

scripts/slo_snapshot.sh \
  --warn-stats-ms 3000 --warn-search-ms 5000 --warn-conflicts-ms 5000 \
  --fail-stats-ms 7000 --fail-search-ms 10000 --fail-conflicts-ms 12000 \
  --output /tmp/slo.json --markdown /tmp/slo.md

Optional production-style run (hybrid search):

scripts/slo_snapshot.sh \
  --db ~/.cortex/cortex.db \
  --query "deployment policy" \
  --mode hybrid \
  --embed ollama/nomic-embed-text \
  --output /tmp/slo-hybrid.json \
  --markdown /tmp/slo-hybrid.md

The script emits PASS, WARN, or FAIL status in output artifacts and exits non-zero on command failures or fail-threshold breaches (unless --warn-only-thresholds is set).
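
To make the warn/fail semantics concrete, the status mapping can be sketched as below. This illustrates the idea only and is not the implementation in scripts/slo_snapshot.sh: `classify_latency` is a hypothetical name, and treating a value equal to a threshold as a breach is an assumption.

```shell
# Map a measured latency (ms) to PASS, WARN, or FAIL.
# Assumption: a value equal to a threshold counts as a breach.
# Hypothetical helper; not taken from scripts/slo_snapshot.sh.
classify_latency() {
  local ms="$1" warn_ms="$2" fail_ms="$3"
  if [ "$ms" -ge "$fail_ms" ]; then
    echo "FAIL"
  elif [ "$ms" -ge "$warn_ms" ]; then
    echo "WARN"
  else
    echo "PASS"
  fi
}

# Example with the stats thresholds from the command above:
#   classify_latency 3200 3000 7000   # prints WARN
```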

A CI canary is also available as a GitHub Actions workflow: .github/workflows/slo-canary.yml. It compares trends against the previous successful canary artifact (scripts/slo_trend_compare.py) and applies budget policy overlays (scripts/slo_budget_guard.py) before the artifact is published.

Related Tracking

- #64: DB growth guardrails follow-through
- #74: post-v0.3.4 reliability wave
- #82: SLO snapshot report tooling
