# Aegis Language — Detailed Phase Plans
**Created**: 2026-03-06
**Source**: 4-agent analysis (codebase, security, competitive, AI features)
**Covers**: Phase 6 through Phase 10
---
## Phase 6: Security Hardening
**Priority**: URGENT — without this, Aegis's core security claims are aspirational
**Effort**: Medium (~3-4 sessions)
**Impact**: Critical — fixes 7 critical + 4 high severity vulnerabilities
**Depends on**: Nothing (can start immediately)
### Goal
Shift Aegis from an analyzer-trusting model to a runtime-guaranteeing model. After this phase, transpiled Python code is genuinely sandboxed — dangerous operations are impossible, not just flagged.
### 6a: Restricted exec() Namespace (CRITICAL — fixes C1, C4, C7, H4)
**Problem**: `_seed_runtime_namespace()` in `aegis_cli.py:33-51` creates a `ConstNamespace` with guarded `open`/`print`/`input`, but does NOT restrict `__builtins__`. Python's default builtins are inherited, giving transpiled code access to `eval()`, `exec()`, `__import__()`, `globals()`, `vars()`, `compile()`, `breakpoint()`, `type()`, `getattr()` on internals, etc.
**Implementation**:
1. **Create `aegis/runtime/sandbox.py`** — new module for the restricted builtins dict:
```python
"""Aegis sandbox — restricted builtins for transpiled code execution."""
import builtins
# Allowlist of safe builtins
ALLOWED_BUILTINS = {
# Types & constructors
'bool', 'int', 'float', 'str', 'bytes', 'bytearray',
'list', 'tuple', 'dict', 'set', 'frozenset',
'complex', 'range', 'slice', 'object', 'type',
'memoryview', 'enumerate', 'zip', 'map', 'filter',
# Numeric & math
'abs', 'divmod', 'max', 'min', 'pow', 'round', 'sum',
# String & formatting
'ascii', 'bin', 'chr', 'format', 'hex', 'oct', 'ord', 'repr',
# Iteration & sequences
'all', 'any', 'iter', 'len', 'next', 'reversed', 'sorted',
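# Object introspection (NOTE: `getattr`/`setattr` here and `type` above are
# listed as risky in the Problem statement; confirm they are safe to expose)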
'callable', 'hash', 'id', 'isinstance', 'issubclass',
'hasattr', 'getattr', 'setattr', 'delattr', 'dir',
'property', 'staticmethod', 'classmethod', 'super',
# Exceptions (all standard exceptions)
'Exception', 'TypeError', 'ValueError', 'KeyError',
'IndexError', 'AttributeError', 'RuntimeError',
'StopIteration', 'StopAsyncIteration', 'GeneratorExit',
'ArithmeticError', 'LookupError', 'OverflowError',
'ZeroDivisionError', 'FloatingPointError',
'AssertionError', 'ImportError', 'ModuleNotFoundError',
'NameError', 'UnboundLocalError', 'OSError',
'FileNotFoundError', 'FileExistsError', 'PermissionError',
'IsADirectoryError', 'NotADirectoryError',
'InterruptedError', 'IOError', 'EOFError',
'UnicodeError', 'UnicodeDecodeError', 'UnicodeEncodeError',
'NotImplementedError', 'RecursionError',
'SystemError', 'Warning', 'UserWarning',
'DeprecationWarning', 'PendingDeprecationWarning',
'SyntaxWarning', 'RuntimeWarning', 'FutureWarning',
'ResourceWarning', 'UnicodeWarning', 'BytesWarning',
# Constants
'True', 'False', 'None', 'Ellipsis', 'NotImplemented',
'__name__', '__doc__',
'__build_class__',  # required for `class` statements in exec'd code
}
# Explicitly BLOCKED builtins (the security-critical ones)
BLOCKED_BUILTINS = {
'eval', 'exec', 'compile', # Code execution
'__import__', # Import system
'globals', 'locals', 'vars', # Namespace introspection
'breakpoint', # Debugger access
'exit', 'quit', # Process control
'open', # Replaced by guarded_open
'print', # Replaced by guarded_print
'input', # Replaced by guarded_input
}

def make_restricted_builtins() -> dict:
    """Build a builtins dict with dangerous operations removed."""
    restricted = {}
    for name in ALLOWED_BUILTINS:
        # hasattr (rather than an `is not None` check) so that a builtin
        # whose value is legitimately None is not silently dropped
        if hasattr(builtins, name):
            restricted[name] = getattr(builtins, name)
    return restricted
```
2. **Update `_seed_runtime_namespace()` in `aegis_cli.py`**:
```python
from aegis.runtime.sandbox import make_restricted_builtins, make_import_hook

def _seed_runtime_namespace(path: str = "<stdin>") -> ConstNamespace:
    search_dir = ...
    cache = ModuleCache(search_paths=[search_dir])
    restricted = make_restricted_builtins()
    restricted['print'] = guarded_print
    restricted['open'] = guarded_open
    restricted['input'] = guarded_input
    restricted['__import__'] = make_import_hook()  # See 6b
    ns = ConstNamespace({
        "__name__": "__aegis_main__",
        "__file__": path,
        "__builtins__": restricted,  # KEY CHANGE
        "print": guarded_print,
        "open": guarded_open,
        "input": guarded_input,
    })
    ns['aegis_import'] = _bound_aegis_import
    return ns
```
3. **Update `runtime/importer.py:36`** — the `aegis_import` function also calls `exec(code, ns)`. Ensure it passes the same restricted builtins.
4. **Update harness pattern** — demo apps in `apps/*/harness.py` all call `exec(code, namespace)`. Each harness needs to use restricted builtins too. Consider extracting a shared `aegis_exec(code, namespace)` helper.
**Files to modify**:
- Create: `aegis/runtime/sandbox.py`
- Modify: `aegis_cli.py` (lines 33-51)
- Modify: `aegis/runtime/importer.py` (line 36)
- Modify: All `apps/*/harness.py` files (7 files)
**Tests to add** (`tests/test_phase6_sandbox.py`):
- `__builtins__` in the exec() namespace doesn't contain `eval`, `exec`, `compile`; `__import__` is present only as the restricted hook from 6b
- exec() namespace doesn't contain `globals`, `locals`, `vars`
- Transpiled code that tries `eval("1+1")` raises `NameError`
- Transpiled code that tries `exec("x=1")` raises `NameError`
- Transpiled code that tries `globals()` raises `NameError`
- Transpiled code that tries `vars()` raises `NameError`
- Transpiled code that tries `compile("x", "", "exec")` raises `NameError`
- All existing 627 tests still pass (no regressions)
**Verification**: Run the full test suite + manually try `eval()` in REPL to confirm it's blocked.
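A minimal sketch of the checks above, assuming a trimmed, hypothetical allowlist (`SAFE_NAMES`) rather than the full `ALLOWED_BUILTINS`. The helper names `run_sandboxed` and `raised_by` are illustrative and not part of the Aegis codebase. It also exercises a `class` statement, which only works when `__build_class__` is present in the restricted dict.

```python
import builtins

# Hypothetical, trimmed allowlist used only for this illustration.
SAFE_NAMES = ("len", "range", "sum", "Exception", "NameError", "__build_class__")


def run_sandboxed(source: str) -> dict:
    """Execute source with only SAFE_NAMES visible as builtins."""
    restricted = {name: getattr(builtins, name) for name in SAFE_NAMES}
    namespace = {"__name__": "__aegis_main__", "__builtins__": restricted}
    exec(compile(source, "<sandbox>", "exec"), namespace)
    return namespace


def raised_by(source: str) -> str:
    """Return the exception class name raised by source, or "" if it ran cleanly."""
    try:
        run_sandboxed(source)
    except Exception as exc:
        return type(exc).__name__
    return ""
```

Restricting names this way only removes direct access; it says nothing about attribute-based escapes, which the red team harness would still need to probe.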
---
### 6b: Import Whitelist (CRITICAL — fixes C1, capability bypass via imports)
**Problem**: Transpiled code can `import os`, `import subprocess`, `import importlib`, etc. The analyzer doesn't check imports. Simply removing `__import__` from builtins (6a) is too blunt: the `import` statement looks up `__import__` in the executing frame's builtins, so with the name missing every import fails with `ImportError: __import__ not found`, including the `aegis.runtime` imports the transpiler emits.
**Key insight**: Because the `import` statement calls whatever `__import__` it finds in `__builtins__`, and 6a sets `__builtins__` to a dict we control, we can govern imports by providing a custom `__import__` function.
**Implementation**:
1. **Create import hook in `sandbox.py`**:
```python
# Safe modules that Aegis code is allowed to import
IMPORT_WHITELIST = {
# Standard library (safe subset)
'math', 'statistics', 'decimal', 'fractions',
'datetime', 'time', 'calendar',
'json', 'csv',
'collections', 'itertools', 'functools', 'operator',
'string', 're',
'copy', 'enum', 'dataclasses', 'typing',
'uuid', 'hashlib', 'hmac', 'secrets',
'base64', 'binascii',
'textwrap', 'difflib',
'abc', 'numbers',
'pprint', 'logging',
# Aegis runtime (transpiler emits these imports)
'aegis', 'aegis.runtime', 'aegis.runtime.taint',
'aegis.runtime.capabilities', 'aegis.runtime.contracts',
'aegis.runtime.const', 'aegis.runtime.agent',
'aegis.runtime.audit',
}
# Explicitly BLOCKED modules
IMPORT_BLOCKLIST = {
'os', 'sys', 'subprocess', 'shutil', 'pathlib', # System access
'importlib', 'pkgutil', 'zipimport', # Import manipulation
'ctypes', 'cffi', # Native code
'socket', 'http', 'urllib', 'ftplib', 'smtplib', # Network (use tool_call)
'pickle', 'shelve', 'marshal', # Serialization (unsafe)
'code', 'codeop', 'compileall', 'py_compile', # Code compilation
'ast', # AST manipulation
'inspect', 'dis', 'traceback', # Introspection
'signal', 'multiprocessing', 'threading', # Process/thread control
'gc', 'weakref', # GC manipulation
'builtins', '_thread', # Internals
}

def make_import_hook(original_import=builtins.__import__):
    """Create a restricted __import__ that only allows whitelisted modules."""
    # Uses builtins.__import__ (sandbox.py already does `import builtins`):
    # `__builtins__` is a module in __main__ but a dict in imported modules,
    # so `__builtins__.__import__` would raise AttributeError here.
    def _aegis_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Allow relative imports (level > 0) for Aegis module system
        if level > 0:
            return original_import(name, globals, locals, fromlist, level)
        # Check top-level module name
        top_level = name.split('.')[0]
        if top_level in IMPORT_BLOCKLIST:
            raise ImportError(
                f"Import of '{name}' is blocked in Aegis. "
                f"Use tool_call for external operations."
            )
        if name not in IMPORT_WHITELIST and top_level not in IMPORT_WHITELIST:
            raise ImportError(
                f"Import of '{name}' is not allowed in Aegis. "
                f"Allowed modules: {', '.join(sorted(IMPORT_WHITELIST))}"
            )
        return original_import(name, globals, locals, fromlist, level)
    return _aegis_import
```
2. **Wire into restricted builtins** (in `_seed_runtime_namespace`):
```python
restricted['__import__'] = make_import_hook()
```
3. **Add `--unrestricted` CLI flag** for development/debugging that uses full builtins (opt-in escape hatch, not default).
**Files to modify**:
- Modify: `aegis/runtime/sandbox.py` (add import hook)
- Modify: `aegis_cli.py` (wire hook into namespace, add --unrestricted flag)
- Modify: `aegis/runtime/importer.py` (use same hook for imported Aegis modules)
**Tests to add**:
- `import os` in transpiled code raises `ImportError`
- `import subprocess` raises `ImportError`
- `import importlib` raises `ImportError`
- `import ctypes` raises `ImportError`
- `import math` works (whitelisted)
- `import json` works (whitelisted)
- `import datetime` works (whitelisted)
- `from aegis.runtime.taint import ...` works (whitelisted)
- `import collections` works (whitelisted)
- Existing tests still pass (verify aegis runtime imports work)
**Edge cases to test**:
- `from os import path` — blocked (top-level `os` is blocked)
- `import os.path` — blocked
- `importlib.import_module('os')` — blocked (importlib itself blocked)
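The edge cases above can be exercised with a small sketch. It assumes a trimmed, hypothetical whitelist and blocklist (`ALLOWED`, `BLOCKED`) and illustrative helper names (`make_hook`, `try_import`); it is not the Aegis implementation. The `with_hook=False` path shows what happens when `__import__` is simply absent from the builtins dict.

```python
import builtins

ALLOWED = {"math", "json"}                      # hypothetical trimmed whitelist
BLOCKED = {"os", "subprocess", "importlib"}     # hypothetical trimmed blocklist


def make_hook(original_import=builtins.__import__):
    """Return an __import__ replacement that checks the top-level module name."""
    def hook(name, globals=None, locals=None, fromlist=(), level=0):
        top_level = name.split(".")[0]
        if level == 0 and (top_level in BLOCKED or top_level not in ALLOWED):
            raise ImportError(f"Import of '{name}' is not allowed in Aegis")
        return original_import(name, globals, locals, fromlist, level)
    return hook


def try_import(statement: str, with_hook: bool = True) -> str:
    """Run an import statement in a restricted namespace and report the outcome."""
    restricted = {"__import__": make_hook()} if with_hook else {}
    namespace = {"__name__": "__aegis_main__", "__builtins__": restricted}
    try:
        exec(statement, namespace)
    except ImportError as exc:
        return f"blocked: {exc}"
    return "ok"
```

Checking only the module name does not stop a whitelisted module from exposing another module as an attribute, so the whitelist itself still needs review.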
---
### 6c: Taint Hardening (CRITICAL — fixes C2, C3, H3)
**Problem**: Three taint bypass vectors exist:
1. `.raw` property is public — any code can unwrap tainted values
2. No `__bytes__`, `__iter__`, `__index__` — values leak through these operations
3. Taint lost at function boundaries — analyzer doesn't infer tainted returns
**Implementation**:
1. **Make `.raw` access controlled** in `taint.py`:
```python
class Tainted(Generic[T]):
    def __init__(self, value: T, _provenance=None):
        # Use name mangling to make value harder to access
        self.__value = value
        self._provenance = _provenance or TaintProvenance("direct")

    @property
    def raw(self) -> T:
        """Access raw value: ONLY for use by sanitize() and registered sanitizers.
        Raises TaintError if called from non-sanitizer context."""
        import inspect
        frame = inspect.currentframe()
        caller = frame.f_back if frame else None
        caller_name = caller.f_code.co_name if caller else ""
        # Guard against a missing caller frame before reading its globals
        caller_module = caller.f_globals.get('__name__', '') if caller else ""
        # Allow access from sanitize functions and aegis runtime
        if (caller_name in ('sanitize', '_sanitize_sql', '_sanitize_html',
                            '_sanitize_shell', '_sanitize_path') or
                caller_module.startswith('aegis.runtime') or
                caller_name.startswith('_sanitize_')):
            return self.__value
        # Allow access from registered custom sanitizers
        if hasattr(self, '_allowed_callers') and caller_name in self._allowed_callers:
            return self.__value
        raise TaintError(
            "Direct access to .raw is not allowed. Use sanitize() to unwrap tainted values."
        )
```
**Note**: The `inspect.currentframe()` approach has performance overhead. An alternative is to use a context manager or token-based access:
```python
_SANITIZER_TOKEN = object()  # module-private sentinel

class Tainted(Generic[T]):
    def unwrap(self, _token=None) -> T:
        if _token is not _SANITIZER_TOKEN:
            raise TaintError("Cannot unwrap tainted value without sanitizer token")
        return self.__value

def sanitize(value, context='sql'):
    raw = value.unwrap(_token=_SANITIZER_TOKEN)
    ...
```
The token approach is faster and avoids frame inspection. Neither approach is airtight on its own: the `inspect` check trusts the caller's function name, which transpiled code chooses (any function named `sanitize` or `_sanitize_*` passes), and the token lives in an importable runtime module.
**Recommended approach**: Token-based. The `_SANITIZER_TOKEN` is a module-private object in `taint.py`, but the `_` prefix is only a convention. Because `aegis.runtime.taint` is on the 6b whitelist, `from aegis.runtime.taint import _SANITIZER_TOKEN` succeeds unless the import hook also rejects `_`-prefixed names in `fromlist`, and attribute access on the imported module (`taint._SANITIZER_TOKEN`) would still reach it. Pair the token with an analyzer rule that rejects access to `_`-prefixed runtime names from transpiled code.
2. **Add leak-prevention dunder methods** in `taint.py`:
```python
class Tainted(Generic[T]):
    # ... existing members ...

    def __bytes__(self):
        raise TaintError("Cannot convert tainted value to bytes. Use sanitize() first.")

    def __iter__(self):
        raise TaintError("Cannot iterate over tainted value. Use sanitize() first.")

    def __next__(self):
        raise TaintError("Cannot iterate over tainted value. Use sanitize() first.")

    def __index__(self):
        raise TaintError("Cannot use tainted value as index. Use sanitize() first.")

    def __getitem__(self, key):
        # Return a new Tainted wrapping the sliced/indexed value
        return Tainted(self.__value[key], self._provenance.record({"action": "getitem", "key": repr(key)}))

    def __contains__(self, item):
        raise TaintError("Cannot check containment in tainted value. Use sanitize() first.")

    def __len__(self):
        # len() returns an int, not the tainted data (it does reveal the length)
        return len(self.__value)

    def __bool__(self):
        # bool() returns True/False, not the tainted data
        return bool(self.__value)

    def __hash__(self):
        raise TaintError("Cannot hash tainted value. Use sanitize() first.")

    # Prevent pickling
    def __reduce__(self):
        raise TaintError("Cannot pickle tainted value.")

    def __reduce_ex__(self, protocol):
        raise TaintError("Cannot pickle tainted value.")

    # The stdlib json module never calls __json__; json.dumps() already rejects
    # a Tainted instance with TypeError. This hook only helps third-party
    # encoders that honor the __json__ convention.
    def __json__(self):
        raise TaintError("Cannot serialize tainted value to JSON. Use sanitize() first.")
```
3. **Enhance analyzer taint inference across function boundaries** in `analyzer.py`:
- Track which functions have tainted parameters
- If a function receives a `tainted[T]` parameter and the analyzer can see that the return value flows from that parameter (without sanitization), mark the function's return as tainted
- Add warning: "Function 'X' receives tainted input but returns unsanitized value"
- This is a static analysis enhancement, not runtime
**Files to modify**:
- Modify: `aegis/runtime/taint.py` (lines 60-100+ — Tainted class)
- Modify: `aegis/analyzer.py` (taint flow pass — function return inference)
- Modify: Tests that directly access `.raw` need updating to use `sanitize()` instead
**Tests to add** (`tests/test_phase6_taint.py`):
- `bytes(tainted_value)` raises `TaintError`
- `for x in tainted_value` raises `TaintError`
- `tainted_value[0]` returns `Tainted` (not raw value)
- `pickle.dumps(tainted_value)` raises `TaintError`
- `hash(tainted_value)` raises `TaintError`
- `tainted_value.raw` raises `TaintError` (from non-sanitizer context)
- `sanitize(tainted_value, context='sql')` still works (`.raw` accessible from sanitizer)
- Custom sanitizers via `@sanitizer` still work
- `len(tainted_value)` returns int (allowed)
- `bool(tainted_value)` returns bool (allowed)
- Existing taint tests updated and passing
**Risk**: This is the highest-risk change — many existing tests may access `.raw` directly. Run full suite after each sub-change.
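A minimal sketch of the token-based design and the leak-prevention methods, under simplifying assumptions: provenance tracking is omitted, and `sanitize` handles only a toy `'sql'` context by doubling single quotes. The class and function bodies are illustrative, not the contents of `taint.py`.

```python
from typing import Generic, TypeVar

T = TypeVar("T")
_SANITIZER_TOKEN = object()  # module-private sentinel (privacy by convention only)


class TaintError(Exception):
    """Raised when tainted data is used without sanitization."""


class Tainted(Generic[T]):
    def __init__(self, value: T):
        self.__value = value  # name-mangled to _Tainted__value

    def unwrap(self, _token=None) -> T:
        if _token is not _SANITIZER_TOKEN:
            raise TaintError("Cannot unwrap tainted value without sanitizer token")
        return self.__value

    def __iter__(self):
        raise TaintError("Cannot iterate over tainted value. Use sanitize() first.")

    def __bytes__(self):
        raise TaintError("Cannot convert tainted value to bytes. Use sanitize() first.")

    def __hash__(self):
        raise TaintError("Cannot hash tainted value. Use sanitize() first.")

    def __getitem__(self, key):
        return Tainted(self.__value[key])  # indexing keeps the taint

    def __len__(self):
        return len(self.__value)


def sanitize(value: Tainted, context: str = "sql") -> str:
    """Toy sanitizer: only the 'sql' context is sketched here."""
    raw = value.unwrap(_token=_SANITIZER_TOKEN)
    if context != "sql":
        raise ValueError(f"Unknown context: {context}")
    return str(raw).replace("'", "''")
```

Name mangling is an obstacle rather than a boundary: the value is still reachable as `_Tainted__value`, so the analyzer would also need to reject that attribute name in transpiled code.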
---
### 6d: Freeze Runtime State (HIGH — fixes C5, H2)
**Problem**: Transpiled code can modify Aegis runtime globals:
- `_agent._tool_provider = FakeProvider()`
- `_audit._event_ledger = None`
- `emit_event()` is public and can inject false events
**Implementation**:
1. **Make runtime registries read-only after initialization** in `agent.py`:
```python
_FROZEN = False

def freeze_runtime():
    """Lock runtime configuration. Called after harness setup, before exec()."""
    global _FROZEN
    _FROZEN = True

def _check_frozen(operation: str):
    if _FROZEN:
        raise RuntimeError(f"Cannot {operation} after runtime is frozen")

def set_tool_provider(provider):
    _check_frozen("set tool provider")
    global _tool_provider
    _tool_provider = provider

# Same pattern for set_input_handler, set_memory_store,
# set_delegation_handler, set_approval_handler
```
2. **Freeze before exec() in CLI** (`aegis_cli.py`):
```python
from aegis.runtime.agent import freeze_runtime
from aegis.runtime.audit import freeze_audit

def cmd_run(...):
    ns = _seed_runtime_namespace(path)
    freeze_runtime()  # Lock agent config
    freeze_audit()    # Lock audit config
    exec(python_code, ns)
```
3. **Protect `emit_event()` in `audit.py`**:
```python
_AUDIT_FROZEN = False

def freeze_audit():
    global _AUDIT_FROZEN
    _AUDIT_FROZEN = True

def emit_event(event_type, *, _internal=False, **kwargs):
    """Emit an audit event. Only transpiler-injected calls set _internal=True."""
    if _AUDIT_FROZEN and not _internal:
        raise RuntimeError("Direct emit_event() calls are not allowed. Events are auto-injected.")
    ...
```
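Update transpiler to emit `_audit.emit_event(..., _internal=True)` instead of `_audit.emit_event(...)`. Because any caller can pass `_internal=True`, the analyzer must also reject an explicit `_internal=` argument in user-written Aegis code; otherwise the flag is not a barrier.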
4. **Harness pattern update**: All `apps/*/harness.py` files call `set_tool_provider()`, `set_delegation_handler()`, etc. BEFORE `exec()`. After those setup calls, call `freeze_runtime()`.
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (add freeze mechanism to all setters)
- Modify: `aegis/runtime/audit.py` (add freeze + _internal flag to emit_event)
- Modify: `aegis/transpiler.py` (emit `_internal=True` on auto-injected events)
- Modify: `aegis_cli.py` (call freeze before exec)
- Modify: All `apps/*/harness.py` (call freeze before exec)
**Tests to add**:
- After `freeze_runtime()`, `set_tool_provider()` raises `RuntimeError`
- After `freeze_audit()`, direct `emit_event()` raises `RuntimeError`
- Transpiler-injected events (with `_internal=True`) still work after freeze
- `reset_registry()` unfreezes (for test teardown)
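The freeze lifecycle in these tests can be sketched as follows. This is a standalone illustration with module-level state and only one setter; the real `agent.py` has more setters and its `reset_registry()` may do more than shown here.

```python
_FROZEN = False
_tool_provider = None


def freeze_runtime() -> None:
    """Lock runtime configuration; intended to run after harness setup."""
    global _FROZEN
    _FROZEN = True


def reset_registry() -> None:
    """Test-teardown helper: clear configuration and unfreeze."""
    global _FROZEN, _tool_provider
    _FROZEN = False
    _tool_provider = None


def set_tool_provider(provider) -> None:
    """Register a tool provider; refused once the runtime is frozen."""
    global _tool_provider
    if _FROZEN:
        raise RuntimeError("Cannot set tool provider after runtime is frozen")
    _tool_provider = provider


def get_tool_provider():
    return _tool_provider
```

A guard inside the setters does not cover direct attribute assignment on the module (the `_agent._tool_provider = ...` vector from the problem statement), so that path needs a separate control, for example an analyzer rule.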
---
### 6e: Enforce Delegation Capabilities (HIGH — fixes C6)
**Problem**: `CallableDelegationHandler` in `agent.py:295-322` receives `capabilities` in the `DelegationContext` but the handler ignores them — delegated code runs with full permissions.
**Implementation**:
1. **Wrap delegated callable in CapabilityContext** in `agent.py`:
```python
class CallableDelegationHandler:
    def handle(self, context: DelegationContext) -> Any:
        handler = self._handlers.get(context.target)
        if handler is None:
            raise RuntimeError(f"No handler registered for '{context.target}'")
        # Enforce capability restrictions on delegated code.
        # `is not None` so that an empty list means "no capabilities"
        # rather than falling through to the unrestricted branch.
        if context.capabilities is not None:
            from aegis.runtime.capabilities import CapabilityContext
            cap_ctx = CapabilityContext(allow=context.capabilities, deny=[])
            with cap_ctx:
                return handler(
                    task=context.task,
                    memory_share=context.memory_share,
                    memory_deny=context.memory_deny,
                    timeout=context.timeout,
                    max_cost=context.max_cost,
                )
        else:
            return handler(...)
```
2. **Enforce memory_deny** — if `context.memory_deny` is set, wrap memory access to block denied keys.
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (CallableDelegationHandler.handle)
**Tests to add**:
- Delegate with `capabilities: [network.https]` — filesystem operations inside handler raise `CapabilityError`
- Delegate with `memory_deny: [secrets]` — handler can't access "secrets" memory key
- Existing delegation tests still pass
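One way to enforce `memory_deny` is a thin wrapper around the memory store handed to the delegated handler. The sketch below assumes a plain `dict` as the backing store; `DenyFilteredMemory` and `MemoryAccessError` are hypothetical names, and the real memory API in `agent.py` may differ.

```python
class MemoryAccessError(Exception):
    """Raised when delegated code touches a denied memory key."""


class DenyFilteredMemory:
    """Wraps a key-value store and refuses access to denied keys."""

    def __init__(self, store: dict, deny: list):
        self._store = store
        self._deny = frozenset(deny)

    def _check(self, key: str) -> None:
        if key in self._deny:
            raise MemoryAccessError(f"Memory key '{key}' is denied for this delegation")

    def get(self, key: str, default=None):
        self._check(key)
        return self._store.get(key, default)

    def set(self, key: str, value) -> None:
        self._check(key)
        self._store[key] = value
```

The wrapper checks both reads and writes, so a delegated handler can neither read nor overwrite a denied key.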
---
### 6f: Replace XOR Encryption (HIGH — fixes H1)
**Problem**: Memory encryption in `agent.py:192-210` uses XOR cipher with deterministic key derivation. Same plaintext always produces same ciphertext. No IV, no authentication.
**Implementation**:
1. **Add `cryptography` dependency** (or use `secrets` + `hashlib` for a better homebrew if avoiding deps):
**Option A — Using `cryptography` library**:
```python
from cryptography.fernet import Fernet
import base64, hashlib

def _derive_fernet_key(scope: str, secret: str = "") -> bytes:
    """Derive a Fernet-compatible key from scope + optional secret."""
    raw_key = hashlib.sha256(f"aegis-{secret}-{scope}".encode()).digest()
    return base64.urlsafe_b64encode(raw_key)

def _encrypt(data: str, scope: str) -> str:
    f = Fernet(_derive_fernet_key(scope))
    return f.encrypt(data.encode()).decode()

def _decrypt(data: str, scope: str) -> str:
    f = Fernet(_derive_fernet_key(scope))
    return f.decrypt(data.encode()).decode()
```
**Option B — No external deps (HMAC-SHA256 keystream cipher built from `hashlib`, `hmac`, and `secrets`; this is not AES)**:
```python
import secrets, hashlib, hmac, base64

def _encrypt(data: str, scope: str) -> str:
    key = hashlib.sha256(f"aegis-memory-{scope}".encode()).digest()
    iv = secrets.token_bytes(16)
    # Use XOR with key stream from HMAC-SHA256 (stream cipher)
    plaintext = data.encode()
    keystream = b''
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hmac.new(key, iv + counter.to_bytes(4, 'big'), 'sha256').digest()
        counter += 1
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
    mac = hmac.new(key, iv + ciphertext, 'sha256').digest()[:16]
    return base64.b64encode(iv + mac + ciphertext).decode()

def _decrypt(data: str, scope: str) -> str:
    key = hashlib.sha256(f"aegis-memory-{scope}".encode()).digest()
    raw = base64.b64decode(data.encode())
    iv, mac, ciphertext = raw[:16], raw[16:32], raw[32:]
    expected_mac = hmac.new(key, iv + ciphertext, 'sha256').digest()[:16]
    if not hmac.compare_digest(mac, expected_mac):
        raise ValueError("Memory integrity check failed — data may be tampered")
    # Decrypt
    keystream = b''
    counter = 0
    while len(keystream) < len(ciphertext):
        keystream += hmac.new(key, iv + counter.to_bytes(4, 'big'), 'sha256').digest()
        counter += 1
    plaintext = bytes(c ^ k for c, k in zip(ciphertext, keystream))
    return plaintext.decode()
```
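**Recommendation**: Option B (no external deps) keeps Aegis dependency-free while adding a random IV and an integrity check. It is still a homebrew construction (HMAC-SHA256 keystream, one key shared by encryption and MAC, tag truncated to 16 bytes), so Option A, whose Fernet tokens use vetted AES-128-CBC with HMAC-SHA256, is stronger but adds a dependency. In both options as written the key is derived from the scope name alone, which is guessable; a real secret should feed the derivation.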
2. **Replace `_xor_crypt` and `_derive_key`** in `agent.py` with the chosen implementation.
3. **Update `MemoryScope.set()` and `MemoryScope.get()`** to use new encrypt/decrypt.
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (lines 185-210 — encryption functions)
**Tests to add**:
- Same plaintext encrypted twice produces DIFFERENT ciphertext (IV)
- Decrypt(Encrypt(data)) == data
- Tampered ciphertext raises error (integrity check)
- Existing memory encryption tests updated
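Both options derive their key from the scope name, so a sketch of secret-based derivation may help. It assumes the harness supplies a `master_secret` (where that secret comes from is outside this plan) and derives separate encryption and MAC keys per scope with HMAC-SHA256. `derive_scope_keys` and the label strings are hypothetical.

```python
import hashlib
import hmac


def derive_scope_keys(master_secret: bytes, scope: str) -> tuple:
    """Derive independent (enc_key, mac_key) for a memory scope.

    Distinct labels keep the two keys unrelated even though both
    come from the same master secret.
    """
    if not master_secret:
        raise ValueError("master_secret must not be empty")
    scope_bytes = scope.encode()
    enc_key = hmac.new(master_secret, b"aegis-memory-enc:" + scope_bytes, hashlib.sha256).digest()
    mac_key = hmac.new(master_secret, b"aegis-memory-mac:" + scope_bytes, hashlib.sha256).digest()
    return enc_key, mac_key
```

With this shape, knowing a scope name is no longer enough to reconstruct the key, and a compromised MAC key does not reveal the encryption key.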
---
### 6g: Block globals()/vars() in Analyzer (MEDIUM — fixes H4)
**Problem**: `globals()` and `vars()` allow runtime introspection that can access blocked builtins and runtime state.
**Implementation**:
1. **Add to banned operations** in `capabilities.py`:
```python
BANNED_OPERATIONS = frozenset({'eval', 'exec', '__import__', 'globals', 'vars', 'locals', 'compile', 'breakpoint'})
```
2. **Update analyzer** to flag these in `analyzer.py` (add to banned op detection pass).
3. **Already handled at runtime by 6a** — `globals` and `vars` removed from restricted builtins dict. This step adds the static analysis warning.
**Files to modify**:
- Modify: `aegis/runtime/capabilities.py` (line 21)
- Modify: `aegis/analyzer.py` (banned op detection)
**Tests to add**:
- Analyzer warns on `globals()` usage
- Analyzer warns on `vars()` usage
- `compile()` and `breakpoint()` also warned
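The detection pass might look like the sketch below. As a stand-in assumption it walks Python's own `ast` over Python source, whereas the Aegis analyzer works on Aegis AST nodes; `find_banned_calls` is a hypothetical name.

```python
import ast

BANNED_OPERATIONS = frozenset({
    'eval', 'exec', '__import__', 'globals', 'vars',
    'locals', 'compile', 'breakpoint',
})


def find_banned_calls(source: str) -> list:
    """Return (name, line) for every direct call to a banned builtin."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls by bare name are matched here.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BANNED_OPERATIONS:
                findings.append((node.func.id, node.lineno))
    return sorted(findings, key=lambda item: item[1])
```

Matching bare-name calls misses aliases such as `g = globals; g()`, which is why the runtime removal in 6a remains the actual enforcement and this pass is a warning layer.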
---
### Phase 6 Completion Criteria
- [ ] All 7 critical vulnerabilities (C1-C7) are fixed
- [ ] All 4 high severity issues (H1-H4) are fixed
- [ ] All existing 627 non-LLM tests pass (no regressions)
- [ ] New test file `tests/test_phase6_sandbox.py` with 40+ tests
- [ ] New test file `tests/test_phase6_taint.py` with 20+ tests
- [ ] Red team harness updated with new bypass attempt tests
- [ ] Demo apps updated with freeze_runtime() calls
- [ ] CLAUDE.md updated with new security architecture
---
## Phase 7: MCP Integration
**Priority**: HIGH — ecosystem interoperability
**Effort**: Medium (~2-3 sessions)
**Impact**: Very High — instant access to thousands of tools
**Depends on**: Phase 6 (security hardening should be done first)
### Goal
Make Aegis agents interoperable with the MCP (Model Context Protocol) ecosystem while retaining all Aegis safety guarantees (taint tracking on MCP outputs, capability scoping on MCP tools, audit trail for MCP invocations).
### 7a: MCP Client Runtime
**New file**: `aegis/runtime/mcp.py`
**Implementation**:
1. **MCP client library** — communicate with MCP servers via stdio or HTTP:
```python
"""Aegis MCP integration — secure Model Context Protocol client."""
from typing import Any, Optional

# MCPToolSchema is defined first because the classes below reference it
# in annotations that are evaluated at definition time.
class MCPToolSchema:
    """Describes an MCP tool's input/output schema."""
    name: str
    description: str
    input_schema: dict  # JSON Schema
    # Map to Aegis capability categories
    inferred_capabilities: list[str]

class MCPServer:
    """Represents a connected MCP server."""
    def __init__(self, name: str, command: Optional[list[str]] = None, url: Optional[str] = None):
        self.name = name
        self.tools: dict[str, MCPToolSchema] = {}
        # Connect and discover tools...

    def discover_tools(self) -> list[MCPToolSchema]:
        """List available tools from this MCP server."""
        ...

    def invoke(self, tool_name: str, arguments: dict) -> Any:
        """Call an MCP tool and return the result."""
        ...

class MCPRegistry:
    """Global registry of MCP servers and tools."""
    _servers: dict[str, MCPServer] = {}

    def register_server(self, server: MCPServer):
        ...

    def get_tool(self, tool_name: str) -> tuple[MCPServer, MCPToolSchema]:
        ...
```
2. **Taint wrapping** — all MCP tool outputs are automatically wrapped in `Tainted`:
```python
def invoke_mcp_tool(tool_name, arguments, *, taint_output=True):
    server, schema = _mcp_registry.get_tool(tool_name)
    result = server.invoke(tool_name, arguments)
    if taint_output:
        return Tainted(result, TaintProvenance(f"mcp:{server.name}/{tool_name}"))
    return result
```
3. **Capability mapping** — MCP tool categories auto-map to Aegis capabilities:
```python
MCP_CAPABILITY_MAP = {
'filesystem': 'filesystem',
'browser': 'network.https',
'database': 'filesystem',
'network': 'network',
'shell': 'process.spawn',
}
```
### 7b: Language Syntax — `mcp_tool` or Extended `tool_call`
**Option A — New `mcp_tool` keyword**:
```aegis
mcp_tool search(query: str) -> list:
    server: "brave-search"
    tool: "brave_web_search"
    timeout: 30s
    taint_output: True
```
**Option B — Extend `tool_call` with `provider: mcp`**:
```aegis
tool_call search(query: str) -> list:
    provider: "mcp:brave-search/brave_web_search"
    timeout: 30s
    taint_output: True
```
**Recommendation**: Option B — reuse existing `tool_call` syntax. The `provider` field already exists; prefix with `mcp:` to indicate MCP source. Less new syntax to learn.
**Implementation across pipeline**:
1. **Lexer**: No changes (reuses existing tool_call tokens)
2. **Parser**: No changes (reuses existing tool_call parsing)
3. **Analyzer**: Detect `provider: "mcp:..."` pattern, infer capabilities from MCP tool schema, validate MCP server is registered
4. **Transpiler**: When provider starts with `mcp:`, generate `invoke_mcp_tool()` call instead of `invoke_tool()`
5. **Runtime**: `invoke_mcp_tool()` wraps result in Tainted, checks capabilities, emits audit event
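Steps 3 and 4 both need to split the `provider` string into a server and a tool name. A minimal sketch, assuming the `mcp:<server>/<tool>` shape shown in Option B; `MCPProvider` and `parse_mcp_provider` are hypothetical helper names.

```python
from typing import NamedTuple, Optional


class MCPProvider(NamedTuple):
    server: str
    tool: str


def parse_mcp_provider(provider: str) -> Optional[MCPProvider]:
    """Split 'mcp:<server>/<tool>'; return None for non-MCP providers."""
    prefix = "mcp:"
    if not provider.startswith(prefix):
        return None  # ordinary tool_call provider, handled by invoke_tool()
    server, sep, tool = provider[len(prefix):].partition("/")
    if not sep or not server or not tool:
        raise ValueError(
            f"Malformed MCP provider '{provider}'; expected 'mcp:<server>/<tool>'"
        )
    return MCPProvider(server, tool)
```

Returning `None` for non-MCP providers lets the transpiler fall back to the existing `invoke_tool()` path, while a malformed `mcp:` string fails loudly at analysis time.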
### 7c: MCP Server Configuration
**Config file**: `aegis.toml` or `aegis.yaml` in project root:
```toml
[mcp.servers]
[mcp.servers.brave-search]
command = ["npx", "@modelcontextprotocol/server-brave-search"]
env = { BRAVE_API_KEY = "$BRAVE_API_KEY" }
[mcp.servers.filesystem]
command = ["npx", "@modelcontextprotocol/server-filesystem"]
args = ["/allowed/path"]
capabilities = ["filesystem.read"] # Aegis capability mapping
```
### 7d: Audit Integration
- MCP tool invocations auto-logged with event_type `mcp_tool_call`
- Audit events include: server name, tool name, input arguments (with redaction), output (tainted marker), duration, capability check result
- Hash-chained into existing ledger
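A sketch of how an `mcp_tool_call` event could be redacted and hash-chained. The ledger here is a plain list and the redaction key set is an assumption; the existing ledger in `audit.py` may use a different record layout and hashing scheme.

```python
import hashlib
import json

REDACTED_KEYS = {"api_key", "password", "token"}  # hypothetical redaction list
GENESIS_HASH = "0" * 64


def _digest(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with this payload."""
    body = json.dumps(payload, sort_keys=True, default=str)
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def append_mcp_event(ledger: list, server: str, tool: str,
                     arguments: dict, duration_ms: float) -> dict:
    """Append a redacted mcp_tool_call event linked to the previous entry."""
    payload = {
        "event_type": "mcp_tool_call",
        "server": server,
        "tool": tool,
        "arguments": {k: ("<redacted>" if k in REDACTED_KEYS else v)
                      for k, v in arguments.items()},
        "output": "<tainted>",  # marker only; the output itself is not logged
        "duration_ms": duration_ms,
    }
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    entry = {"payload": payload, "prev_hash": prev_hash,
             "hash": _digest(prev_hash, payload)}
    ledger.append(entry)
    return entry


def verify_chain(ledger: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```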
**Files to create**:
- `aegis/runtime/mcp.py` (MCP client, registry, invocation)
**Files to modify**:
- `aegis/analyzer.py` (MCP provider detection, capability inference)
- `aegis/transpiler.py` (MCP provider codegen)
- `aegis/runtime/audit.py` (MCP event types)
- `aegis_cli.py` (load MCP config on startup)
**Tests to add** (`tests/test_phase7_mcp.py`):
- MCP tool output is automatically Tainted
- MCP tool call requires matching capability
- MCP tool call emitted in audit trail
- MCP tool with timeout works
- MCP tool with retry works
- Invalid MCP server raises clear error
- ~25 tests
### Phase 7 Completion Criteria
- [ ] MCP client runtime with server discovery and tool invocation
- [ ] `tool_call` with `provider: "mcp:server/tool"` compiles and runs
- [ ] MCP outputs automatically tainted
- [ ] MCP tools checked against capability system
- [ ] MCP invocations logged in audit trail
- [ ] Config file loading for MCP servers
- [ ] 25+ new tests
- [ ] Example: `examples/mcp_demo.aegis` with MCP tool usage
---
## Phase 8: Advanced AI Constructs
**Priority**: MEDIUM — differentiation features
**Effort**: Medium-High (~3-4 sessions)
**Impact**: High — unique capabilities no other framework has
**Depends on**: Phase 6 (some constructs need hardened runtime)
### 8a: `reason` Block — Structured Reasoning with Audit
**What**: A new language construct that captures structured reasoning into the hash-chained audit trail. This is uniquely valuable — no other framework makes AI reasoning tamper-evident.
**Syntax**:
```aegis
reason "Determine threat severity":
    let indicators = extract_indicators(alert)
    let count = len(indicators)
    if count > 5:
        let conclusion = "critical"
    elif count > 2:
        let conclusion = "high"
    else:
        let conclusion = "low"
    # Block result is the last expression or explicit 'conclude' statement
    conclude conclusion
```
**Implementation across pipeline**:
1. **Lexer** — new tokens: `REASON`, `CONCLUDE`
2. **Parser** — `ReasonBlock` AST node:
```python
@dataclass
class ReasonBlock(Statement):
    description: str        # "Determine threat severity"
    body: list[Statement]   # Block body
    conclusion: Expression  # The conclude expression
    line: int
    column: int
```
3. **Analyzer** — validate `conclude` appears in reason block, check taint flow through reasoning
4. **Transpiler** — generates:
```python
# reason "Determine threat severity"
_reason_scope = _audit.EventScope("reason", {"description": "Determine threat severity"})
with _reason_scope:
    indicators = extract_indicators(alert)
    count = len(indicators)
    if count > 5:
        conclusion = "critical"
    elif count > 2:
        conclusion = "high"
    else:
        conclusion = "low"
    _reason_result = conclusion
    _audit.emit_event("reasoning_complete", _internal=True,
                      description="Determine threat severity",
                      inputs={"alert": _audit.snapshot_value(alert)},
                      outputs={"conclusion": _audit.snapshot_value(_reason_result)},
                      status="ok")
```
5. **Runtime** — reasoning events stored in audit trail with full input/output snapshots, parent-child scoping, hash chaining
**Files to create/modify**:
- Modify: `aegis/tokens.py` (REASON, CONCLUDE tokens)
- Modify: `aegis/ast_nodes.py` (ReasonBlock node)
- Modify: `aegis/lexer.py` (tokenize reason, conclude)
- Modify: `aegis/parser.py` (parse reason block)
- Modify: `aegis/analyzer.py` (validate reason blocks)
- Modify: `aegis/transpiler.py` (transpile reason blocks)
**Tests**: ~20 (lexer + parser + analyzer + transpiler + runtime integration)
---
### 8b: Semantic Memory — Vector-Backed Search
**What**: Extend `memory_access` with a `type: semantic` option for vector-backed similarity search, while keeping Aegis's encryption and access controls.
**Syntax**:
```aegis
memory_access KnowledgeBase:
    scope: "domain_knowledge"
    type: semantic
    read: [policies, procedures, guidelines]
    write: [policies]
    encrypt: True
    embedding_model: "default"

# Usage in code:
KnowledgeBase.store("refund_policy", "Full refund within 30 days...")
let results = KnowledgeBase.search("refund for damaged item", top_k=5)
```
**Implementation**:
1. **New memory type enum** in `agent.py`:
```python
from enum import Enum

class MemoryType(Enum):
    KV = "kv"              # Current key-value (default)
    SEMANTIC = "semantic"  # Vector-backed similarity search
    EPISODIC = "episodic"  # Time-ordered event memory
```
2. **SemanticMemoryStore protocol** in `agent.py`:
```python
from typing import Optional, Protocol, runtime_checkable

@runtime_checkable
class SemanticMemoryStore(Protocol):
    def store(self, key: str, text: str, metadata: Optional[dict] = None) -> None: ...
    def search(self, query: str, top_k: int = 5) -> list[dict]: ...
    def delete(self, key: str) -> None: ...
```
3. **InMemorySemanticStore** — reference implementation using simple TF-IDF or cosine similarity (no external deps). Production users plug in real vector DBs.
4. **Parser**: Recognize `type: semantic` in memory_access declarative block
5. **Transpiler**: Generate `SemanticMemoryScope` class with `store()` and `search()` methods
6. **Analyzer**: Validate semantic-specific properties
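A minimal sketch of what the dependency-free reference store from step 3 could look like. It assumes raw term counts with cosine similarity (no TF-IDF weighting), and the helper names `_tokenize` and `_cosine` as well as the shape of the result dicts are illustrative assumptions, not part of the existing codebase.

```python
import math
import re
from collections import Counter
from typing import Optional


def _tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemorySemanticStore:
    """Hypothetical reference store: term counts only, no external deps."""

    def __init__(self) -> None:
        # key -> (original text, term-count vector, metadata)
        self._docs: dict[str, tuple[str, Counter, dict]] = {}

    def store(self, key: str, text: str, metadata: Optional[dict] = None) -> None:
        self._docs[key] = (text, Counter(_tokenize(text)), metadata or {})

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        q = Counter(_tokenize(query))
        scored = [
            {"key": key, "text": text, "score": _cosine(q, vec), "metadata": meta}
            for key, (text, vec, meta) in self._docs.items()
        ]
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:top_k]

    def delete(self, key: str) -> None:
        self._docs.pop(key, None)
```

Because the class exposes `store`, `search`, and `delete` with the protocol's signatures, a production vector database adapter could be swapped in without touching the generated code.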
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (SemanticMemoryStore, MemoryType)
- Modify: `aegis/parser.py` (parse `type:` in memory_access)
- Modify: `aegis/transpiler.py` (generate semantic memory class)
- Modify: `aegis/analyzer.py` (validate semantic memory properties)
**Tests**: ~15
---
### 8c: Risk-Based Approval for Plans
**What**: Replace all-or-nothing `require_approval: True` with risk-based policies.
**Syntax**:
```aegis
plan deploy(version: str):
    intent: "Deploy to production"
    approval_policy: risk_based

    step run_tests:
        @risk_level(low)
        run_test_suite(version)
        verify test_results_pass()

    step deploy_staging:
        @risk_level(medium)
        deploy_to(version, "staging")
        verify health_check("staging")

    step deploy_production:
        @risk_level(critical)
        deploy_to(version, "production")
        verify health_check("production")
        rollback:
            rollback_deployment("production")
```
**Implementation**:
1. **New tokens**: `APPROVAL_POLICY`, `RISK_LEVEL` (or reuse decorator syntax)
2. **Parser**: Parse `approval_policy:` in plan declarative block. Parse `@risk_level(...)` as step decorator.
3. **Runtime**: `RiskBasedApprovalHandler`:
```python
class RiskBasedApprovalHandler:
    def __init__(self, auto_approve_below: str = "medium"):
        self.threshold = {"low": 0, "medium": 1, "high": 2, "critical": 3}
        self.auto_approve_threshold = self.threshold[auto_approve_below]

    def approve(self, step_name: str, risk_level: str) -> bool:
        # Unknown risk levels fall back to 3 ("critical").
        # The comparison is inclusive: the threshold level itself is auto-approved.
        if self.threshold.get(risk_level, 3) <= self.auto_approve_threshold:
            return True  # Auto-approve
        return self._request_human_approval(step_name, risk_level)
```
4. **Transpiler**: Generate risk-level check before each step execution
**Tests**: ~15
---
### 8d: `reflect` Construct — Self-Evaluation Loops
**What**: Language-level support for agents evaluating and improving their own outputs.
**Syntax**:
```aegis
reflect on draft_response:
    max_iterations: 3
    criteria: quality_score > 0.8
    improve:
        let feedback = evaluate(draft_response)
        let draft_response = revise(draft_response, feedback)
```
**Implementation**:
1. **Lexer**: `REFLECT` token
2. **Parser**: `ReflectBlock` AST node with `target`, `max_iterations`, `criteria`, `improve` body
3. **Transpiler**: Generate a while loop with iteration counter, criteria check, and improve body. Emit audit events for each iteration.
4. **Analyzer**: Validate criteria expression, warn on unbounded reflection (no max_iterations)
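The loop described in the transpiler step could take roughly the following shape at runtime. This is a sketch under assumptions: `run_reflect`, the `emit` callback, and the event dictionaries are hypothetical names, and the real transpiler may inline the loop rather than call a helper.

```python
from typing import Any, Callable


def run_reflect(
    target: Any,
    criteria: Callable[[Any], bool],
    improve: Callable[[Any], Any],
    max_iterations: int,
    emit: Callable[[dict], None] = lambda event: None,
) -> Any:
    """Revise `target` until `criteria` holds or the iteration cap is reached."""
    iteration = 0
    while iteration < max_iterations and not criteria(target):
        target = improve(target)
        iteration += 1
        # One audit event per iteration (hypothetical event shape).
        emit({"event_type": "reflect_iteration", "iteration": iteration})
    emit({
        "event_type": "reflect_done",
        "iterations": iteration,
        "criteria_met": bool(criteria(target)),
    })
    return target
```

The criteria check runs before the first call to `improve`, so a target that already passes is returned unchanged, and `max_iterations` bounds the loop even when the criteria can never be met.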
**Tests**: ~15
---
### 8e: `budget` Construct — Language-Level Cost Accounting
**What**: Track and enforce cumulative cost across tool calls, LLM invocations, and delegations.
**Syntax**:
```aegis
budget session_costs:
    max_total: 5.00
    alert_at: 4.00
    on_exceeded: graceful_shutdown()

# In tool_call:
tool_call analyze(data: str) -> dict:
    provider: "openai"
    cost: 0.01  # per invocation
    budget: session_costs
```
**Implementation**:
1. **Runtime**: `BudgetTracker` class that accumulates costs, checks limits, triggers alerts
2. **Transpiler**: Track cost after each tool_call, check budget before invocation
3. **Audit integration**: Budget events (allocation, spend, alert, exceeded) in audit trail
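One way the `BudgetTracker` from step 1 might be structured is sketched below. The method names `check` and `record`, the `BudgetExceeded` exception, and the in-object `events` list (standing in for the audit trail) are assumptions made for illustration.

```python
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past max_total."""


class BudgetTracker:
    """Hypothetical runtime helper backing a `budget` declaration."""

    def __init__(
        self,
        name: str,
        max_total: float,
        alert_at: Optional[float] = None,
        on_exceeded: Optional[Callable[[], None]] = None,
    ) -> None:
        self.name = name
        self.max_total = max_total
        self.alert_at = alert_at
        self.on_exceeded = on_exceeded
        self.spent = 0.0
        self.alerted = False
        self.events: list[dict] = []  # stand-in for the audit trail

    def check(self, cost: float) -> None:
        """Call before an invocation: refuse if the cost would not fit."""
        if self.spent + cost > self.max_total:
            self.events.append({"type": "budget_exceeded", "budget": self.name})
            if self.on_exceeded is not None:
                self.on_exceeded()
            raise BudgetExceeded(
                f"{self.name}: {self.spent + cost:.2f} > {self.max_total:.2f}"
            )

    def record(self, cost: float) -> None:
        """Call after an invocation: accumulate spend and alert once."""
        self.spent += cost
        self.events.append({"type": "budget_spend", "budget": self.name, "cost": cost})
        if self.alert_at is not None and not self.alerted and self.spent >= self.alert_at:
            self.alerted = True
            self.events.append({"type": "budget_alert", "budget": self.name})
```

Splitting `check` from `record` mirrors the transpiler step: the budget is checked before the tool call and the cost is recorded after it. Floats accumulate rounding error, so a real implementation would probably track costs as `decimal.Decimal` or integer cents.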
**Tests**: ~10
---
### Phase 8 Completion Criteria
- [ ] `reason` block compiles, executes, and appears in audit trail
- [ ] Semantic memory with `search()` works end-to-end
- [ ] Risk-based plan approval auto-approves low-risk, prompts for high-risk
- [ ] `reflect` construct iterates and stops at criteria or max_iterations
- [ ] `budget` tracks costs and enforces limits
- [ ] 75+ new tests across all constructs
- [ ] Examples for each new construct
- [ ] SPEC.md and CLAUDE.md updated
---
## Phase 9: Production Readiness
**Priority**: MEDIUM — necessary for real-world adoption
**Effort**: High (~4-5 sessions)
**Impact**: High — moves Aegis from demo to production quality
**Depends on**: Phase 6 (hardened security), Phase 7 (MCP for tool ecosystem)
### 9a: Persistent Audit Ledger
**What**: Pluggable audit storage backends beyond `InMemoryLedger`.
**Backends to implement**:
1. **SQLiteLedger** — append-only SQLite database:
```python
import sqlite3

class SQLiteLedger(EventLedger):
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self._create_tables()

    def append(self, event: EventRecord) -> None:
        self.conn.execute(
            "INSERT INTO events (event_id, timestamp, event_type, ..., self_hash) VALUES (?, ?, ...)",
            (event.event_id, event.timestamp, event.event_type, ..., event.self_hash)
        )
        self.conn.commit()
```
2. **FileLedger** — append-only JSONL file (one event per line):
```python
import json

class FileLedger(EventLedger):
    def __init__(self, path: str):
        self.path = path

    def append(self, event: EventRecord) -> None:
        # Append mode keeps earlier events intact; one JSON object per line.
        with open(self.path, 'a', encoding='utf-8') as f:
            f.write(json.dumps(event.to_dict()) + '\n')
```
3. **CLI integration**: `aegis run --audit-store sqlite:audit.db examples/plan_demo.aegis`
**Files to create**:
- `aegis/runtime/ledgers.py` (SQLiteLedger, FileLedger)
**Files to modify**:
- `aegis_cli.py` (--audit-store flag)
- `aegis/runtime/audit.py` (factory function for ledger backends)
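The factory function could be built around a small parser for the `--audit-store` value. The sketch below assumes the `<backend>:<path>` form seen in `sqlite:audit.db`; the function names `parse_audit_store` and `make_ledger` and the registry-of-constructors design are hypothetical.

```python
from typing import Callable


def parse_audit_store(spec: str) -> tuple[str, str]:
    """Split '<backend>:<path>' into (backend, path)."""
    # Partition on the first colon only, so paths that contain colons survive.
    backend, sep, path = spec.partition(":")
    if not sep or not backend or not path:
        raise ValueError(f"expected '<backend>:<path>', got {spec!r}")
    return backend.lower(), path


def make_ledger(spec: str, backends: dict[str, Callable[[str], object]]) -> object:
    """Build a ledger from a CLI spec using a registry of constructors."""
    backend, path = parse_audit_store(spec)
    try:
        factory = backends[backend]
    except KeyError:
        known = ", ".join(sorted(backends))
        raise ValueError(f"unknown audit store {backend!r} (known: {known})") from None
    return factory(path)
```

With the ledger classes above, the registry might be `{"sqlite": SQLiteLedger, "file": FileLedger}`, where the `file` scheme name is an assumption.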
**Tests**: ~15
---
### 9b: OpenTelemetry Export
**What**: Export audit trail as OpenTelemetry-compatible traces for integration with Jaeger, Datadog, Grafana, etc.
**Implementation**:
1. **OTLP converter** in `aegis/runtime/otel.py`:
```python
def audit_to_otlp(events: list[EventRecord]) -> dict:
    """Convert Aegis audit events to OTLP trace format."""
    spans = []
    for event in events:
        span = {
            "traceId": ...,
            "spanId": event.event_id,
            "parentSpanId": event.parent_event,
            "name": f"{event.event_type}:{event.function}",
            "startTimeUnixNano": ...,
            "endTimeUnixNano": ...,
            "attributes": [
                {"key": "aegis.module", "value": {"stringValue": event.module}},
                {"key": "aegis.status", "value": {"stringValue": event.status}},
                ...
            ]
        }
        spans.append(span)
    return {"resourceSpans": [{"scopeSpans": [{"spans": spans}]}]}
```
2. **CLI command**: `aegis audit export-otlp examples/plan_demo.aegis`
**Files to create**:
- `aegis/runtime/otel.py`
**Tests**: ~10
---
### 9c: Async/Await Support
**What**: Non-blocking tool calls, concurrent plan steps, streaming.
This is the highest-effort item. It touches every layer of the pipeline.
**Syntax**:
```aegis
async func fetch_data(url: str) -> str:
    let response = await http_get(url)
    return response

plan parallel_analysis(data: list):
    parallel_steps: [step_a, step_b]  # Run concurrently

    step step_a:
        let result_a = await analyze_syntax(data)

    step step_b:
        let result_b = await analyze_semantics(data)

    step combine:
        let final = merge(result_a, result_b)
```
**Implementation across pipeline**:
1. **Lexer**: `ASYNC`, `AWAIT` tokens
2. **Parser**: `AsyncFuncDef`, `AwaitExpr` AST nodes. `parallel_steps` in plan declarations.
3. **Transpiler**: `async func` -> `async def`, `await expr` -> `await expr`. Parallel steps use `asyncio.gather()`.
4. **Runtime**: Async versions of `invoke_tool`, `request_human_input`. `AsyncPlanExecutor`.
5. **CLI**: `cmd_run` uses `asyncio.run()` for async programs.
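For orientation, the `parallel_analysis` plan above might transpile to something like the following. The two analyzer coroutines are placeholders and the string-joining "merge" is invented for the example; only `asyncio.gather()` and `asyncio.run()` come from the plan itself.

```python
import asyncio


async def analyze_syntax(data: list) -> str:
    await asyncio.sleep(0)  # placeholder for a non-blocking tool call
    return f"syntax:{len(data)}"


async def analyze_semantics(data: list) -> str:
    await asyncio.sleep(0)  # placeholder for a non-blocking tool call
    return f"semantics:{len(data)}"


async def parallel_analysis(data: list) -> str:
    # parallel_steps: [step_a, step_b] becomes one gather over both step bodies.
    result_a, result_b = await asyncio.gather(
        analyze_syntax(data),
        analyze_semantics(data),
    )
    # step combine runs only after both parallel steps have finished.
    return f"{result_a}+{result_b}"
```

The CLI entry point would then call `asyncio.run(parallel_analysis(data))`. `asyncio.gather()` returns results in argument order, so `result_a` and `result_b` stay bound to the right steps regardless of which finishes first.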
**This is a major refactor** — estimate 200+ lines of new code, modifications across all pipeline files.
**Tests**: ~30
---
### 9d: Better Error Messages
**What**: Source-mapped errors with context snippets showing the original Aegis code.
**Current behavior**:
```
Runtime error at line 15 of example.aegis:
TaintError: Cannot convert tainted value to string directly.
```
**Target behavior**:
```
Runtime error at line 15 of example.aegis:

  14 | let name = tainted(user_input)
> 15 | let msg = f"Hello, {name}"
  16 | return msg

TaintError: Cannot convert tainted value to string directly.
  Use sanitize(name, context=html) to unwrap safely.
**Implementation**:
1. **Store source alongside source map** — keep original `.aegis` source lines in memory during execution
2. **Enhanced error handler** in CLI — on exception, look up source line, show 3-line context window
3. **Improved error messages** — each runtime error type should suggest the fix
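The 3-line context window from step 2 can be rendered with a small helper. This is a sketch: `format_error_context` is a hypothetical name, and it assumes the original `.aegis` source is available as a single string and that the failing line number has already been resolved through the source map.

```python
def format_error_context(source: str, line_no: int, context: int = 1) -> str:
    """Render the lines around `line_no` (1-based), marking the failing line."""
    lines = source.splitlines()
    if not 1 <= line_no <= len(lines):
        return ""  # nothing sensible to show for an out-of-range line
    first = max(1, line_no - context)
    last = min(len(lines), line_no + context)
    width = len(str(last))  # right-align line numbers in the gutter
    rendered = []
    for n in range(first, last + 1):
        marker = ">" if n == line_no else " "
        rendered.append(f"{marker} {n:>{width}} | {lines[n - 1]}")
    return "\n".join(rendered)
```

With `context=1` the helper shows one line before and one line after the failing line, clamped at the start and end of the file.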
**Files to modify**:
- `aegis_cli.py` (error display with context)
- `aegis/runtime/taint.py` (more helpful TaintError messages)
- `aegis/runtime/capabilities.py` (suggest which capability to add)
**Tests**: ~10
---
### 9e: LSP Server
**What**: Language Server Protocol implementation for editor integration (VS Code, Neovim, etc.).
**Implementation**:
1. **Create `aegis_lsp.py`** — LSP server using `pygls`:
   - Syntax diagnostics (run analyzer, report warnings/errors)
   - Go-to-definition (for functions, modules)
   - Hover information (show types, capabilities, contracts)
   - Completion (keywords, builtins, module members)
2. **VS Code extension** — `aegis-vscode/`:
   - `.aegis` file association
   - Syntax highlighting (TextMate grammar)
   - LSP client configuration
**This is a separate project** — can be started in parallel. Not blocking other phases.
**Files to create**:
- `aegis_lsp.py` (or `aegis/lsp/server.py`)
- `aegis-vscode/` directory with extension manifest
---
### 9f: Package System
**What**: `aegis install <package>` for shared Aegis modules.
**Implementation**:
1. **Package manifest** — `aegis.toml` in package root:
```toml
[package]
name = "aegis-web-tools"
version = "0.1.0"
capabilities = ["network.https"]
[dependencies]
aegis-core = ">=1.0"
```
2. **Registry** — simple file-based or git-based package registry
3. **Resolver** — dependency resolution with capability checking
4. **CLI**: `aegis install`, `aegis publish`, `aegis init`
**This is a large feature** — defer detailed planning until Phases 6-8 are complete.
---
### Phase 9 Completion Criteria
- [ ] SQLiteLedger and FileLedger working with `--audit-store` flag
- [ ] OTLP export produces valid OpenTelemetry JSON
- [ ] Async/await compiles and executes for tool_call and plan
- [ ] Error messages show source context snippets
- [ ] LSP server provides diagnostics and hover (stretch goal)
- [ ] Package manifest format defined (stretch goal)
---
## Phase 10: Ecosystem & Standards
**Priority**: LONG-TERM — strategic positioning
**Effort**: Very High (~5+ sessions)
**Impact**: Strategic — positions Aegis as industry standard
**Depends on**: Phases 6-9
### 10a: A2A Protocol Support
**What**: Implement Google's Agent-to-Agent protocol for remote delegation.
**Syntax**:
```aegis
delegate research to "https://research-agent.example.com":
    protocol: a2a
    capabilities: [network.https]
    timeout: 5m
    authentication: bearer_token
```
**Implementation**:
1. **Agent Card discovery** — resolve remote agent capabilities via `.well-known/agent.json`
2. **Task negotiation** — A2A task lifecycle (submitted, working, completed, failed)
3. **Capability verification** — verify remote agent's declared capabilities match delegation requirements
4. **Secure transport** — TLS + authentication tokens
5. **Audit integration** — remote delegation events with full provenance
### 10b: EU AI Act Compliance Toolkit
**What**: Generate conformity assessment reports from Aegis audit trails.
**Implementation**:
1. **Compliance report generator**: `aegis compliance-report --standard eu-ai-act output.pdf`
2. **Coverage mapping** — map Aegis features to EU AI Act articles:
   - Article 9 (Risk Management) -> capability system, contracts
   - Article 10 (Data Governance) -> taint tracking, sanitization
   - Article 12 (Record Keeping) -> hash-chained audit trail
   - Article 13 (Transparency) -> @intent, @audit, reason blocks
   - Article 14 (Human Oversight) -> human_input, plan approval
   - Article 15 (Accuracy/Robustness) -> verify, contracts
3. **Crypto-shredding** — for GDPR erasure compliance:
   - Encrypt audit event data with per-record keys
   - Delete the key to "shred" the record while preserving hash chain structure
   - Chain hashes computed over encrypted data (chain survives shredding)
### 10c: Formal Verification
**What**: Property-based testing and model checking for security invariants.
**Implementation**:
1. **Property-based tests** using Hypothesis:
```python
from hypothesis import given, strategies as st

@given(st.text())
def test_taint_never_leaks(value):
    t = Tainted(value)
    # No sequence of operations should produce an untainted string
    # from a tainted input without sanitize()
    ...
```
2. **Security invariants to verify**:
- Taint invariant: No code path from `Tainted(x)` to `str(x)` without `sanitize()`
- Capability invariant: No operation outside declared capabilities succeeds
- Audit invariant: Hash chain is always consistent after any sequence of events
- Delegation invariant: Delegated agent capabilities are always a subset of parent
3. **Model checking** — potentially use TLA+ or Alloy to formally model the security properties
### 10d: Inter-Agent Authentication
**What**: Cryptographic authentication for agent-to-agent communication.
**Implementation**:
1. **Agent identity** — each agent has a keypair (Ed25519)
2. **Message signing** — delegation messages signed by sender, verified by receiver
3. **Replay protection** — nonce + timestamp in signed messages
4. **Trust model** — capability-based trust (agent trusts another based on declared capabilities, not identity)
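The replay-protection step can be sketched independently of the signature scheme. The example below covers only the nonce and timestamp check and assumes the signature has already been verified elsewhere (Ed25519 is not in the Python standard library, so signing is left out here). `ReplayGuard`, its parameters, and the 300-second default window are hypothetical.

```python
import time
from typing import Callable


class ReplayGuard:
    """Rejects messages that are too old or whose nonce was already seen."""

    def __init__(
        self,
        max_age_seconds: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.max_age = max_age_seconds
        self.clock = clock
        self._seen: dict[str, float] = {}  # nonce -> message timestamp

    def accept(self, nonce: str, timestamp: float) -> bool:
        now = self.clock()
        # Forget nonces older than the window; a replay of one of those
        # is rejected by the timestamp check below anyway.
        self._seen = {n: t for n, t in self._seen.items() if now - t <= self.max_age}
        if abs(now - timestamp) > self.max_age:
            return False  # stale, or too far in the future
        if nonce in self._seen:
            return False  # replay inside the window
        self._seen[nonce] = timestamp
        return True
```

Pairing the nonce with a signed timestamp keeps the seen-nonce table bounded: entries only need to be remembered for as long as their timestamp would still be accepted.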
### 10e: Circuit Breakers & Rate Limiting
**What**: Prevent cascading failures in multi-agent systems.
**Syntax**:
```aegis
tool_call external_api(query: str) -> dict:
    provider: "weather"
    circuit_breaker:
        failure_threshold: 5
        reset_timeout: 60s
    rate_limit: 10/m
```
**Implementation**:
1. **Circuit breaker** — track failure count per tool, open circuit after threshold, half-open after timeout
2. **Rate limiter** — token bucket per tool/agent, configurable rate
3. **Audit integration** — circuit breaker state changes logged
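The closed, open, and half-open states of the circuit breaker can be sketched as below. Class and attribute names are assumptions, the rate limiter is not covered, and the half-open state here lets every call through as a trial, which a real implementation would likely restrict to a single probe.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is refused because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def call(self, func: Callable, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("circuit open; call refused")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call in half-open state, or reaching the
            # threshold while closed, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The injectable `clock` makes the timeout testable without sleeping, and each state transition is a natural place to emit the audit events mentioned in step 3.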
---
### Phase 10 Completion Criteria
- [ ] A2A remote delegation works with at least one real remote agent
- [ ] EU AI Act compliance report generates valid document
- [ ] Crypto-shredding preserves hash chain integrity
- [ ] Property-based tests cover all security invariants
- [ ] Agent authentication with Ed25519 signing
- [ ] Circuit breakers prevent cascading tool failures
---
## Summary Timeline
```
Phase 6: Security Hardening [URGENT] ~3-4 sessions
Phase 7: MCP Integration [HIGH] ~2-3 sessions
Phase 8: Advanced AI Constructs [MEDIUM] ~3-4 sessions
Phase 9: Production Readiness [MEDIUM] ~4-5 sessions
Phase 10: Ecosystem & Standards [LONG-TERM] ~5+ sessions
```
Phase 6 should start immediately. Phases 7 and 8 can be interleaved. Phase 9 depends on Phases 6 and 7, but the LSP server (9e) is independent and can start in parallel. Phase 10 is aspirational and should be planned based on user feedback.
---
## Phase 11: Security Vulnerability Fixes (Pre-Launch)
**Priority**: CRITICAL — must be completed before any public release
**Effort**: 1 session (~2-3 hours)
**Impact**: Fixes 5 confirmed vulnerabilities from security audit
**Depends on**: Nothing (can start immediately)
### Goal
Close the implementation gaps identified in the comprehensive security audit. These are not architectural issues — they're missing operator protections and sandbox edge cases.
### 11a: Taint Operator Gaps (CRITICAL)
**Problem**: `Tainted` class blocks `__str__`, `__format__`, `__bytes__`, `__iter__` but misses `__mod__` (Python's `%` string formatting operator). This allows `"SELECT %s" % tainted_val` to bypass taint protection.
**Implementation**:
1. Add `__mod__` and `__rmod__` to `aegis/runtime/taint.py` — both raise `TaintError`
2. Add `__rmul__` for consistent taint propagation when operand order is reversed
3. Add tests for all string formatting attack vectors: `%s`, `%r`, `%d` with tainted values
**Files**: `aegis/runtime/taint.py`, `tests/test_phase6_taint.py`
**Tests**: 8-10 new tests
### 11b: Relative Import Bypass (CRITICAL)
**Problem**: `sandbox.py:131` allows `level > 0` imports without checking the blocklist. `from . import os` completely bypasses the import whitelist.
**Implementation**:
1. In `_aegis_import()`, check blocklist for ALL imports regardless of level
2. Only allow relative imports for modules within the Aegis project directory
3. Add tests: relative import of blocked modules (os, subprocess, sys, etc.)
**Files**: `aegis/runtime/sandbox.py`, `tests/test_phase6_sandbox.py`
**Tests**: 6-8 new tests
### 11c: Tainted Value Direct Access (HIGH)
**Problem**: `tainted._value` is accessible via single-underscore convention. Python doesn't enforce this as private.
**Implementation**:
1. Rename `_value` to `__value` in `Tainted` class (triggers Python name mangling → `_Tainted__value`)
2. Update all internal references: `sanitize()`, `snapshot_value()`, `__repr__()`, arithmetic ops, `__getitem__`, provenance
3. Verify `getattr(tainted, '_value')` raises `AttributeError`
4. Verify `getattr(tainted, '__value')` raises `AttributeError`
5. Internal code accesses via `self.__value` (which works inside the class)
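The verification in steps 3 to 5 can be illustrated with a stripped-down stand-in for the real class. The `reveal_for_sanitize` method and the `is_hidden` helper are hypothetical and exist only for this example.

```python
class Tainted:
    """Minimal stand-in for the real class, showing only the renamed field."""

    def __init__(self, value):
        self.__value = value  # stored on the instance as _Tainted__value

    def reveal_for_sanitize(self):
        # Inside the class body, self.__value resolves to the mangled name.
        return self.__value


def is_hidden(obj, attr: str) -> bool:
    """True if getattr(obj, attr) raises AttributeError."""
    try:
        getattr(obj, attr)
    except AttributeError:
        return True
    return False


t = Tainted("user input")
assert is_hidden(t, "_value")
assert is_hidden(t, "__value")
assert t.reveal_for_sanitize() == "user input"
```

Name mangling only renames the attribute. The mangled name `_Tainted__value` still resolves from outside the class, so this change raises the bar for casual access rather than making the value unreachable.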
**Files**: `aegis/runtime/taint.py`, tests
**Tests**: 5-6 new tests
**Risk**: High refactor surface — many internal references to update
### 11d: Sandbox Builtin Hardening (HIGH)
**Problem**: `type()` and `object` in allowed builtins enable MRO traversal via `type("").__bases__[0].__subclasses__()`.
**Implementation**:
1. Create safe wrappers: `_safe_type()` that only returns the type name (not the type object for introspection)
2. Remove raw `type` and `object` from `ALLOWED_BUILTINS`
3. Add `_safe_type` to the restricted builtins dict
4. Test: `type("").__bases__` should fail, and `type(42)` should return a plain string such as `<class 'int'>` (informational only), not the type object
**Files**: `aegis/runtime/sandbox.py`, `tests/test_phase6_sandbox.py`
**Tests**: 4-6 new tests
### 11e: Audit Event Spoofing Prevention (MEDIUM)
**Problem**: User code can call `emit_event(..., _internal=True)` to emit fake "internal" audit events.
**Implementation**:
1. Replace `_internal=True` boolean with a secret token generated at runtime
2. `_AUDIT_SECRET = secrets.token_hex(16)` generated once at import time
3. `emit_event(..., _auth=token)` checks token matches `_AUDIT_SECRET`
4. Transpiler injects `_audit._AUDIT_SECRET` in generated code (accessible because transpiler controls the namespace)
5. Alternative: remove `_internal` entirely and use a separate `_emit_internal_event()` that's not exported
**Files**: `aegis/runtime/audit.py`, `aegis/transpiler.py`
**Tests**: 3-4 new tests
### Completion Criteria
- [x] `"SELECT %s" % tainted_val` raises TaintError
- [x] `from . import os` raises ImportError in sandbox
- [x] `tainted._value` raises AttributeError
- [x] `type("").__bases__` is inaccessible (type() returns string name)
- [x] User code cannot emit fake internal audit events
- [x] All existing 1678 tests still pass (1678/1678 green)
- [x] New tests cover all attack vectors (34 new tests in test_phase11_security.py)
---
## Phase 12: Open Source Packaging & Infrastructure
**Priority**: HIGH — required for public release
**Effort**: 1 session (~2-3 hours)
**Impact**: Makes project installable, testable, and contributor-friendly
**Depends on**: Phase 11 (security fixes)
### 12a: Project Packaging (pyproject.toml)
**Implementation**:
1. Create `pyproject.toml` with:
   - Package name: `aegis-lang`
   - Version: `0.1.0`
   - Python requirement: `>=3.11`
   - Zero runtime dependencies
   - Optional deps: `z3-solver` (smt), `pytest` + `pytest-cov` (dev)
   - Entry point: `aegis = "aegis_cli:main"`
2. Refactor `aegis_cli.py` — extract `if __name__ == "__main__"` into `main()` function
3. Create `MANIFEST.in` — include README, LICENSE, SPEC.md, DOCS/, examples/
4. Verify `pip install -e .` works
5. Verify `aegis run examples/hello.aegis` works after install
**Files**: `pyproject.toml`, `MANIFEST.in`, `aegis_cli.py`
### 12b: CI/CD (GitHub Actions)
**Implementation**:
1. Create `.github/workflows/tests.yml`:
   - Matrix: ubuntu-latest, macos-latest, windows-latest × Python 3.11, 3.12
   - Steps: checkout, setup-python, `pip install -e .[dev]`, pytest with coverage
   - Exclude LLM tests (require API keys)
2. Create `.github/workflows/lint.yml`:
   - `ruff check` + `ruff format --check`
3. Add badge to README.md
**Files**: `.github/workflows/tests.yml`, `.github/workflows/lint.yml`
### 12c: Contributor Documentation
**Implementation**:
1. Create `CONTRIBUTING.md`:
   - Development setup (clone, install, test)
   - Architecture overview (pipeline diagram)
   - How to add a language feature (tokens → AST → parser → analyzer → transpiler → runtime → tests)
   - Testing requirements (every feature needs 10+ tests)
   - Code style (ruff config)
   - PR process
2. Create `CODE_OF_CONDUCT.md` (Contributor Covenant v2.1)
3. Create `.github/SECURITY.md` — vulnerability reporting process
4. Create `.github/ISSUE_TEMPLATE/bug_report.md`
5. Create `.github/ISSUE_TEMPLATE/feature_request.md`
6. Create `.github/pull_request_template.md`
### 12d: README & Documentation Refresh
**Implementation**:
1. Update README.md:
   - Fix test count (189 → 1,700+)
   - Add installation instructions (`pip install aegis-lang`)
   - Add all CLI commands (currently lists 6, actually 20+)
   - Add badges (license, CI, Python version)
   - Add links to DOCS/ folder
2. Create `CHANGELOG.md` with v0.1.0 entry
3. Update .gitignore: add `.env`, `.env.*`, `.vscode/`, `.idea/`, `*.sqlite`, `audit*.db`
### 12e: Git History Audit & Repo Rename
**Implementation**:
1. Audit git log for sensitive data (Windows paths, personal info)
2. Clean if necessary (git filter-branch or BFG)
3. Rename project directory from `TestL` to `aegis-lang`
4. Update all internal references to new directory name
5. Prepare GitHub repository (create repo, set description, topics)
### Completion Criteria
- [ ] `pip install aegis-lang` works from PyPI (or `pip install -e .` locally)
- [ ] `aegis run examples/hello.aegis` works after pip install
- [ ] GitHub Actions CI passes on 3 OS × 2 Python versions
- [ ] ruff linting passes (or has baseline config)
- [ ] CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md exist
- [ ] README.md is accurate and complete
- [ ] CHANGELOG.md exists with v0.1.0
- [ ] .gitignore covers all artifacts
- [ ] No sensitive data in git history
---
## Phase 13: Launch Preparation
**Priority**: HIGH — final polish before public announcement
**Effort**: 1-2 sessions
**Impact**: First impressions matter for open source adoption
**Depends on**: Phase 12
### 13a: VS Code Extension (Recommended)
**Implementation**:
1. Create `vscode-aegis/` extension directory
2. Language configuration: `.aegis` file association, syntax highlighting (TextMate grammar)
3. LSP client configuration (connects to `aegis lsp`)
4. Package as `.vsix` for VS Code Marketplace
5. Include installation instructions in README
### 13b: Launch Content
**Implementation**:
1. Write launch blog post: "Introducing Aegis — The Security-First Language for AI Agents"
- Problem statement (why existing frameworks fail at security)
- Key features with code examples
- Comparison table vs LangChain/CrewAI/AutoGen
- EU AI Act timing
- Getting started in 5 minutes
2. Create a 2-minute demo video or GIF showing:
- Writing an .aegis agent
- Taint tracking catching an injection
- Capability system blocking unauthorized access
- Audit trail verification
3. Prepare social media posts for:
- Hacker News ("Show HN: Aegis — a security-first language for AI agents")
- Reddit (r/programming, r/MachineLearning, r/artificial)
- Twitter/X
- AI safety mailing lists
### 13c: Demo Applications
**Implementation**:
1. Polish the SOC agent demo (apps/soc_agent/) — make it runnable out of the box
2. Create a "5-minute quickstart" tutorial agent
3. Create a HIPAA-compliant medical record agent demo
4. Create a financial compliance demo
5. Ensure all demos work with `pip install aegis-lang && aegis run demo.aegis`
### Completion Criteria
- [ ] VS Code extension installable and working
- [ ] Launch blog post written and reviewed
- [ ] At least 3 polished demo applications
- [ ] All demos work out of the box after pip install
- [ ] Social media posts drafted
---
## Updated Summary Timeline
```
Phase 6-10: Core Development [COMPLETE] 24 sessions
Phase 11: Security Vulnerability Fixes [CRITICAL] ~1 session
Phase 12: Open Source Packaging [HIGH] ~1 session
Phase 13: Launch Preparation [HIGH] ~1-2 sessions
─────────────
Total: ~3-4 sessions to launch
```
Phase 11 must be completed first (security fixes before public code). Phase 12 can begin immediately after. Phase 13 can overlap with Phase 12 (blog post writing is independent of packaging).