
# Aegis Language — Detailed Phase Plans

**Created**: 2026-03-06
**Source**: 4-agent analysis (codebase, security, competitive, AI features)
**Covers**: Phase 6 through Phase 10

---

## Phase 6: Security Hardening

**Priority**: URGENT — without this, Aegis's core security claims are aspirational
**Effort**: Medium (~3-4 sessions)
**Impact**: Critical — fixes 7 critical + 4 high severity vulnerabilities
**Depends on**: Nothing (can start immediately)

### Goal

Shift Aegis from an analyzer-trusting model to a runtime-guaranteeing model. After this phase, transpiled Python code is genuinely sandboxed — dangerous operations are impossible, not just flagged.

### 6a: Restricted exec() Namespace (CRITICAL — fixes C1, C4, C7, H4)

**Problem**: `_seed_runtime_namespace()` in `aegis_cli.py:33-51` creates a `ConstNamespace` with guarded `open`/`print`/`input`, but does NOT restrict `__builtins__`. Python's default builtins are inherited, giving transpiled code access to `eval()`, `exec()`, `__import__()`, `globals()`, `vars()`, `compile()`, `breakpoint()`, `type()`, `getattr()` on internals, etc.

**Implementation**:

1. **Create `aegis/runtime/sandbox.py`** — new module for the restricted builtins dict:

```python
"""Aegis sandbox — restricted builtins for transpiled code execution."""
import builtins

# Allowlist of safe builtins
ALLOWED_BUILTINS = {
    # Types & constructors
    'bool', 'int', 'float', 'str', 'bytes', 'bytearray',
    'list', 'tuple', 'dict', 'set', 'frozenset',
    'complex', 'range', 'slice', 'object', 'type', 'memoryview',
    'enumerate', 'zip', 'map', 'filter',
    # Numeric & math
    'abs', 'divmod', 'max', 'min', 'pow', 'round', 'sum',
    # String & formatting
    'ascii', 'bin', 'chr', 'format', 'hex', 'oct', 'ord', 'repr',
    # Iteration & sequences
    'all', 'any', 'iter', 'len', 'next', 'reversed', 'sorted',
    # Object introspection (safe subset)
    'callable', 'hash', 'id', 'isinstance', 'issubclass',
    'hasattr', 'getattr', 'setattr', 'delattr', 'dir',
    'property', 'staticmethod', 'classmethod', 'super',
    # Required by the `class` statement; without it, any transpiled
    # class definition fails with "NameError: __build_class__ not found"
    '__build_class__',
    # Exceptions (all standard exceptions)
    'Exception', 'TypeError', 'ValueError', 'KeyError', 'IndexError',
    'AttributeError', 'RuntimeError', 'StopIteration', 'StopAsyncIteration',
    'GeneratorExit', 'ArithmeticError', 'LookupError', 'OverflowError',
    'ZeroDivisionError', 'FloatingPointError', 'AssertionError',
    'ImportError', 'ModuleNotFoundError', 'NameError', 'UnboundLocalError',
    'OSError', 'FileNotFoundError', 'FileExistsError', 'PermissionError',
    'IsADirectoryError', 'NotADirectoryError', 'InterruptedError',
    'IOError', 'EOFError', 'UnicodeError', 'UnicodeDecodeError',
    'UnicodeEncodeError', 'NotImplementedError', 'RecursionError',
    'SystemError', 'Warning', 'UserWarning', 'DeprecationWarning',
    'PendingDeprecationWarning', 'SyntaxWarning', 'RuntimeWarning',
    'FutureWarning', 'ResourceWarning', 'UnicodeWarning', 'BytesWarning',
    # Constants
    'True', 'False', 'None', 'Ellipsis', 'NotImplemented',
    '__name__', '__doc__',
}

# Explicitly BLOCKED builtins (the security-critical ones).
# Documentation only: enforcement comes from the allowlist above.
BLOCKED_BUILTINS = {
    'eval', 'exec', 'compile',       # Code execution
    '__import__',                    # Import system
    'globals', 'locals', 'vars',     # Namespace introspection
    'breakpoint',                    # Debugger access
    'exit', 'quit',                  # Process control
    'open',                          # Replaced by guarded_open
    'print',                         # Replaced by guarded_print
    'input',                         # Replaced by guarded_input
}


def make_restricted_builtins() -> dict:
    """Build a builtins dict with dangerous operations removed."""
    restricted = {}
    for name in ALLOWED_BUILTINS:
        val = getattr(builtins, name, None)
        if val is not None:
            restricted[name] = val
    return restricted
```

2. **Update `_seed_runtime_namespace()` in `aegis_cli.py`**:

```python
from aegis.runtime.sandbox import make_import_hook, make_restricted_builtins


def _seed_runtime_namespace(path: str = "<stdin>") -> ConstNamespace:
    search_dir = ...
    cache = ModuleCache(search_paths=[search_dir])
    restricted = make_restricted_builtins()
    restricted['print'] = guarded_print
    restricted['open'] = guarded_open
    restricted['input'] = guarded_input
    restricted['__import__'] = make_import_hook()  # See 6b
    ns = ConstNamespace({
        "__name__": "__aegis_main__",
        "__file__": path,
        "__builtins__": restricted,  # KEY CHANGE
        "print": guarded_print,
        "open": guarded_open,
        "input": guarded_input,
    })
    ns['aegis_import'] = _bound_aegis_import
    return ns
```

3. **Update `runtime/importer.py:36`** — the `aegis_import` function also calls `exec(code, ns)`. Ensure it passes the same restricted builtins.

4. **Update harness pattern** — demo apps in `apps/*/harness.py` all call `exec(code, namespace)`. Each harness needs to use restricted builtins too. Consider extracting a shared `aegis_exec(code, namespace)` helper.
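The shared helper suggested in step 4 could look roughly like the sketch below. It is a minimal illustration and not the project's implementation: `_DEMO_ALLOWED` and `make_demo_builtins` are hypothetical stand-ins for `ALLOWED_BUILTINS` and `make_restricted_builtins()`, and the guarded `print`/`open`/`input` replacements are left out.

```python
import builtins
from typing import Optional

# Hypothetical, deliberately tiny allowlist for illustration only;
# the plan's ALLOWED_BUILTINS is much larger.
_DEMO_ALLOWED = ("len", "range", "int", "str", "NameError", "Exception")


def make_demo_builtins() -> dict:
    """Copy only allowlisted names out of the real builtins module."""
    return {name: getattr(builtins, name) for name in _DEMO_ALLOWED}


def aegis_exec(code: str, namespace: Optional[dict] = None) -> dict:
    """Execute code with a restricted __builtins__ dict and return the namespace."""
    ns = dict(namespace or {})
    # Always overwrite: a caller-supplied namespace must not smuggle in full builtins.
    ns["__builtins__"] = make_demo_builtins()
    exec(code, ns)
    return ns


# Allowlisted builtins keep working inside the sandboxed namespace.
ns = aegis_exec("total = len(range(5))")

# eval is not in the dict, so the name lookup fails inside the executed code.
blocked = aegis_exec(
    "try:\n"
    "    eval('1 + 1')\n"
    "    outcome = 'eval ran'\n"
    "except NameError:\n"
    "    outcome = 'eval blocked'\n"
)
```

Routing the CLI, the importer, and every harness through one function of this shape would mean the builtins policy is defined in a single place.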
**Files to modify**:
- Create: `aegis/runtime/sandbox.py`
- Modify: `aegis_cli.py` (lines 33-51)
- Modify: `aegis/runtime/importer.py` (line 36)
- Modify: All `apps/*/harness.py` files (7 files)

**Tests to add** (`tests/test_phase6_sandbox.py`):
- exec() namespace builtins don't contain `eval`, `exec`, `compile`, or the unrestricted `__import__` (only the hook from 6b)
- exec() namespace builtins don't contain `globals`, `locals`, `vars`
- Transpiled code that tries `eval("1+1")` raises `NameError`
- Transpiled code that tries `exec("x=1")` raises `NameError`
- Transpiled code that tries `globals()` raises `NameError`
- Transpiled code that tries `vars()` raises `NameError`
- Transpiled code that tries `compile("x", "", "exec")` raises `NameError`
- All existing 627 tests still pass (no regressions)

**Verification**: Run the full test suite + manually try `eval()` in REPL to confirm it's blocked.

---

### 6b: Import Whitelist (CRITICAL — fixes C1, capability bypass via imports)

**Problem**: Transpiled code can `import os`, `import subprocess`, `import importlib`, etc. The analyzer doesn't check imports. Simply removing `__import__` from builtins (6a) is not a usable fix: the `import` statement looks up `__import__` in the executing frame's builtins, so with the entry missing every import fails with `ImportError: __import__ not found`, including the Aegis runtime imports the transpiler emits.

**Key insight**: Because the `import` statement calls whatever `__builtins__['__import__']` the exec namespace provides, and we set `__builtins__` to our own dict in 6a, we can control imports by providing a custom `__import__` function.

**Implementation**:

1. **Create import hook in `sandbox.py`**:

```python
import builtins

# Safe modules that Aegis code is allowed to import
IMPORT_WHITELIST = {
    # Standard library (safe subset)
    'math', 'statistics', 'decimal', 'fractions',
    'datetime', 'time', 'calendar',
    'json', 'csv',
    'collections', 'itertools', 'functools', 'operator',
    'string', 're',
    'copy', 'enum', 'dataclasses', 'typing',
    'uuid', 'hashlib', 'hmac', 'secrets',
    'base64', 'binascii',
    'textwrap', 'difflib',
    'abc', 'numbers',
    'pprint', 'logging',
    # Aegis runtime (transpiler emits these imports)
    'aegis', 'aegis.runtime', 'aegis.runtime.taint',
    'aegis.runtime.capabilities', 'aegis.runtime.contracts',
    'aegis.runtime.const', 'aegis.runtime.agent', 'aegis.runtime.audit',
}

# Explicitly BLOCKED modules
IMPORT_BLOCKLIST = {
    'os', 'sys', 'subprocess', 'shutil', 'pathlib',    # System access
    'importlib', 'pkgutil', 'zipimport',               # Import manipulation
    'ctypes', 'cffi',                                  # Native code
    'socket', 'http', 'urllib', 'ftplib', 'smtplib',   # Network (use tool_call)
    'pickle', 'shelve', 'marshal',                     # Serialization (unsafe)
    'code', 'codeop', 'compileall', 'py_compile',      # Code compilation
    'ast',                                             # AST manipulation
    'inspect', 'dis', 'traceback',                     # Introspection
    'signal', 'multiprocessing', 'threading',          # Process/thread control
    'gc', 'weakref',                                   # GC manipulation
    'builtins', '_thread',                             # Internals
}


# Use the builtins module directly: the __builtins__ global may be either a
# module or a dict depending on where this file is imported from.
def make_import_hook(original_import=builtins.__import__):
    """Create a restricted __import__ that only allows whitelisted modules."""
    def _aegis_import(name, globals=None, locals=None, fromlist=(), level=0):
        # Allow relative imports (level > 0) for Aegis module system
        if level > 0:
            return original_import(name, globals, locals, fromlist, level)
        # Check top-level module name
        top_level = name.split('.')[0]
        if top_level in IMPORT_BLOCKLIST:
            raise ImportError(
                f"Import of '{name}' is blocked in Aegis. "
                f"Use tool_call for external operations."
            )
        if name not in IMPORT_WHITELIST and top_level not in IMPORT_WHITELIST:
            raise ImportError(
                f"Import of '{name}' is not allowed in Aegis. "
                f"Allowed modules: {', '.join(sorted(IMPORT_WHITELIST))}"
            )
        return original_import(name, globals, locals, fromlist, level)
    return _aegis_import
```

2. **Wire into restricted builtins** (in `_seed_runtime_namespace`):

```python
restricted['__import__'] = make_import_hook()
```

3. **Add `--unrestricted` CLI flag** for development/debugging that uses full builtins (opt-in escape hatch, not default).

**Files to modify**:
- Modify: `aegis/runtime/sandbox.py` (add import hook)
- Modify: `aegis_cli.py` (wire hook into namespace, add --unrestricted flag)
- Modify: `aegis/runtime/importer.py` (use same hook for imported Aegis modules)

**Tests to add**:
- `import os` in transpiled code raises `ImportError`
- `import subprocess` raises `ImportError`
- `import importlib` raises `ImportError`
- `import ctypes` raises `ImportError`
- `import math` works (whitelisted)
- `import json` works (whitelisted)
- `import datetime` works (whitelisted)
- `from aegis.runtime.taint import ...` works (whitelisted)
- `import collections` works (whitelisted)
- Existing tests still pass (verify aegis runtime imports work)

**Edge cases to test**:
- `from os import path` — blocked (top-level `os` is blocked)
- `import os.path` — blocked
- `importlib.import_module('os')` — blocked (importlib itself blocked)

---

### 6c: Taint Hardening (CRITICAL — fixes C2, C3, H3)

**Problem**: Three taint bypass vectors exist:
1. `.raw` property is public — any code can unwrap tainted values
2. No `__bytes__`, `__iter__`, `__index__` — values leak through these operations
3. Taint lost at function boundaries — analyzer doesn't infer tainted returns

**Implementation**:

1. **Make `.raw` access controlled** in `taint.py`:

```python
class Tainted(Generic[T]):
    def __init__(self, value: T, _provenance=None):
        # Use name mangling to make value harder to access
        self.__value = value
        self._provenance = _provenance or TaintProvenance("direct")

    @property
    def raw(self) -> T:
        """Access raw value — ONLY for use by sanitize() and registered sanitizers.

        Raises TaintError if called from non-sanitizer context."""
        import inspect
        frame = inspect.currentframe()
        caller = frame.f_back
        caller_name = caller.f_code.co_name if caller else ""
        caller_module = caller.f_globals.get('__name__', '') if caller else ""
        # Allow access from sanitize functions and aegis runtime
        if (caller_name in ('sanitize', '_sanitize_sql', '_sanitize_html',
                            '_sanitize_shell', '_sanitize_path')
                or caller_module.startswith('aegis.runtime')
                or caller_name.startswith('_sanitize_')):
            return self.__value
        # Allow access from registered custom sanitizers
        if hasattr(self, '_allowed_callers') and caller_name in self._allowed_callers:
            return self.__value
        raise TaintError(
            "Direct access to .raw is not allowed. Use sanitize() to unwrap tainted values."
        )
```

**Note**: The `inspect.currentframe()` approach has performance overhead, and its name-based check is satisfied by any caller whose function name is `sanitize` or starts with `_sanitize_`. An alternative is to use a context manager or token-based access:

```python
_SANITIZER_TOKEN = object()  # module-private sentinel


class Tainted(Generic[T]):
    def unwrap(self, _token=None) -> T:
        if _token is not _SANITIZER_TOKEN:
            raise TaintError("Cannot unwrap tainted value without sanitizer token")
        return self.__value


def sanitize(value, context='sql'):
    raw = value.unwrap(_token=_SANITIZER_TOKEN)
    ...
```

The token approach is faster and more Pythonic. Choose between the two based on how strictly we want to enforce access (inspect = stricter but slower, token = faster but importable from runtime module).

**Recommended approach**: Token-based. The `_SANITIZER_TOKEN` is a module-private object in `taint.py`. Note that the `_` prefix is only a naming convention, and the 6b whitelist as written permits `aegis.runtime.taint`, so `from aegis.runtime.taint import _SANITIZER_TOKEN` would currently succeed. For the token to be inaccessible to transpiled code, it must not be reachable as an attribute of an importable module (for example, hold it in a closure shared only by `Tainted` and the sanitizers).

2. **Add leak-prevention dunder methods** in `taint.py`:

```python
def __bytes__(self):
    raise TaintError("Cannot convert tainted value to bytes. Use sanitize() first.")

def __iter__(self):
    raise TaintError("Cannot iterate over tainted value. Use sanitize() first.")

def __next__(self):
    raise TaintError("Cannot iterate over tainted value. Use sanitize() first.")

def __index__(self):
    raise TaintError("Cannot use tainted value as index. Use sanitize() first.")

def __getitem__(self, key):
    # Return a new Tainted wrapping the sliced/indexed value
    return Tainted(self.__value[key],
                   self._provenance.record({"action": "getitem", "key": repr(key)}))

def __contains__(self, item):
    raise TaintError("Cannot check containment in tainted value. Use sanitize() first.")

def __len__(self):
    # len() is safe — returns an int, not the tainted data
    return len(self.__value)

def __bool__(self):
    # bool() is safe — returns True/False
    return bool(self.__value)

def __hash__(self):
    raise TaintError("Cannot hash tainted value. Use sanitize() first.")

# Prevent pickling
def __reduce__(self):
    raise TaintError("Cannot pickle tainted value.")

def __reduce_ex__(self, protocol):
    raise TaintError("Cannot pickle tainted value.")

# Guard for serializers that honor a __json__ hook. The stdlib json module
# ignores __json__ and already raises TypeError for Tainted objects.
def __json__(self):
    raise TaintError("Cannot serialize tainted value to JSON. Use sanitize() first.")
```

3. **Enhance analyzer taint inference across function boundaries** in `analyzer.py`:
   - Track which functions have tainted parameters
   - If a function receives a `tainted[T]` parameter and the analyzer can see that the return value flows from that parameter (without sanitization), mark the function's return as tainted
   - Add warning: "Function 'X' receives tainted input but returns unsanitized value"
   - This is a static analysis enhancement, not runtime

**Files to modify**:
- Modify: `aegis/runtime/taint.py` (lines 60-100+ — Tainted class)
- Modify: `aegis/analyzer.py` (taint flow pass — function return inference)
- Modify: Tests that directly access `.raw` need updating to use `sanitize()` instead

**Tests to add** (`tests/test_phase6_taint.py`):
- `bytes(tainted_value)` raises `TaintError`
- `for x in tainted_value` raises `TaintError`
- `tainted_value[0]` returns `Tainted` (not raw value)
- `pickle.dumps(tainted_value)` raises `TaintError`
- `hash(tainted_value)` raises `TaintError`
- `tainted_value.raw` raises `TaintError` (from non-sanitizer context)
- `sanitize(tainted_value, context='sql')` still works (`.raw` accessible from sanitizer)
- Custom sanitizers via `@sanitizer` still work
- `len(tainted_value)` returns int (allowed)
- `bool(tainted_value)` returns bool (allowed)
- Existing taint tests updated and passing

**Risk**: This is the highest-risk change — many existing tests may access `.raw` directly. Run full suite after each sub-change.

---

### 6d: Freeze Runtime State (HIGH — fixes C5, H2)

**Problem**: Transpiled code can modify Aegis runtime globals:
- `_agent._tool_provider = FakeProvider()`
- `_audit._event_ledger = None`
- `emit_event()` is public and can inject false events

**Implementation**:

1. **Make runtime registries read-only after initialization** in `agent.py`:

```python
_FROZEN = False


def freeze_runtime():
    """Lock runtime configuration. Called after harness setup, before exec()."""
    global _FROZEN
    _FROZEN = True


def _check_frozen(operation: str):
    if _FROZEN:
        raise RuntimeError(f"Cannot {operation} after runtime is frozen")


def set_tool_provider(provider):
    global _tool_provider
    _check_frozen("set tool provider")
    _tool_provider = provider

# Same pattern for set_input_handler, set_memory_store,
# set_delegation_handler, set_approval_handler
```

2. **Freeze before exec() in CLI** (`aegis_cli.py`):

```python
from aegis.runtime.agent import freeze_runtime
from aegis.runtime.audit import freeze_audit


def cmd_run(...):
    ns = _seed_runtime_namespace(path)
    freeze_runtime()  # Lock agent config
    freeze_audit()    # Lock audit config
    exec(python_code, ns)
```

3. **Protect `emit_event()` in `audit.py`**:

```python
_AUDIT_FROZEN = False


def freeze_audit():
    global _AUDIT_FROZEN
    _AUDIT_FROZEN = True


def emit_event(event_type, *, _internal=False, **kwargs):
    """Emit an audit event. Only transpiler-injected calls set _internal=True."""
    if _AUDIT_FROZEN and not _internal:
        raise RuntimeError("Direct emit_event() calls are not allowed. Events are auto-injected.")
    ...
```

Update transpiler to emit `_audit.emit_event(..., _internal=True)` instead of `_audit.emit_event(...)`.

4. **Harness pattern update**: All `apps/*/harness.py` files call `set_tool_provider()`, `set_delegation_handler()`, etc. BEFORE `exec()`. After those setup calls, call `freeze_runtime()`.
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (add freeze mechanism to all setters)
- Modify: `aegis/runtime/audit.py` (add freeze + _internal flag to emit_event)
- Modify: `aegis/transpiler.py` (emit `_internal=True` on auto-injected events)
- Modify: `aegis_cli.py` (call freeze before exec)
- Modify: All `apps/*/harness.py` (call freeze before exec)

**Tests to add**:
- After `freeze_runtime()`, `set_tool_provider()` raises `RuntimeError`
- After `freeze_audit()`, direct `emit_event()` raises `RuntimeError`
- Transpiler-injected events (with `_internal=True`) still work after freeze
- `reset_registry()` unfreezes (for test teardown)

---

### 6e: Enforce Delegation Capabilities (HIGH — fixes C6)

**Problem**: `CallableDelegationHandler` in `agent.py:295-322` receives `capabilities` in the `DelegationContext` but the handler ignores them — delegated code runs with full permissions.

**Implementation**:

1. **Wrap delegated callable in CapabilityContext** in `agent.py`:

```python
class CallableDelegationHandler:
    def handle(self, context: DelegationContext) -> Any:
        handler = self._handlers.get(context.target)
        if handler is None:
            raise RuntimeError(f"No handler registered for '{context.target}'")

        # Enforce capability restrictions on delegated code
        if context.capabilities:
            from aegis.runtime.capabilities import CapabilityContext
            cap_ctx = CapabilityContext(allow=context.capabilities, deny=[])
            with cap_ctx:
                return handler(
                    task=context.task,
                    memory_share=context.memory_share,
                    memory_deny=context.memory_deny,
                    timeout=context.timeout,
                    max_cost=context.max_cost,
                )
        else:
            return handler(...)
```

2. **Enforce memory_deny** — if `context.memory_deny` is set, wrap memory access to block denied keys.
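One way to wrap memory access for step 2 is a thin view object that checks each key against the deny list before touching the underlying store. The sketch below assumes a dict-backed store; `DenyingMemoryView` and `MemoryDenied` are hypothetical names, and the real implementation might raise `CapabilityError` and wrap `MemoryScope` instead.

```python
from typing import Any, Iterable


class MemoryDenied(Exception):
    """Hypothetical error type; Aegis may prefer CapabilityError here."""


class DenyingMemoryView:
    """Wrap a dict-like memory store and reject access to denied keys."""

    def __init__(self, store: dict, deny: Iterable[str]):
        self._store = store
        self._deny = frozenset(deny)

    def _check(self, key: str) -> None:
        if key in self._deny:
            raise MemoryDenied(
                f"Access to memory key '{key}' is denied for this delegation"
            )

    def get(self, key: str, default: Any = None) -> Any:
        self._check(key)
        return self._store.get(key, default)

    def set(self, key: str, value: Any) -> None:
        self._check(key)
        self._store[key] = value


memory = {"secrets": "api-key", "notes": "hello"}
view = DenyingMemoryView(memory, deny=["secrets"])

notes = view.get("notes")  # allowed key passes through
try:
    view.get("secrets")    # denied key is rejected before the store is read
    denied = False
except MemoryDenied:
    denied = True
```

The delegation handler would then pass the view, not the raw store, to the delegated callable.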
**Files to modify**:
- Modify: `aegis/runtime/agent.py` (CallableDelegationHandler.handle)

**Tests to add**:
- Delegate with `capabilities: [network.https]` — filesystem operations inside handler raise `CapabilityError`
- Delegate with `memory_deny: [secrets]` — handler can't access "secrets" memory key
- Existing delegation tests still pass

---

### 6f: Replace XOR Encryption (HIGH — fixes H1)

**Problem**: Memory encryption in `agent.py:192-210` uses XOR cipher with deterministic key derivation. Same plaintext always produces same ciphertext. No IV, no authentication.

**Implementation**:

1. **Add `cryptography` dependency** (or use `secrets` + `hashlib` + `hmac` for a better homebrew if avoiding deps):

**Option A — Using `cryptography` library**:

```python
from cryptography.fernet import Fernet
import base64, hashlib


def _derive_fernet_key(scope: str, secret: str = "") -> bytes:
    """Derive a Fernet-compatible key from scope + optional secret."""
    raw_key = hashlib.sha256(f"aegis-{secret}-{scope}".encode()).digest()
    return base64.urlsafe_b64encode(raw_key)


def _encrypt(data: str, scope: str) -> str:
    f = Fernet(_derive_fernet_key(scope))
    return f.encrypt(data.encode()).decode()


def _decrypt(data: str, scope: str) -> str:
    f = Fernet(_derive_fernet_key(scope))
    return f.decrypt(data.encode()).decode()
```

**Option B — No external deps (HMAC-SHA256 keystream with a MAC, via `hashlib` + `hmac` + `secrets`; this is not AES)**:

```python
import secrets, hashlib, hmac, base64


def _encrypt(data: str, scope: str) -> str:
    key = hashlib.sha256(f"aegis-memory-{scope}".encode()).digest()
    iv = secrets.token_bytes(16)
    # Use XOR with key stream from HMAC-SHA256 (stream cipher)
    plaintext = data.encode()
    keystream = b''
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hmac.new(key, iv + counter.to_bytes(4, 'big'), 'sha256').digest()
        counter += 1
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
    mac = hmac.new(key, iv + ciphertext, 'sha256').digest()[:16]
    return base64.b64encode(iv + mac + ciphertext).decode()


def _decrypt(data: str, scope: str) -> str:
    key = hashlib.sha256(f"aegis-memory-{scope}".encode()).digest()
    raw = base64.b64decode(data.encode())
    iv, mac, ciphertext = raw[:16], raw[16:32], raw[32:]
    expected_mac = hmac.new(key, iv + ciphertext, 'sha256').digest()[:16]
    if not hmac.compare_digest(mac, expected_mac):
        raise ValueError("Memory integrity check failed — data may be tampered")
    # Decrypt
    keystream = b''
    counter = 0
    while len(keystream) < len(ciphertext):
        keystream += hmac.new(key, iv + counter.to_bytes(4, 'big'), 'sha256').digest()
        counter += 1
    plaintext = bytes(c ^ k for c, k in zip(ciphertext, keystream))
    return plaintext.decode()
```

**Recommendation**: Option B (no external deps) keeps Aegis dependency-free while adding a random IV and an integrity check (MAC). Option A is stronger but adds a dependency. In both options the key is still derived deterministically from the scope name, so confidentiality depends on supplying a real secret to the derivation.

2. **Replace `_xor_crypt` and `_derive_key`** in `agent.py` with the chosen implementation.

3. **Update `MemoryScope.set()` and `MemoryScope.get()`** to use new encrypt/decrypt.

**Files to modify**:
- Modify: `aegis/runtime/agent.py` (lines 185-210 — encryption functions)

**Tests to add**:
- Same plaintext encrypted twice produces DIFFERENT ciphertext (IV)
- Decrypt(Encrypt(data)) == data
- Tampered ciphertext raises error (integrity check)
- Existing memory encryption tests updated

---

### 6g: Block globals()/vars() in Analyzer (MEDIUM — fixes H4)

**Problem**: `globals()` and `vars()` allow runtime introspection that can access blocked builtins and runtime state.

**Implementation**:

1. **Add to banned operations** in `capabilities.py`:

```python
BANNED_OPERATIONS = frozenset({'eval', 'exec', '__import__', 'globals', 'vars', 'locals', 'compile', 'breakpoint'})
```

2. **Update analyzer** to flag these in `analyzer.py` (add to banned op detection pass).

3. **Already handled at runtime by 6a** — `globals` and `vars` removed from restricted builtins dict. This step adds the static analysis warning.
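The detection pass in step 2 could be prototyped as below. This is a sketch under a stated assumption: it walks Python source with the standard `ast` module, whereas the real analyzer operates on Aegis AST nodes. `find_banned_calls` is a hypothetical name.

```python
import ast

BANNED_OPERATIONS = frozenset({
    "eval", "exec", "__import__", "globals", "vars",
    "locals", "compile", "breakpoint",
})


def find_banned_calls(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each direct call to a banned builtin name."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in BANNED_OPERATIONS):
            findings.append((node.lineno, node.func.id))
    return sorted(findings)


sample = "x = 1\ny = globals()\nz = vars()\nprint(len('ok'))\n"
warnings = find_banned_calls(sample)
```

A check like this only sees direct calls by name; an alias such as `g = globals` followed by `g()` would slip past it, which is why the runtime removal in 6a remains the actual guarantee and this pass serves as an early warning.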
**Files to modify**:
- Modify: `aegis/runtime/capabilities.py` (line 21)
- Modify: `aegis/analyzer.py` (banned op detection)

**Tests to add**:
- Analyzer warns on `globals()` usage
- Analyzer warns on `vars()` usage
- `compile()` and `breakpoint()` also warned

---

### Phase 6 Completion Criteria

- [ ] All 7 critical vulnerabilities (C1-C7) are fixed
- [ ] All 4 high severity issues (H1-H4) are fixed
- [ ] All existing 627 non-LLM tests pass (no regressions)
- [ ] New test file `tests/test_phase6_sandbox.py` with 40+ tests
- [ ] New test file `tests/test_phase6_taint.py` with 20+ tests
- [ ] Red team harness updated with new bypass attempt tests
- [ ] Demo apps updated with freeze_runtime() calls
- [ ] CLAUDE.md updated with new security architecture

---

## Phase 7: MCP Integration

**Priority**: HIGH — ecosystem interoperability
**Effort**: Medium (~2-3 sessions)
**Impact**: Very High — instant access to thousands of tools
**Depends on**: Phase 6 (security hardening should be done first)

### Goal

Make Aegis agents interoperable with the MCP (Model Context Protocol) ecosystem while retaining all Aegis safety guarantees (taint tracking on MCP outputs, capability scoping on MCP tools, audit trail for MCP invocations).

### 7a: MCP Client Runtime

**New file**: `aegis/runtime/mcp.py`

**Implementation**:

1. **MCP client library** — communicate with MCP servers via stdio or HTTP:

```python
"""Aegis MCP integration — secure Model Context Protocol client."""
from __future__ import annotations  # allows forward references in annotations

from typing import Any, Optional


class MCPServer:
    """Represents a connected MCP server."""

    def __init__(self, name: str, command: Optional[list[str]] = None,
                 url: Optional[str] = None):
        self.name = name
        self.tools: dict[str, MCPToolSchema] = {}
        # Connect and discover tools...

    def discover_tools(self) -> list[MCPToolSchema]:
        """List available tools from this MCP server."""
        ...

    def invoke(self, tool_name: str, arguments: dict) -> Any:
        """Call an MCP tool and return the result."""
        ...


class MCPToolSchema:
    """Describes an MCP tool's input/output schema."""
    name: str
    description: str
    input_schema: dict  # JSON Schema
    # Map to Aegis capability categories
    inferred_capabilities: list[str]


class MCPRegistry:
    """Global registry of MCP servers and tools."""
    _servers: dict[str, MCPServer] = {}

    def register_server(self, server: MCPServer):
        ...

    def get_tool(self, tool_name: str) -> tuple[MCPServer, MCPToolSchema]:
        ...
```

2. **Taint wrapping** — all MCP tool outputs are automatically wrapped in `Tainted`:

```python
def invoke_mcp_tool(tool_name, arguments, *, taint_output=True):
    server, schema = _mcp_registry.get_tool(tool_name)
    result = server.invoke(tool_name, arguments)
    if taint_output:
        return Tainted(result, TaintProvenance(f"mcp:{server.name}/{tool_name}"))
    return result
```

3. **Capability mapping** — MCP tool categories auto-map to Aegis capabilities:

```python
MCP_CAPABILITY_MAP = {
    'filesystem': 'filesystem',
    'browser': 'network.https',
    'database': 'filesystem',
    'network': 'network',
    'shell': 'process.spawn',
}
```

### 7b: Language Syntax — `mcp_tool` or Extended `tool_call`

**Option A — New `mcp_tool` keyword**:

```aegis
mcp_tool search(query: str) -> list:
    server: "brave-search"
    tool: "brave_web_search"
    timeout: 30s
    taint_output: True
```

**Option B — Extend `tool_call` with `provider: mcp`**:

```aegis
tool_call search(query: str) -> list:
    provider: "mcp:brave-search/brave_web_search"
    timeout: 30s
    taint_output: True
```

**Recommendation**: Option B — reuse existing `tool_call` syntax. The `provider` field already exists; prefix with `mcp:` to indicate MCP source. Less new syntax to learn.

**Implementation across pipeline**:

1. **Lexer**: No changes (reuses existing tool_call tokens)
2. **Parser**: No changes (reuses existing tool_call parsing)
3. **Analyzer**: Detect `provider: "mcp:..."` pattern, infer capabilities from MCP tool schema, validate MCP server is registered
4. **Transpiler**: When provider starts with `mcp:`, generate `invoke_mcp_tool()` call instead of `invoke_tool()`
5. **Runtime**: `invoke_mcp_tool()` wraps result in Tainted, checks capabilities, emits audit event

### 7c: MCP Server Configuration

**Config file**: `aegis.toml` or `aegis.yaml` in project root:

```toml
[mcp.servers]

[mcp.servers.brave-search]
command = ["npx", "@modelcontextprotocol/server-brave-search"]
env = { BRAVE_API_KEY = "$BRAVE_API_KEY" }

[mcp.servers.filesystem]
command = ["npx", "@modelcontextprotocol/server-filesystem"]
args = ["/allowed/path"]
capabilities = ["filesystem.read"]  # Aegis capability mapping
```

### 7d: Audit Integration

- MCP tool invocations auto-logged with event_type `mcp_tool_call`
- Audit events include: server name, tool name, input arguments (with redaction), output (tainted marker), duration, capability check result
- Hash-chained into existing ledger

**Files to create**:
- `aegis/runtime/mcp.py` (MCP client, registry, invocation)

**Files to modify**:
- `aegis/analyzer.py` (MCP provider detection, capability inference)
- `aegis/transpiler.py` (MCP provider codegen)
- `aegis/runtime/audit.py` (MCP event types)
- `aegis_cli.py` (load MCP config on startup)

**Tests to add** (`tests/test_phase7_mcp.py`):
- MCP tool output is automatically Tainted
- MCP tool call requires matching capability
- MCP tool call emitted in audit trail
- MCP tool with timeout works
- MCP tool with retry works
- Invalid MCP server raises clear error
- ~25 tests

### Phase 7 Completion Criteria

- [ ] MCP client runtime with server discovery and tool invocation
- [ ] `tool_call` with `provider: "mcp:server/tool"` compiles and runs
- [ ] MCP outputs automatically tainted
- [ ] MCP tools checked against capability system
- [ ] MCP invocations logged in audit trail
- [ ] Config file loading for MCP servers
- [ ] 25+ new tests
- [ ] Example: `examples/mcp_demo.aegis` with MCP tool usage

---

## Phase 8: Advanced AI Constructs

**Priority**: MEDIUM — differentiation features
**Effort**: Medium-High (~3-4 sessions)
**Impact**: High — unique capabilities no other framework has
**Depends on**: Phase 6 (some constructs need hardened runtime)

### 8a: `reason` Block — Structured Reasoning with Audit

**What**: A new language construct that captures structured reasoning into the hash-chained audit trail. This is uniquely valuable — no other framework makes AI reasoning tamper-evident.

**Syntax**:

```aegis
reason "Determine threat severity":
    let indicators = extract_indicators(alert)
    let count = len(indicators)
    if count > 5:
        let conclusion = "critical"
    elif count > 2:
        let conclusion = "high"
    else:
        let conclusion = "low"
    # Block result is the last expression or explicit 'conclude' statement
    conclude conclusion
```

**Implementation across pipeline**:

1. **Lexer** — new tokens: `REASON`, `CONCLUDE`

2. **Parser** — `ReasonBlock` AST node:

```python
@dataclass
class ReasonBlock(Statement):
    description: str        # "Determine threat severity"
    body: list[Statement]   # Block body
    conclusion: Expression  # The conclude expression
    line: int
    column: int
```

3. **Analyzer** — validate `conclude` appears in reason block, check taint flow through reasoning

4. **Transpiler** — generates:

```python
# reason "Determine threat severity"
_reason_scope = _audit.EventScope("reason", {"description": "Determine threat severity"})
with _reason_scope:
    indicators = extract_indicators(alert)
    count = len(indicators)
    if count > 5:
        conclusion = "critical"
    elif count > 2:
        conclusion = "high"
    else:
        conclusion = "low"
    _reason_result = conclusion
    _audit.emit_event("reasoning_complete", _internal=True,
                      description="Determine threat severity",
                      inputs={"alert": _audit.snapshot_value(alert)},
                      outputs={"conclusion": _audit.snapshot_value(_reason_result)},
                      status="ok")
```

5. **Runtime** — reasoning events stored in audit trail with full input/output snapshots, parent-child scoping, hash chaining

**Files to create/modify**:
- Modify: `aegis/tokens.py` (REASON, CONCLUDE tokens)
- Modify: `aegis/ast_nodes.py` (ReasonBlock node)
- Modify: `aegis/lexer.py` (tokenize reason, conclude)
- Modify: `aegis/parser.py` (parse reason block)
- Modify: `aegis/analyzer.py` (validate reason blocks)
- Modify: `aegis/transpiler.py` (transpile reason blocks)

**Tests**: ~20 (lexer + parser + analyzer + transpiler + runtime integration)

---

### 8b: Semantic Memory — Vector-Backed Search

**What**: Extend `memory_access` with a `type: semantic` option for vector-backed similarity search, while keeping Aegis's encryption and access controls.

**Syntax**:

```aegis
memory_access KnowledgeBase:
    scope: "domain_knowledge"
    type: semantic
    read: [policies, procedures, guidelines]
    write: [policies]
    encrypt: True
    embedding_model: "default"

# Usage in code:
KnowledgeBase.store("refund_policy", "Full refund within 30 days...")
let results = KnowledgeBase.search("refund for damaged item", top_k=5)
```

**Implementation**:

1. **New memory type enum** in `agent.py`:

```python
class MemoryType(Enum):
    KV = "kv"              # Current key-value (default)
    SEMANTIC = "semantic"  # Vector-backed similarity search
    EPISODIC = "episodic"  # Time-ordered event memory
```

2. **SemanticMemoryStore protocol** in `agent.py`:

```python
@runtime_checkable
class SemanticMemoryStore(Protocol):
    def store(self, key: str, text: str, metadata: dict = None) -> None: ...
    def search(self, query: str, top_k: int = 5) -> list[dict]: ...
    def delete(self, key: str) -> None: ...
```

3. **InMemorySemanticStore** — reference implementation using simple TF-IDF vectors with cosine similarity (no external deps). Production users plug in real vector DBs.
4. **Parser**: Recognize `type: semantic` in memory_access declarative block
5. **Transpiler**: Generate `SemanticMemoryScope` class with `store()` and `search()` methods
6. **Analyzer**: Validate semantic-specific properties

**Files to modify**:
- Modify: `aegis/runtime/agent.py` (SemanticMemoryStore, MemoryType)
- Modify: `aegis/parser.py` (parse `type:` in memory_access)
- Modify: `aegis/transpiler.py` (generate semantic memory class)
- Modify: `aegis/analyzer.py` (validate semantic memory properties)

**Tests**: ~15

---

### 8c: Risk-Based Approval for Plans

**What**: Replace all-or-nothing `require_approval: True` with risk-based policies.

**Syntax**:

```aegis
plan deploy(version: str):
    intent: "Deploy to production"
    approval_policy: risk_based

    step run_tests:
        @risk_level(low)
        run_test_suite(version)
        verify test_results_pass()

    step deploy_staging:
        @risk_level(medium)
        deploy_to(version, "staging")
        verify health_check("staging")

    step deploy_production:
        @risk_level(critical)
        deploy_to(version, "production")
        verify health_check("production")
        rollback:
            rollback_deployment("production")
```

**Implementation**:

1. **New tokens**: `APPROVAL_POLICY`, `RISK_LEVEL` (or reuse decorator syntax)

2. **Parser**: Parse `approval_policy:` in plan declarative block. Parse `@risk_level(...)` as step decorator.

3. **Runtime**: `RiskBasedApprovalHandler`:

```python
class RiskBasedApprovalHandler:
    def __init__(self, auto_approve_below: str = "medium"):
        self.threshold = {"low": 0, "medium": 1, "high": 2, "critical": 3}
        self.auto_approve_threshold = self.threshold[auto_approve_below]

    def approve(self, step_name: str, risk_level: str) -> bool:
        # Unknown risk levels default to 3 (critical). Note that '<=' means
        # the threshold level itself is also auto-approved.
        if self.threshold.get(risk_level, 3) <= self.auto_approve_threshold:
            return True  # Auto-approve
        return self._request_human_approval(step_name, risk_level)
```

4. **Transpiler**: Generate risk-level check before each step execution

**Tests**: ~15

---

### 8d: `reflect` Construct — Self-Evaluation Loops

**What**: Language-level support for agents evaluating and improving their own outputs.
**Syntax**:
```aegis
reflect on draft_response:
    max_iterations: 3
    criteria: quality_score > 0.8
    improve:
        let feedback = evaluate(draft_response)
        let draft_response = revise(draft_response, feedback)
```

**Implementation**:

1. **Lexer**: `REFLECT` token
2. **Parser**: `ReflectBlock` AST node with `target`, `max_iterations`, `criteria`, `improve` body
3. **Transpiler**: Generate a while loop with iteration counter, criteria check, and improve body. Emit audit events for each iteration.
4. **Analyzer**: Validate criteria expression, warn on unbounded reflection (no max_iterations)

**Tests**: ~15

---

### 8e: `budget` Construct — Language-Level Cost Accounting

**What**: Track and enforce cumulative cost across tool calls, LLM invocations, and delegations.

**Syntax**:
```aegis
budget session_costs:
    max_total: 5.00
    alert_at: 4.00
    on_exceeded:
        graceful_shutdown()

# In tool_call:
tool_call analyze(data: str) -> dict:
    provider: "openai"
    cost: 0.01  # per invocation
    budget: session_costs
```

**Implementation**:

1. **Runtime**: `BudgetTracker` class that accumulates costs, checks limits, triggers alerts
2. **Transpiler**: Track cost after each tool_call, check budget before invocation
3. **Audit integration**: Budget events (allocation, spend, alert, exceeded) in audit trail

**Tests**: ~10

---

### Phase 8 Completion Criteria

- [ ] `reason` block compiles, executes, and appears in audit trail
- [ ] Semantic memory with `search()` works end-to-end
- [ ] Risk-based plan approval auto-approves low-risk, prompts for high-risk
- [ ] `reflect` construct iterates and stops at criteria or max_iterations
- [ ] `budget` tracks costs and enforces limits
- [ ] 75+ new tests across all constructs
- [ ] Examples for each new construct
- [ ] SPEC.md and CLAUDE.md updated

---

## Phase 9: Production Readiness

**Priority**: MEDIUM — necessary for real-world adoption
**Effort**: High (~4-5 sessions)
**Impact**: High — moves Aegis from demo to production quality
**Depends on**: Phase 6 (hardened security), Phase 7 (MCP for tool ecosystem)

### 9a: Persistent Audit Ledger

**What**: Pluggable audit storage backends beyond `InMemoryLedger`.

**Backends to implement**:

1. **SQLiteLedger** — append-only SQLite database:
   ```python
   import sqlite3

   class SQLiteLedger(EventLedger):
       def __init__(self, db_path: str):
           self.conn = sqlite3.connect(db_path)
           self._create_tables()

       def append(self, event: EventRecord) -> None:
           self.conn.execute(
               "INSERT INTO events (event_id, timestamp, event_type, ..., self_hash) VALUES (?, ?, ...)",
               (event.event_id, event.timestamp, event.event_type, ..., event.self_hash)
           )
           self.conn.commit()
   ```

2. **FileLedger** — append-only JSONL file (one event per line):
   ```python
   import json

   class FileLedger(EventLedger):
       def __init__(self, path: str):
           self.path = path

       def append(self, event: EventRecord) -> None:
           with open(self.path, 'a') as f:
               f.write(json.dumps(event.to_dict()) + '\n')
   ```

3. **CLI integration**: `aegis run --audit-store sqlite:audit.db examples/plan_demo.aegis`

**Files to create**:
- `aegis/runtime/ledgers.py` (SQLiteLedger, FileLedger)

**Files to modify**:
- `aegis_cli.py` (--audit-store flag)
- `aegis/runtime/audit.py` (factory function for ledger backends)

**Tests**: ~15

---

### 9b: OpenTelemetry Export

**What**: Export audit trail as OpenTelemetry-compatible traces for integration with Jaeger, Datadog, Grafana, etc.

**Implementation**:

1. **OTLP converter** in `aegis/runtime/otel.py`:
   ```python
   def audit_to_otlp(events: list[EventRecord]) -> dict:
       """Convert Aegis audit events to OTLP trace format."""
       spans = []
       for event in events:
           span = {
               "traceId": ...,
               "spanId": event.event_id,
               "parentSpanId": event.parent_event,
               "name": f"{event.event_type}:{event.function}",
               "startTimeUnixNano": ...,
               "endTimeUnixNano": ...,
               "attributes": [
                   {"key": "aegis.module", "value": {"stringValue": event.module}},
                   {"key": "aegis.status", "value": {"stringValue": event.status}},
                   ...
               ]
           }
           spans.append(span)
       return {"resourceSpans": [{"scopeSpans": [{"spans": spans}]}]}
   ```

2. **CLI command**: `aegis audit export-otlp examples/plan_demo.aegis`

**Files to create**:
- `aegis/runtime/otel.py`

**Tests**: ~10

---

### 9c: Async/Await Support

**What**: Non-blocking tool calls, concurrent plan steps, streaming.

This is the highest-effort item. It touches every layer of the pipeline.

**Syntax**:
```aegis
async func fetch_data(url: str) -> str:
    let response = await http_get(url)
    return response

plan parallel_analysis(data: list):
    parallel_steps: [step_a, step_b]  # Run concurrently

    step step_a:
        let result_a = await analyze_syntax(data)

    step step_b:
        let result_b = await analyze_semantics(data)

    step combine:
        let final = merge(result_a, result_b)
```

**Implementation across pipeline**:

1. **Lexer**: `ASYNC`, `AWAIT` tokens
2. **Parser**: `AsyncFuncDef`, `AwaitExpr` AST nodes. `parallel_steps` in plan declarations.
3. **Transpiler**: `async func` -> `async def`, `await expr` -> `await expr`. Parallel steps use `asyncio.gather()`.
4. **Runtime**: Async versions of `invoke_tool`, `request_human_input`. `AsyncPlanExecutor`.
5. **CLI**: `cmd_run` uses `asyncio.run()` for async programs.

**This is a major refactor** — estimate 200+ lines of new code, modifications across all pipeline files.

**Tests**: ~30

---

### 9d: Better Error Messages

**What**: Source-mapped errors with context snippets showing the original Aegis code.

**Current behavior**:
```
Runtime error at line 15 of example.aegis:
  TaintError: Cannot convert tainted value to string directly.
```

**Target behavior**:
```
Runtime error at line 15 of example.aegis:

  14 |   let name = tainted(user_input)
> 15 |   let msg = f"Hello, {name}"
  16 |   return msg

TaintError: Cannot convert tainted value to string directly.
  Use sanitize(name, context=html) to unwrap safely.
```

**Implementation**:

1. **Store source alongside source map** — keep original `.aegis` source lines in memory during execution
2. **Enhanced error handler** in CLI — on exception, look up source line, show 3-line context window
3. **Improved error messages** — each runtime error type should suggest the fix

**Files to modify**:
- `aegis_cli.py` (error display with context)
- `aegis/runtime/taint.py` (more helpful TaintError messages)
- `aegis/runtime/capabilities.py` (suggest which capability to add)

**Tests**: ~10

---

### 9e: LSP Server

**What**: Language Server Protocol implementation for editor integration (VS Code, Neovim, etc.).

**Implementation**:

1. **Create `aegis_lsp.py`** — LSP server using `pygls`:
   - Syntax diagnostics (run analyzer, report warnings/errors)
   - Go-to-definition (for functions, modules)
   - Hover information (show types, capabilities, contracts)
   - Completion (keywords, builtins, module members)

2. **VS Code extension** — `aegis-vscode/`:
   - `.aegis` file association
   - Syntax highlighting (TextMate grammar)
   - LSP client configuration

**This is a separate project** — can be started in parallel. Not blocking other phases.

**Files to create**:
- `aegis_lsp.py` (or `aegis/lsp/server.py`)
- `aegis-vscode/` directory with extension manifest

---

### 9f: Package System

**What**: `aegis install <package>` for shared Aegis modules.

**Implementation**:

1. **Package manifest** — `aegis.toml` in package root:
   ```toml
   [package]
   name = "aegis-web-tools"
   version = "0.1.0"
   capabilities = ["network.https"]

   [dependencies]
   aegis-core = ">=1.0"
   ```

2. **Registry** — simple file-based or git-based package registry
3. **Resolver** — dependency resolution with capability checking
4. **CLI**: `aegis install`, `aegis publish`, `aegis init`

**This is a large feature** — defer detailed planning until Phases 6-8 are complete.

---

### Phase 9 Completion Criteria

- [ ] SQLiteLedger and FileLedger working with `--audit-store` flag
- [ ] OTLP export produces valid OpenTelemetry JSON
- [ ] Async/await compiles and executes for tool_call and plan
- [ ] Error messages show source context snippets
- [ ] LSP server provides diagnostics and hover (stretch goal)
- [ ] Package manifest format defined (stretch goal)

---

## Phase 10: Ecosystem & Standards

**Priority**: LONG-TERM — strategic positioning
**Effort**: Very High (~5+ sessions)
**Impact**: Strategic — positions Aegis as industry standard
**Depends on**: Phases 6-9

### 10a: A2A Protocol Support

**What**: Implement Google's Agent2Agent (A2A) protocol for remote delegation.

**Syntax**:
```aegis
delegate research to "https://research-agent.example.com":
    protocol: a2a
    capabilities: [network.https]
    timeout: 5m
    authentication: bearer_token
```

**Implementation**:

1. **Agent Card discovery** — resolve remote agent capabilities via `.well-known/agent.json`
2. **Task negotiation** — A2A task lifecycle (submitted, working, completed, failed)
3. **Capability verification** — verify remote agent's declared capabilities match delegation requirements
4. **Secure transport** — TLS + authentication tokens
5. **Audit integration** — remote delegation events with full provenance

### 10b: EU AI Act Compliance Toolkit

**What**: Generate conformity assessment reports from Aegis audit trails.

**Implementation**:

1. **Compliance report generator**: `aegis compliance-report --standard eu-ai-act output.pdf`

2. **Coverage mapping** — map Aegis features to EU AI Act articles:
   - Article 9 (Risk Management) -> capability system, contracts
   - Article 10 (Data Governance) -> taint tracking, sanitization
   - Article 12 (Record Keeping) -> hash-chained audit trail
   - Article 13 (Transparency) -> @intent, @audit, reason blocks
   - Article 14 (Human Oversight) -> human_input, plan approval
   - Article 15 (Accuracy/Robustness) -> verify, contracts

3. **Crypto-shredding** — for GDPR erasure compliance:
   - Encrypt audit event data with per-record keys
   - Delete the key to "shred" the record while preserving hash chain structure
   - Chain hashes computed over encrypted data (chain survives shredding)

### 10c: Formal Verification

**What**: Property-based testing and model checking for security invariants.

**Implementation**:

1. **Property-based tests** using Hypothesis:
   ```python
   from hypothesis import given, strategies as st

   @given(st.text())
   def test_taint_never_leaks(value):
       t = Tainted(value)
       # No sequence of operations should produce an untainted string
       # from a tainted input without sanitize()
       ...
   ```

2. **Security invariants to verify**:
   - Taint invariant: No code path from `Tainted(x)` to `str(x)` without `sanitize()`
   - Capability invariant: No operation outside declared capabilities succeeds
   - Audit invariant: Hash chain is always consistent after any sequence of events
   - Delegation invariant: Delegated agent capabilities are always a subset of parent

3. **Model checking** — potentially use TLA+ or Alloy to formally model the security properties

### 10d: Inter-Agent Authentication

**What**: Cryptographic authentication for agent-to-agent communication.

**Implementation**:

1. **Agent identity** — each agent has a keypair (Ed25519)
2. **Message signing** — delegation messages signed by sender, verified by receiver
3. **Replay protection** — nonce + timestamp in signed messages
4. **Trust model** — capability-based trust (agent trusts another based on declared capabilities, not identity)

### 10e: Circuit Breakers & Rate Limiting

**What**: Prevent cascading failures in multi-agent systems.

**Syntax**:
```aegis
tool_call external_api(query: str) -> dict:
    provider: "weather"
    circuit_breaker:
        failure_threshold: 5
        reset_timeout: 60s
    rate_limit: 10/m
```

**Implementation**:

1. **Circuit breaker** — track failure count per tool, open circuit after threshold, half-open after timeout
2. **Rate limiter** — token bucket per tool/agent, configurable rate
3. **Audit integration** — circuit breaker state changes logged

---

### Phase 10 Completion Criteria

- [ ] A2A remote delegation works with at least one real remote agent
- [ ] EU AI Act compliance report generates valid document
- [ ] Crypto-shredding preserves hash chain integrity
- [ ] Property-based tests cover all security invariants
- [ ] Agent authentication with Ed25519 signing
- [ ] Circuit breakers prevent cascading tool failures

---

## Summary Timeline

```
Phase 6:  Security Hardening        [URGENT]     ~3-4 sessions
Phase 7:  MCP Integration           [HIGH]       ~2-3 sessions
Phase 8:  Advanced AI Constructs    [MEDIUM]     ~3-4 sessions
Phase 9:  Production Readiness      [MEDIUM]     ~4-5 sessions
Phase 10: Ecosystem & Standards     [LONG-TERM]  ~5+ sessions
```

Phase 6 should start immediately. Phases 7 and 8 can be interleaved. Phase 9 can start in parallel (9e LSP is independent). Phase 10 is aspirational and should be planned based on user feedback.
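
As an illustration of the circuit breaker described in 10e (open after a failure threshold, half-open after a timeout), here is a minimal Python sketch. The class name `CircuitBreaker`, its parameters, and the injectable `clock` are assumptions made for this example; they are not part of the existing Aegis runtime.

```python
import time


class CircuitBreaker:
    """Hypothetical per-tool breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=5, reset_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half_open"       # timeout elapsed: allow one trial call
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A real implementation would also emit an audit event on each state change, as item 3 of 10e requires; that part is omitted here because the audit API for it is not specified in this plan.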
---

## Phase 11: Security Vulnerability Fixes (Pre-Launch)

**Priority**: CRITICAL — must be completed before any public release
**Effort**: 1 session (~2-3 hours)
**Impact**: Fixes 5 confirmed vulnerabilities from security audit
**Depends on**: Nothing (can start immediately)

### Goal

Close the implementation gaps identified in the comprehensive security audit. These are not architectural issues — they're missing operator protections and sandbox edge cases.

### 11a: Taint Operator Gaps (CRITICAL)

**Problem**: `Tainted` class blocks `__str__`, `__format__`, `__bytes__`, `__iter__` but misses `__mod__` (Python's `%` string formatting operator). This allows `"SELECT %s" % tainted_val` to bypass taint protection.

**Implementation**:

1. Add `__mod__` and `__rmod__` to `aegis/runtime/taint.py` — both raise `TaintError`
2. Add `__rmul__` for consistent taint propagation when operand order is reversed
3. Add tests for all string formatting attack vectors: `%s`, `%r`, `%d` with tainted values

**Files**: `aegis/runtime/taint.py`, `tests/test_phase6_taint.py`
**Tests**: 8-10 new tests

### 11b: Relative Import Bypass (CRITICAL)

**Problem**: `sandbox.py:131` allows `level > 0` imports without checking the blocklist. `from . import os` completely bypasses the import whitelist.

**Implementation**:

1. In `_aegis_import()`, check blocklist for ALL imports regardless of level
2. Only allow relative imports for modules within the Aegis project directory
3. Add tests: relative import of blocked modules (os, subprocess, sys, etc.)

**Files**: `aegis/runtime/sandbox.py`, `tests/test_phase6_sandbox.py`
**Tests**: 6-8 new tests

### 11c: Tainted Value Direct Access (HIGH)

**Problem**: `tainted._value` is accessible via single-underscore convention. Python doesn't enforce this as private.

**Implementation**:

1. Rename `_value` to `__value` in `Tainted` class (triggers Python name mangling → `_Tainted__value`)
2. Update all internal references: `sanitize()`, `snapshot_value()`, `__repr__()`, arithmetic ops, `__getitem__`, provenance
3. Verify `getattr(tainted, '_value')` raises `AttributeError`
4. Verify `getattr(tainted, '__value')` raises `AttributeError`
5. Internal code accesses via `self.__value` (which works inside the class)

**Files**: `aegis/runtime/taint.py`, tests
**Tests**: 5-6 new tests
**Risk**: High refactor surface — many internal references to update

### 11d: Sandbox Builtin Hardening (HIGH)

**Problem**: `type()` and `object` in allowed builtins enable MRO traversal via `type("").__bases__[0].__subclasses__()`.

**Implementation**:

1. Create safe wrappers: `_safe_type()` that only returns the type name (not the type object for introspection)
2. Remove raw `type` and `object` from `ALLOWED_BUILTINS`
3. Add `_safe_type` to the restricted builtins dict
4. Test: `type("").__bases__` should fail, `type(42)` should return `<class 'int'>` (informational only)

**Files**: `aegis/runtime/sandbox.py`, `tests/test_phase6_sandbox.py`
**Tests**: 4-6 new tests

### 11e: Audit Event Spoofing Prevention (MEDIUM)

**Problem**: User code can call `emit_event(..., _internal=True)` to emit fake "internal" audit events.

**Implementation**:

1. Replace `_internal=True` boolean with a secret token generated at runtime
2. `_AUDIT_SECRET = secrets.token_hex(16)` generated once at import time
3. `emit_event(..., _auth=token)` checks token matches `_AUDIT_SECRET`
4. Transpiler injects `_audit._AUDIT_SECRET` in generated code (accessible because transpiler controls the namespace)
5. Alternative: remove `_internal` entirely and use a separate `_emit_internal_event()` that's not exported

**Files**: `aegis/runtime/audit.py`, `aegis/transpiler.py`
**Tests**: 3-4 new tests

### Completion Criteria

- [x] `"SELECT %s" % tainted_val` raises TaintError
- [x] `from . import os` raises ImportError in sandbox
- [x] `tainted._value` raises AttributeError
- [x] `type("").__bases__` is inaccessible (type() returns string name)
- [x] User code cannot emit fake internal audit events
- [x] All existing 1678 tests still pass (1678/1678 green)
- [x] New tests cover all attack vectors (34 new tests in test_phase11_security.py)

---

## Phase 12: Open Source Packaging & Infrastructure

**Priority**: HIGH — required for public release
**Effort**: 1 session (~2-3 hours)
**Impact**: Makes project installable, testable, and contributor-friendly
**Depends on**: Phase 11 (security fixes)

### 12a: Project Packaging (pyproject.toml)

**Implementation**:

1. Create `pyproject.toml` with:
   - Package name: `aegis-lang`
   - Version: `0.1.0`
   - Python requirement: `>=3.11`
   - Zero runtime dependencies
   - Optional deps: `z3-solver` (smt), `pytest` + `pytest-cov` (dev)
   - Entry point: `aegis = "aegis_cli:main"`
2. Refactor `aegis_cli.py` — extract `if __name__ == "__main__"` into `main()` function
3. Create `MANIFEST.in` — include README, LICENSE, SPEC.md, DOCS/, examples/
4. Verify `pip install -e .` works
5. Verify `aegis run examples/hello.aegis` works after install

**Files**: `pyproject.toml`, `MANIFEST.in`, `aegis_cli.py`

### 12b: CI/CD (GitHub Actions)

**Implementation**:

1. Create `.github/workflows/tests.yml`:
   - Matrix: ubuntu-latest, macos-latest, windows-latest × Python 3.11, 3.12
   - Steps: checkout, setup-python, pip install -e .[dev], pytest with coverage
   - Exclude LLM tests (require API keys)
2. Create `.github/workflows/lint.yml`:
   - ruff check + ruff format --check
3. Add badge to README.md

**Files**: `.github/workflows/tests.yml`, `.github/workflows/lint.yml`

### 12c: Contributor Documentation

**Implementation**:

1. Create `CONTRIBUTING.md`:
   - Development setup (clone, install, test)
   - Architecture overview (pipeline diagram)
   - How to add a language feature (tokens → AST → parser → analyzer → transpiler → runtime → tests)
   - Testing requirements (every feature needs 10+ tests)
   - Code style (ruff config)
   - PR process
2. Create `CODE_OF_CONDUCT.md` (Contributor Covenant v2.1)
3. Create `.github/SECURITY.md` — vulnerability reporting process
4. Create `.github/ISSUE_TEMPLATE/bug_report.md`
5. Create `.github/ISSUE_TEMPLATE/feature_request.md`
6. Create `.github/pull_request_template.md`

### 12d: README & Documentation Refresh

**Implementation**:

1. Update README.md:
   - Fix test count (189 → 1,700+)
   - Add installation instructions (`pip install aegis-lang`)
   - Add all CLI commands (currently lists 6, actually 20+)
   - Add badges (license, CI, Python version)
   - Add links to DOCS/ folder
2. Create `CHANGELOG.md` with v0.1.0 entry
3. Update .gitignore: add `.env`, `.env.*`, `.vscode/`, `.idea/`, `*.sqlite`, `audit*.db`

### 12e: Git History Audit & Repo Rename

**Implementation**:

1. Audit git log for sensitive data (Windows paths, personal info)
2. Clean if necessary (git filter-branch or BFG)
3. Rename project directory from `TestL` to `aegis-lang`
4. Update all internal references to new directory name
5. Prepare GitHub repository (create repo, set description, topics)

### Completion Criteria

- [ ] `pip install aegis-lang` works from PyPI (or `pip install -e .` locally)
- [ ] `aegis run examples/hello.aegis` works after pip install
- [ ] GitHub Actions CI passes on 3 OS × 2 Python versions
- [ ] ruff linting passes (or has baseline config)
- [ ] CONTRIBUTING.md, CODE_OF_CONDUCT.md, SECURITY.md exist
- [ ] README.md is accurate and complete
- [ ] CHANGELOG.md exists with v0.1.0
- [ ] .gitignore covers all artifacts
- [ ] No sensitive data in git history

---

## Phase 13: Launch Preparation

**Priority**: HIGH — final polish before public announcement
**Effort**: 1-2 sessions
**Impact**: First impressions matter for open source adoption
**Depends on**: Phase 12

### 13a: VS Code Extension (Recommended)

**Implementation**:

1. Create `vscode-aegis/` extension directory
2. Language configuration: `.aegis` file association, syntax highlighting (TextMate grammar)
3. LSP client configuration (connects to `aegis lsp`)
4. Package as `.vsix` for VS Code Marketplace
5. Include installation instructions in README

### 13b: Launch Content

**Implementation**:

1. Write launch blog post: "Introducing Aegis — The Security-First Language for AI Agents"
   - Problem statement (why existing frameworks fail at security)
   - Key features with code examples
   - Comparison table vs LangChain/CrewAI/AutoGen
   - EU AI Act timing
   - Getting started in 5 minutes
2. Create a 2-minute demo video or GIF showing:
   - Writing an .aegis agent
   - Taint tracking catching an injection
   - Capability system blocking unauthorized access
   - Audit trail verification
3. Prepare social media posts for:
   - Hacker News ("Show HN: Aegis — a security-first language for AI agents")
   - Reddit (r/programming, r/MachineLearning, r/artificial)
   - Twitter/X
   - AI safety mailing lists

### 13c: Demo Applications

**Implementation**:

1. Polish the SOC agent demo (apps/soc_agent/) — make it runnable out of the box
2. Create a "5-minute quickstart" tutorial agent
3. Create a HIPAA-compliant medical record agent demo
4. Create a financial compliance demo
5. Ensure all demos work with `pip install aegis-lang && aegis run demo.aegis`

### Completion Criteria

- [ ] VS Code extension installable and working
- [ ] Launch blog post written and reviewed
- [ ] At least 3 polished demo applications
- [ ] All demos work out of the box after pip install
- [ ] Social media posts drafted

---

## Updated Summary Timeline

```
Phase 6-10: Core Development              [COMPLETE]  24 sessions
Phase 11:   Security Vulnerability Fixes  [CRITICAL]  ~1 session
Phase 12:   Open Source Packaging         [HIGH]      ~1 session
Phase 13:   Launch Preparation            [HIGH]      ~1-2 sessions
                                          ─────────────
                                          Total: ~3-4 sessions to launch
```

Phase 11 must be completed first (security fixes before public code). Phase 12 can begin immediately after. Phase 13 can overlap with Phase 12 (blog post writing is independent of packaging).
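
To make the name-mangling step in 11c concrete, the sketch below uses a stand-in `Tainted` class (not the real `aegis/runtime/taint.py` implementation) together with a hypothetical `can_read` helper. It shows which attribute names stop resolving after the rename, and that the mangled name is still reachable, so the rename raises the bar without providing true privacy.

```python
class Tainted:
    """Illustrative stand-in, not the real aegis.runtime.taint.Tainted."""

    def __init__(self, value):
        self.__value = value  # stored on the instance as _Tainted__value

    def unwrap_for_sanitizer(self):
        # Hypothetical internal accessor: inside the class body,
        # self.__value is rewritten to self._Tainted__value and resolves.
        return self.__value


def can_read(obj, name):
    """Return True if getattr(obj, name) succeeds, False on AttributeError."""
    try:
        getattr(obj, name)
    except AttributeError:
        return False
    return True


t = Tainted("user input")
print(can_read(t, "_value"))           # False: the old attribute is gone
print(can_read(t, "__value"))          # False: string lookups are not mangled
print(can_read(t, "_Tainted__value"))  # True: the mangled name is still reachable
```

Because the mangled name remains accessible, the sandbox would still need to block attribute names of that form if direct access must be impossible rather than merely inconvenient.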
