I built a runtime execution kernel for AI agents

While building AI agents, I kept running into the same uncomfortable question:

How do I guarantee that an agent’s execution will stop?

Not “usually stop.” Not “log when it goes wrong.” But actually guarantee it won’t run forever, retry endlessly or burn money in a loop.

Most agent frameworks focus on reasoning quality. I was more worried about runaway execution.

That’s what led me to build AgenWatch.


The real problem with AI agents

If you’ve worked with agents, you’ve probably seen this:

Infinite reasoning loops
Silent retries
Budget overruns discovered after the damage
Tools being called repeatedly because the model “tries again”

Observability helps explain what happened. It does nothing to stop it.

I didn’t want better logs. I wanted runtime enforcement.
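To make the difference concrete, here is a minimal Python sketch of the last failure mode above: a tool called over and over with the same arguments. It is not AgenWatch code. `observed`, `enforced` and `RepeatedCallBlocked` are hypothetical names. The logging wrapper can only record each call after it has run. The enforcing wrapper checks before the call and refuses the fourth identical one.

```python
import functools
import logging
from typing import Any, Callable

log = logging.getLogger("agent")


def observed(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Observability: records each call after it runs, but never prevents one."""

    @functools.wraps(fn)
    def wrapper(**kwargs: Any) -> Any:
        result = fn(**kwargs)
        log.info("called %s with %s", fn.__name__, kwargs)
        return result

    return wrapper


class RepeatedCallBlocked(Exception):
    """Raised instead of running a tool call that has repeated too often."""


def enforced(max_identical_calls: int) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Enforcement: checks before the call and refuses repeats (kwargs must be hashable)."""

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        seen: dict = {}  # sorted kwargs -> number of times already executed

        @functools.wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            key = tuple(sorted(kwargs.items()))
            if seen.get(key, 0) >= max_identical_calls:
                raise RepeatedCallBlocked(f"{fn.__name__}{kwargs} already ran {seen[key]} times")
            seen[key] = seen.get(key, 0) + 1
            return fn(**kwargs)

        return wrapper

    return decorator


@enforced(max_identical_calls=3)
def search(query: str) -> str:
    return f"results for {query}"


calls = 0
try:
    while True:  # a model that keeps "trying again" with identical arguments
        search(query="agenwatch")
        calls += 1
except RepeatedCallBlocked as exc:
    print(f"stopped after {calls} calls: {exc}")
```

Wrapping `search` with `observed` instead would leave the loop running forever, with a complete log of every call.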

The idea: Treat agent execution like an operating system problem

In operating systems, we don’t trust processes to behave correctly. We enforce limits:

CPU time
Memory
Permissions
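For a concrete picture of the operating-system side of this analogy, Python’s standard `resource` module can cap a child process’s CPU time. The OS then terminates the child when the cap is reached, no matter what the code inside it does. This is a Unix-only sketch of the OS mechanism being referenced, not part of AgenWatch.

```python
import resource
import subprocess
import sys


def run_with_cpu_limit(code: str, cpu_seconds: int) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child process whose CPU time the OS caps (Unix only)."""

    def apply_limits() -> None:
        # Executed in the child after fork, before exec: the limit is set from the
        # outside, and the OS kernel enforces it regardless of what the child does.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    return subprocess.run(
        [sys.executable, "-c", code],
        preexec_fn=apply_limits,
        timeout=30,  # wall-clock safety net for the parent
    )


# A child that would loop forever is terminated once it has used ~1 s of CPU.
result = run_with_cpu_limit("while True: pass", cpu_seconds=1)
print(result.returncode)  # negative: the child was killed by a signal
```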

I applied the same idea to AI agents.

Instead of trusting the LLM to stop, I built a runtime execution kernel that decides:

whether a step is allowed
whether a tool can be called
whether execution must halt
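Those three decisions can be sketched as a small gatekeeper object. This is a minimal illustration under assumed semantics, not AgenWatch’s implementation or API: `ToyKernel`, `ExecutionHalted`, `max_tool_calls` and the method names are hypothetical. The key property is that every check runs before the step or tool executes, so a model that never decides to stop still gets stopped.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


class ExecutionHalted(Exception):
    """Raised when the kernel refuses to let execution continue."""


@dataclass
class ToyKernel:
    """Hypothetical execution gatekeeper; not AgenWatch's real API."""

    max_steps: int
    max_tool_calls: int
    allowed_tools: frozenset[str]
    steps: int = 0
    tool_calls: int = 0

    def allow_step(self) -> bool:
        # Decision 1: is another step permitted?
        return self.steps < self.max_steps

    def allow_tool(self, name: str) -> bool:
        # Decision 2: is this tool permitted, with call budget left?
        return name in self.allowed_tools and self.tool_calls < self.max_tool_calls

    def step(self) -> None:
        # Decision 3: halt rather than start a step that is not permitted.
        if not self.allow_step():
            raise ExecutionHalted(f"step limit {self.max_steps} reached")
        self.steps += 1

    def call_tool(self, name: str, fn: Callable[..., Any], **kwargs: Any) -> Any:
        # The check runs before the tool does, never after.
        if not self.allow_tool(name):
            raise ExecutionHalted(f"tool {name!r} blocked after {self.tool_calls} calls")
        self.tool_calls += 1
        return fn(**kwargs)


kernel = ToyKernel(max_steps=3, max_tool_calls=2, allowed_tools=frozenset({"echo"}))
log: list[str] = []
try:
    while True:  # the "model" never decides to stop on its own
        kernel.step()
        log.append(kernel.call_tool("echo", lambda text: text, text="hi"))
except ExecutionHalted as exc:
    log.append(str(exc))
print(log)  # ['hi', 'hi', "tool 'echo' blocked after 2 calls"]
```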

That kernel became AgenWatch.

What AgenWatch is (and is not)

AgenWatch is:

A runtime execution kernel for AI agents
A bounded execution controller
A governance layer that enforces limits before execution

AgenWatch is not:

An agent framework
A prompt engineering tool
An observability dashboard
A replacement for LangChain or CrewAI

A minimal AgenWatch example

This basic example shows runtime enforcement of a budget and an iteration limit.

```python
import os
from agenwatch import Agent, tool
from agenwatch.providers import OpenAIProvider

# Register a plain function as a tool the agent may call
@tool("Echo input text")
def echo(**kwargs) -> dict:
    text = kwargs.get("text", "")
    return {"echo": text}

agent = Agent(
    tools=[echo],
    llm=OpenAIProvider(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini"
    ),
    budget=1.0,       # hard spending cap for the whole run
    max_iterations=5  # hard cap on agent loop iterations
)

result = agent.run("Echo hello")

print(f"Success: {result.success}")
print(f"Cost: {result.cost}")
print(f"Output: {result.output}")
```

If the budget or iteration limit is exceeded, the kernel blocks the next call before it executes.
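One way such a pre-execution check can work is sketched below. It assumes the budget is in US dollars and that the cost of the next call can be estimated up front (for example from prompt length and a maximum output size); the post states neither, and `BudgetGuard`, `authorize` and `record` are hypothetical names rather than AgenWatch’s API. The blocking happens in `authorize`, before the call; `record` only reconciles the actual cost afterwards.

```python
class BudgetExceeded(Exception):
    """Raised before a call that would break a hard limit."""


class BudgetGuard:
    """Hypothetical pre-execution budget/iteration check; not AgenWatch's code."""

    def __init__(self, budget_usd: float, max_iterations: int) -> None:
        self.budget_usd = budget_usd  # assumed unit: US dollars
        self.max_iterations = max_iterations
        self.spent_usd = 0.0
        self.iterations = 0

    def authorize(self, estimated_cost_usd: float) -> None:
        # Runs BEFORE the LLM/tool call: refuse if it could break either limit.
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded(f"iteration limit {self.max_iterations} reached")
        if self.spent_usd + estimated_cost_usd > self.budget_usd:
            raise BudgetExceeded(
                f"next call (~${estimated_cost_usd:.2f}) would exceed the "
                f"${self.budget_usd:.2f} budget (spent ${self.spent_usd:.2f})"
            )
        self.iterations += 1

    def record(self, actual_cost_usd: float) -> None:
        # Runs AFTER the call: reconcile the estimate with what was really spent.
        self.spent_usd += actual_cost_usd


guard = BudgetGuard(budget_usd=1.0, max_iterations=5)
completed = 0
try:
    for _ in range(10):
        guard.authorize(estimated_cost_usd=0.25)
        guard.record(actual_cost_usd=0.25)  # stand-in for a real call's reported cost
        completed += 1
except BudgetExceeded as exc:
    print(f"blocked after {completed} calls: {exc}")
```

Here the fifth call is refused before it runs, even though the iteration limit would still allow it. How closely a real kernel can estimate cost ahead of time determines how tight this guarantee is.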

Using LangChain with AgenWatch

LangChain can generate tasks and prompts. AgenWatch governs execution.

```python
import os
from langchain_core.prompts import ChatPromptTemplate
from agenwatch import Agent, tool
from agenwatch.providers import OpenAIProvider

@tool("Echo text safely")
def echo(**kwargs) -> dict:
    return {"echo": kwargs.get("text", "")}

# AgenWatch owns execution and its hard limits
agent = Agent(
    tools=[echo],
    llm=OpenAIProvider(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini"
    ),
    budget=1.0,
    max_iterations=3
)

# LangChain only builds the task text
prompt = ChatPromptTemplate.from_messages([
    ("human", "Say hello using the echo tool")
])

task = prompt.format_messages()[0].content
result = agent.run(task)

print(result.success, result.cost, result.output)
```

LangChain handles what to do. AgenWatch enforces whether it’s allowed to continue.

What AgenWatch does NOT do (by design)

In v0.1.x, AgenWatch:

Does not persist execution state to disk
Does not resume after process crashes
Does not rollback external side effects
Does not sandbox the OS or subprocesses

If a hard limit is hit mid-execution, AgenWatch freezes and reports. Rollback is an orchestration concern, not a kernel concern.
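The split between the kernel and the orchestrator can be sketched like this. `HaltReport`, `HardLimitHit`, `run_steps` and `orchestrate` are hypothetical names used only for illustration; they show the division of responsibility described above, not AgenWatch’s actual report format. The kernel side stops at the limit and hands back what already happened. Only the orchestrator knows how to undo those side effects.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HaltReport:
    """What a kernel might report when it stops a run (hypothetical shape)."""

    reason: str
    steps_completed: int
    side_effects: list[str] = field(default_factory=list)


class HardLimitHit(Exception):
    def __init__(self, report: HaltReport) -> None:
        super().__init__(report.reason)
        self.report = report


def run_steps(steps: list[Callable[[], str]], max_steps: int) -> list[str]:
    """Kernel side: run steps, stop at the limit, report. Never undoes anything."""
    effects: list[str] = []
    for i, step in enumerate(steps):
        if i >= max_steps:
            raise HardLimitHit(HaltReport(f"max_steps={max_steps} reached", i, effects))
        effects.append(step())
    return effects


def orchestrate(steps: list[Callable[[], str]], max_steps: int,
                undo: Callable[[str], None]) -> list[str]:
    """Orchestrator side: decides whether and how to compensate for side effects."""
    try:
        return run_steps(steps, max_steps)
    except HardLimitHit as halt:
        # Compensate in reverse order, like unwinding a transaction by hand.
        for effect in reversed(halt.report.side_effects):
            undo(effect)
        return []


created: list[str] = []  # stand-in for external state, e.g. files or tickets


def make_step(n: int) -> Callable[[], str]:
    def step() -> str:
        created.append(f"file-{n}")
        return f"file-{n}"
    return step


result = orchestrate([make_step(n) for n in range(5)], max_steps=3, undo=created.remove)
print(result, created)  # [] []
```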

Why I’m sharing this

I built AgenWatch because I needed hard execution guarantees, not better explanations after failure.

It’s early. It’s intentionally narrow. But it already solved a real production problem for me.

If you’re building agents and care about:

cost control
safety
deterministic stopping

you might find it useful.

GitHub: https://github.com/agenwatch/agenwatch
PyPI: https://pypi.org/project/agenwatch/
