I Squeezed My $1k Monthly OpenClaw API Bill with ~$20/Month…

Part 1 of my series on building a low-cost personal AI stack on AWS. Part 2 — Drop-in Perplexity Sonar replacement with AWS Bedrock Nova Grounding Part 3 — From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS

I've got OpenClaw(MoltBot(ClawdBot)) running locally on a Raspberry Pi — where computation power is scarce, and it's gone unresponsive on me more than a few times. But even on constrained hardware, every chat turn, every memory search, every web lookup is hitting paid APIs. The bill is small at first, then it isn't. I had been using qwen3-coder-480b for a week or two, and the daily cost skyrocketed to as much as $50.

Assumption: OpenClaw is running on hardware you already own or pay for separately — a Raspberry Pi, home server, or existing cloud instance. The compute cost of the host itself isn't counted in the ~$20/month figure here.

If you've picked up AWS Credits from events, the AWS Community Builder program ($500/year), or AWS Activate — or if your company prefers to keep spend within AWS rather than onboarding yet another SaaS API provider — there's a way to run the whole OpenClaw stack on credits.

This is how I did it.

Disclaimer: The crux of this hack relies heavily on Amazon Q Developer Pro's undocumented while generously high usage ceiling while it lasts. If it's eventually deprecated, we will still need to switch to Kiro plans with overage pricings - still covered by AWS Credits with lower cost/token ratio.

Who This Is For

Two very different reasons to care about this setup.

If you have AWS Credits to burn: Credits from re:Invent, AWS Community Builder, AWS Activate, or customer programs come with expiry dates. Running your AI assistant stack on them is one of the most practical ways to put idle credits to work — at ~$20/month, $100 in credits covers 5 months of the full stack. If you're sitting on a few hundred dollars with an end-of-year deadline, this is a productive use before they lapse.

If you're in a company with procurement or compliance requirements: Every new SaaS vendor is a TPRM exercise. OpenAI for embeddings, Perplexity for web search, Anthropic for Claude — each one is a separate vendor assessment, a separate DPA, and a separate conversation with your security team. For FSI and regulated industries, that's not just overhead — it can be a blocker.

AWS is likely already in your vendor register. Consolidating on Bedrock means single billing, fewer third-party relationships to manage, and data residency you control. For anything touching customer data in banking, insurance, or healthcare, that's the difference between a quick internal approval and a 3-month procurement cycle.

Prerequisites

AWS account with Bedrock access enabled in us-east-1 (or another US region)
AWS credentials — a Bedrock API key is the simplest option if your account supports it. Otherwise, a long-term IAM access key/secret key pair works fine and is easier to manage than SSO. IAM Identity Center is only required for the Q Developer Pro layer.
Python 3.10+ — used by kiro-gateway, LiteLLM, and the Nova grounding proxy
Amazon Q Developer Pro subscription ($19/user/month, credit-eligible) — required for Layer 1 (kiro-gateway). Kiro Pro, Pro+, or Power plans also work but are credit-based with overage charges — Q Developer Pro is the better deal.

What Actually Costs Money in OpenClaw?

Before reaching for solutions, it helps to know exactly where the spend goes. OpenClaw has five distinct cost centers:

1. Main model (LLM) Every chat turn, every agent action, every tool call — all routed through your primary LLM. This is the biggest variable cost. On a busy day it adds up fast.

2. Memory search (embeddings) OpenClaw's memory_search tool converts your memory files into vector embeddings and queries them semantically. Every search = an embedding API call. Low cost per call, but it runs constantly in the background.

3. Web search The web_search tool hits Perplexity or Brave APIs. Perplexity charges per query on paid plans; Brave gives you $5/month free then charges beyond that.

4. Browser automation The browser tool spins up a Chromium instance for web scraping, form filling, and screenshots. Running a full browser on a low-compute machine (Raspberry Pi, t4g.small) is heavy — and cloud browser options cost per session.

5. Speech-to-text (STT) Voice messages transcribed via your STT provider. OpenAI Whisper API charges per minute of audio — self-hosting on Lambda eliminates this entirely.

That's it. Five layers. The goal: drive variable cost to zero.

My Config: All 5 Layers on AWS Credits

Here's the full picture before we go deep:

Layer	Solution	Credit
Main model	kiro-gateway → Amazon Q Developer Pro	@Jwadow
Memory search	Native Bedrock embeddings via PR #20191	@gabrielkoo
Web search	bedrock-web-search-proxy — Nova Grounding as Perplexity drop-in	@gabrielkoo
Browser	agent-browser + AgentCore provider	@pahudnet
Speech-to-text	`aws-lambda-whisper-adaptor` — Whisper on Lambda + EFS	@gabrielkoo

Three of these I built myself. Two were built by other community members. All five are open source.

Layer 1: Main Model + Image Analysis — Kiro CLI — Covered by AWS Credits

Amazon Q Developer Pro: flat-rate access to Claude

The key difference between Amazon Q Developer Pro and Kiro Pro is the billing model. Kiro Pro is credit-based — 1,000 credits/month, pay more if you exceed them. Amazon Q Developer Pro is a flat monthly subscription: $19/user/month, no per-token billing, no surprise overages.

Plan	Cost	Usage
Kiro Free	$0/mo	50 credits/month
Kiro Pro	$20/mo	1,000 credits + $0.04/credit overage
Kiro Pro+	$40/mo	2,000 credits + $0.04/credit overage
Kiro Power	$200/mo	10,000 credits + $0.04/credit overage
Amazon Q Developer Pro (legacy)	$19/user/mo	Flat-rate, not credit-capped

Note: Amazon Q Developer Pro is now a legacy plan in the Kiro ecosystem. AWS has stopped allowing new Builder ID subscriptions to Q Developer Pro — new users can only subscribe through Kiro plans. The undocumented usage limits on Q Pro are likely part of why AWS made this transition. If you're already on Q Developer Pro, you retain access and it remains the better deal for OpenClaw.

Your Q Developer Pro subscription grants access to kiro-cli. The documented quota is 10,000 inference calls/month — for a personal AI assistant, that's more than enough.

Real-world cost check: In 4 days of active OpenClaw usage after switching to kiro-gateway, I consumed ~40M input tokens and ~865K output tokens with Claude Sonnet. OpenClaw loads memory files, system prompts, and tool results into every turn — the context window fills up fast. At standard Bedrock pricing ($3/1M input, $15/1M output), that's ~$135 for 4 days, or roughly $1,000/month. Q Developer Pro covers all of it for $19/month flat.

In practice, I've been running Kiro CLI with OpenClaw daily and haven't hit any rate limits in active use. Note: the /usage command isn't available under the Q Developer Pro plan — monitor your usage via the AWS console instead. That said, after running OpenClaw with kiro-gateway for several days, I checked the Q Developer usage metrics in the AWS console and the figures hadn't moved at all. It's unclear whether Kiro CLI usage is counted against the same quota as Q Developer's agentic requests, or tracked separately. The Amazon Q Developer pricing page only states "Included (with limits)" for the Pro tier — no specifics on what those limits are or how Kiro CLI calls are metered.

Note: Q Developer Pro requires AWS IAM Identity Center (SSO) — you can't use it with a free Builder ID. If you're already set up with Identity Center (common in enterprise teams and AWS Community Builders with corporate accounts), you're good to go.

Important: Standard AWS Credits don't cover per-token Claude usage via Anthropic's marketplace agreement. But the Q Developer Pro subscription fee itself is credit-eligible — making the whole stack fundable with AWS credits. Kiro's flat-rate subscription is currently the only practical way to run Claude in OpenClaw without per-token billing.

New AWS accounts: Even if you'd prefer to pay per-token via direct Bedrock API, new accounts often come with ultra-low default rate limits that can't reliably serve OpenClaw — even when you're willing to pay. The flat-rate Q Developer Pro route sidesteps this entirely.

kiro-gateway: the bridge

kiro-gateway — built by @Jwadow — wraps Kiro CLI and exposes OpenAI-compatible and Anthropic-compatible API endpoints. OpenClaw talks to it like any other provider.

git clone https://github.com/jwadow/kiro-gateway
cd kiro-gateway
pip install -r requirements.txt
cp .env.example .env

Edit .env:

PROXY_API_KEY="your-secret-key"
KIRO_CREDS_FILE="~/.aws/sso/cache/kiro-auth-token.json"

Run kiro-cli login once to authenticate — this populates KIRO_CREDS_FILE automatically. (kiro-cli is only needed for this initial login; kiro-gateway reads the token it generates. Re-run if your token expires.) Then:

python main.py --port 9000

Heads up: kiro-gateway's hardcoded fallback model list may lag behind new Claude releases. If a model isn't showing up at /v1/models, add it manually to FALLBACK_MODELS in kiro/config.py.

Available models via Q Developer Pro:

Model	Best for
`claude-sonnet-4.6`	General tasks, coding, writing
`claude-haiku-4.5`	Fast, lightweight responses
`claude-opus-4.6`	Complex reasoning, long context

OpenClaw config:

{
  "models": {
    "providers": {
      "kiro": {
        "baseUrl": "http://localhost:9000",
        "apiKey": "your-secret-key",
        "api": "anthropic-messages"
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "kiro/claude-sonnet-4.6"
      },
      "imageModel": {
        "primary": "kiro/claude-sonnet-4.6"
      }
    }
  }
}

Bonus: kiro-gateway works with any tool that supports OpenAI or Anthropic APIs — not just OpenClaw. To use it with Claude Code: ANTHROPIC_BASE_URL=http://localhost:9000 and ANTHROPIC_API_KEY=your-secret-key.

Layer 2: Memory Search — Bedrock Embeddings — Covered by AWS Credits

OpenClaw's memory_search needs an embedding model. Amazon Nova Multimodal Embeddings costs ~$0.00014 per 1K tokens — fractions of a cent per query, and covered by AWS Credits.

OpenClaw's native Bedrock provider doesn't wire up embeddings cleanly yet — PR #24892 - (I made a novice mistake with PR #20191) is pending merge. Until then, you'll need a local OpenAI-compatible proxy in front of Bedrock. Two options:

Option A: LiteLLM

# litellm_config.yaml
model_list:
  - model_name: nova-2-multimodal-embeddings-v1.0
    litellm_params:
      model: bedrock/amazon.nova-2-multimodal-embeddings-v1:0
      aws_region_name: us-east-1

litellm_settings:
  drop_params: true
  master_key: "local-only"

pip install 'litellm[proxy]'
litellm --config litellm_config.yaml --port 4000

"memorySearch": {
  "enabled": true,
  "provider": "openai",
  "remote": { "baseUrl": "http://localhost:4000", "apiKey": "local-only" },
  "model": "nova-2-multimodal-embeddings-v1.0"
}

Option B: bedrock-access-gateway-function-url (serverless, no fixed cost)

My own fork of the original bedrock-access-gateway — deployed as a Lambda Function URL instead of ALB+Fargate, so there's no $16+/month fixed cost. Full writeup: Use Amazon Bedrock Models with OpenAI SDKs with a Serverless Proxy Endpoint.

Note: My PR #222 for Nova 2 embedding support against the original bedrock-access-gateway project has been merged — so my fork pulls from this upstream automatically via prepare_source.sh.

git clone --depth=1 https://github.com/gabrielkoo/bedrock-access-gateway-function-url
cd bedrock-access-gateway-function-url
./prepare_source.sh
sam build
sam deploy --guided

Grab the FunctionUrl output after deploy, then:

"memorySearch": {
  "enabled": true,
  "provider": "openai",
  "remote": { "baseUrl": "https://<your-function-url>.lambda-url.us-east-1.on.aws", "apiKey": "your-api-key" },
  "model": "amazon.nova-2-multimodal-embeddings-v1:0"
}

Region note: amazon.nova-2-multimodal-embeddings-v1:0 availability varies — check the Bedrock model availability page. Make sure your IAM credentials have bedrock:InvokeModel in your target region.

Once PR #24892 merges, no proxy needed — the config simplifies to:

"memorySearch": {
  "enabled": true,
  "provider": "bedrock",
  "model": "amazon.nova-2-multimodal-embeddings-v1:0",
  "region": "us-east-1"
}

Layer 3: Web Search — Nova Grounding Proxy — Covered by AWS Credits

I built bedrock-web-search-proxy — a FastAPI wrapper that makes Bedrock Nova Grounding look like the Perplexity Sonar API. No Perplexity or Brave API key needed. Runs entirely on AWS Credits.

Full writeup: Drop-in Perplexity Sonar Replacement with AWS Bedrock Nova Grounding.

Option A: Run locally

git clone https://github.com/gabrielkoo/bedrock-web-search-proxy
cd bedrock-web-search-proxy
pip install fastapi uvicorn boto3
uvicorn main:app --port 7000

Option B: Lambda Function URL (zero idle cost)

See the deployment guide in the repo — SAM-based, arm64, python3.13. Once deployed, you get a persistent HTTPS endpoint with no local process to manage.

OpenClaw config:

{
  "tools": {
    "web": {
      "search": {
        "provider": "perplexity",
        "perplexity": {
          "apiKey": "your-proxy-key",
          "baseUrl": "http://localhost:7000/v1",
          "model": "sonar-pro"
        }
      }
    }
  }
}

All US Nova CRIS (Cross-Region Inference Services) profiles support web grounding (us.amazon.nova-premier-v1:0, us.amazon.nova-pro-v1:0, etc.). Native model IDs without the us. prefix do NOT work — must use CRIS profiles. Web grounding is US regions only (us-east-1, us-east-2, us-west-2).

Layer 4: Cloud Browser — Bedrock AgentCore — Covered by AWS Credits

agent-browser by Vercel Labs, with the AgentCore provider contributed by Pahud Hsieh (@pahudnet) — PR #397.

The browser runs in AWS — no local Chromium needed. Particularly useful on low-compute instances (Pi, t4g.small) where running a local browser would be too heavy. Covered by AWS Credits.

Node.js and pnpm required. Since PR #397 isn't merged yet, check out the branch directly:

git clone https://github.com/vercel-labs/agent-browser
cd agent-browser
git fetch origin pull/397/head:agentcore
git checkout agentcore
pnpm install && pnpm build

Then use it:

agent-browser -p agentcore open https://example.com
agent-browser close

Your AWS identity needs these IAM permissions:

bedrock-agentcore:StartBrowserSession
bedrock-agentcore:ConnectBrowserAutomationStream
bedrock-agentcore:StopBrowserSession

On a desktop machine with enough RAM, local CDP (OpenClaw's built-in browser) is free and works fine. AgentCore is the play for headless/low-compute setups.

Layer 5: Speech-to-Text — Whisper on Lambda — Covered by AWS Credits

aws-lambda-whisper-adaptor — self-hosted faster-whisper on AWS Lambda, with Deepgram-compatible and OpenAI-compatible transcription endpoints. EFS-backed model storage, pay-per-use, scales to zero.

Full setup guide: From 3-Minute Cold Starts to ~20 Seconds: Whisper on AWS Lambda + EFS.

Quick Start

Use the pre-built image — no build step needed:

docker pull ghcr.io/gabrielkoo/aws-lambda-whisper-adaptor:latest

Use this image URI when creating your Lambda function. The repo includes a SAM template for the full VPC + EFS setup.

Lambda runs in VPC for EFS access — no NAT Gateway needed (free S3 VPC Gateway Endpoint). Cold start is ~20–30s on first invocation after a model download; subsequent calls are fast.

The Cost Math

Without this setup, Claude Sonnet alone runs ~$1,000/month at standard Bedrock pricing — based on real token usage from my own sessions. OpenClaw's large context window (memory files, system prompts, tool results loaded every turn) means the token bill compounds fast.

The full stack with this setup runs at ~$20/month:

$19/mo — Amazon Q Developer Pro (flat-rate, covers all LLM calls)
≤$1/mo — Bedrock embeddings for memory search (Nova 2 at $0.00014/1K tokens)

Web search, browser automation, and speech-to-text are covered by AWS Credits — no separate line item.

With $100 in AWS Credits, you cover roughly 5 months of the full stack. Both the Q Developer Pro subscription and Bedrock embeddings are credit-eligible — if you're an AWS Community Builder, that $500/year allocation more than covers it.

Where AWS Credits Come From

AWS event participant/speaker — re:Invent, Summit, local user groups
AWS Community Builder — $500/year for active builders (builder.aws.com). The application opens a few rounds per year — I'm one of the builders in the program.
AWS Customer Council — participation typically includes credits
AWS Activate (startups) — up to $100K
AWS Educate / Academy — educators and students

Check your balance: console.aws.amazon.com/billing/home#/credits

Closing

Five layers. Two built by community members, three I built myself. All open source, all running on AWS Credits.

To be clear: kiro-gateway is the most crucial piece here. @Jwadow built the bridge that makes Claude accessible without per-token billing — I built the embedding proxy, web search proxy, and Whisper gateway to fill the remaining gaps. Web search and cloud browser (Layers 3 and 4) are purely AWS Credits — no subscription, per-token billing well covered by AWS Credits.

If you're already an AWS Community Builder or have credits sitting in your account, there's no reason to be paying per-token for a personal AI assistant. Wire it up once, and the stack runs itself.

Put those credits to work.

I Squeezed My $1k Monthly OpenClaw API Bill with ~$20/Month in AWS Credits — Here's the Exact Setup

Who This Is For

Prerequisites

What Actually Costs Money in OpenClaw?

My Config: All 5 Layers on AWS Credits

Layer 1: Main Model + Image Analysis — Kiro CLI — Covered by AWS Credits

Amazon Q Developer Pro: flat-rate access to Claude

kiro-gateway: the bridge

Layer 2: Memory Search — Bedrock Embeddings — Covered by AWS Credits

Option A: LiteLLM

Option B: bedrock-access-gateway-function-url (serverless, no fixed cost)

Layer 3: Web Search — Nova Grounding Proxy — Covered by AWS Credits

Option A: Run locally

Option B: Lambda Function URL (zero idle cost)

Layer 4: Cloud Browser — Bedrock AgentCore — Covered by AWS Credits

Layer 5: Speech-to-Text — Whisper on Lambda — Covered by AWS Credits

Quick Start

The Cost Math

Where AWS Credits Come From

Closing

Tags

Comments

More Blog

Five Gemma-4 models, one accelerator: what porting E2B 31B to AWS Inferentia2 taught me

Hey DEV, I'm Tobore. Let's actually connect.

I burned through thousands of AI tokens. Then a friend did it for free

Claude might be saturating your machine

Automated GitHub Code Reviews Using Google Gemini

What is an "agentic harness," actually?

Ready-made automations for this