Serverless Bedrock: How I invoke Claude from Lambda in warrantyAI

tags: aws, serverless, ai, bedrock

Every week I ship a new piece of warrantyAI — an AI-powered warranty management system I'm building on AWS. This week was Week 8: a 3-agent LangGraph pipeline wired to Bedrock.

Before the agents could do anything, I needed one thing to work cleanly: invoking Claude from a Lambda function without a server, without a container fleet, without an inference endpoint sitting idle burning money.

{% embed https://www.linkedin.com/posts/harish-aravindan_aiplatformengineering-langgraph-awsbedrock-activity-7433883183760408576-EuL5?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAZdZV0B6jNPTfwYZj3O5Lh0p6lcypaLVAo %}

Here's exactly how I did it.

Why serverless + Bedrock is the right combo

Bedrock's invoke_model API (on the bedrock-runtime client) is synchronous and stateless: it takes a request and returns a response. That's exactly the shape Lambda is built for. No warm model, no GPU instance, no ECS cluster. You pay Lambda per invocation and Bedrock per input and output token.
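Because each call is one request and one response, the Lambda handler can stay a thin, stateless shim. A minimal sketch, assuming an event that carries a `prompt` field and an API Gateway-style `{"statusCode", "body"}` return value (both assumptions, not warrantyAI's actual contract). `make_handler` is a hypothetical helper that injects the Bedrock call so a stub can replace it in tests:

```python
import json
from typing import Any, Callable, Dict


def make_handler(invoke: Callable[[str], str]) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Hypothetical wiring: `invoke` would be the invoke_bedrock wrapper in the
    deployed function; injecting it keeps the handler testable without AWS."""

    def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # Assumed event shape: {"prompt": "..."}
        prompt = event.get("prompt")
        if not prompt:
            return {"statusCode": 400, "body": json.dumps({"error": "missing prompt"})}
        # One synchronous request in, one response out; nothing is kept between calls
        text = invoke(prompt)
        return {"statusCode": 200, "body": json.dumps({"text": text})}

    return lambda_handler
```

In a deployed function you would expose something like `lambda_handler = make_handler(invoke_bedrock)` at module level and point the Lambda handler setting at it.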

For warrantyAI's workload (sporadic document uploads, not a real-time chat product), paying only when something actually runs matters: my entire system runs under $1.30/day.

The setup: IAM first, always

Before any code, the Lambda execution role needs this policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-haiku-4-5-20251001",
        "arn:aws:bedrock:ap-south-1::foundation-model/anthropic.claude-sonnet-4-6"
      ]
    }
  ]
}

Scope it to specific model ARNs. Not *. Ever.
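If you generate the role's policy from code (for a CDK app, Terraform, or a deploy script), building the ARN list from the model IDs keeps the policy in sync with the constants your functions use. A small sketch; `bedrock_invoke_policy` is a hypothetical helper, and it uses the same foundation-model ARN pattern as the policy above (region, empty account field, model ID):

```python
import json
from typing import Dict, List


def bedrock_invoke_policy(region: str, model_ids: List[str]) -> Dict:
    """Build a least-privilege IAM policy allowing invocation of only the
    listed foundation models in one region (hypothetical helper)."""
    # Refuse wildcards outright: the whole point is scoping to specific models
    if any("*" in m for m in model_ids):
        raise ValueError("wildcard model IDs defeat the point of scoping")
    # Foundation-model ARNs leave the account field empty
    resources = [f"arn:aws:bedrock:{region}::foundation-model/{m}" for m in model_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                "Resource": resources,
            }
        ],
    }
```

Usage: `print(json.dumps(bedrock_invoke_policy("ap-south-1", ["anthropic.claude-haiku-4-5-20251001"]), indent=2))`.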

The invoke wrapper

This is the core function I reuse across all 3 agents in warrantyAI:

import json
import boto3

# Create the client once at module load so warm Lambda invocations reuse it
bedrock = boto3.client("bedrock-runtime", region_name="ap-south-1")

HAIKU  = "anthropic.claude-haiku-4-5-20251001"
SONNET = "anthropic.claude-sonnet-4-6"

def invoke_bedrock(prompt: str, model_id: str = HAIKU, max_tokens: int = 512) -> str:
    """
    Invoke a Bedrock Claude model from Lambda.
    Returns the text response as a string.
    """
    response = bedrock.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        # Anthropic Messages API request body, in the format Bedrock expects
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [
                {"role": "user", "content": prompt}
            ]
        })
    )
    # response["body"] is a StreamingBody: read() it before parsing
    body = json.loads(response["body"].read())
    # The reply is a list of content blocks; the first one holds the text
    return body["content"][0]["text"].strip()

That's it. Stateless, reusable, testable in isolation.
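"Testable in isolation" can look like this in practice. A sketch that passes the client in as a parameter (a small departure from the module-level client in the wrapper) and uses a hypothetical `StubBedrock` whose `invoke_model` returns a Messages-API-shaped reply; `io.BytesIO` stands in for botocore's StreamingBody because both expose `.read()`:

```python
import io
import json
from typing import Any


def invoke_claude(client: Any, prompt: str, model_id: str, max_tokens: int = 512) -> str:
    """Same request/response handling as invoke_bedrock, but with the client
    injected so a stub can stand in for bedrock-runtime."""
    response = client.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    body = json.loads(response["body"].read())
    return body["content"][0]["text"].strip()


class StubBedrock:
    """Hypothetical test double: records the request, returns a canned reply."""

    def __init__(self, reply_text: str):
        self.reply_text = reply_text
        self.last_request = None

    def invoke_model(self, **kwargs):
        self.last_request = kwargs
        payload = {"content": [{"type": "text", "text": self.reply_text}]}
        # BytesIO mimics the .read() interface of botocore's StreamingBody
        return {"body": io.BytesIO(json.dumps(payload).encode("utf-8"))}
```

A unit test can then assert on both the returned text and the request body captured in `stub.last_request`, with no AWS credentials involved.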

Haiku-first, Sonnet fallback

Haiku is fast and cheap; Sonnet is more accurate and more expensive. In warrantyAI's Classifier agent, I try Haiku first. If the confidence score in Haiku's JSON reply is below 0.7, I retry with Sonnet automatically:

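def parse_json_reply(text: str) -> dict:
    """Parse a model reply as JSON; an unparseable reply becomes {}."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return {}

def classify_warranty(structured_data: dict) -> dict:
    # build_classify_prompt is my prompt builder (not shown in this post)
    prompt = build_classify_prompt(structured_data)

    # Attempt 1: Haiku. A reply that isn't valid JSON counts as confidence 0.
    parsed = parse_json_reply(invoke_bedrock(prompt, model_id=HAIKU))

    # Fallback: Sonnet if confidence < 0.7
    if parsed.get("confidence", 0) < 0.7:
        parsed = json.loads(invoke_bedrock(prompt, model_id=SONNET))
        parsed["model_used"] = "sonnet"
    else:
        parsed["model_used"] = "haiku"

    return parsed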

In practice, Haiku handles ~85% of documents. Sonnet kicks in for complex commercial warranties with ambiguous clause structures.
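The classifier parses Claude's reply with `json.loads`, which assumes the reply is bare JSON. A model can occasionally wrap the object in a Markdown code fence or put a sentence before it, and then `json.loads` raises. One defensive option, sketched here as a hypothetical helper `extract_json_object`, is to parse only the outermost `{...}` span of the reply:

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the outermost {...} span of a model reply, tolerating surrounding
    text such as a leading sentence or a code fence (hypothetical helper)."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    # Raises json.JSONDecodeError (a ValueError subclass) if the span is invalid
    return json.loads(text[start:end + 1])
```

Both failure modes raise `ValueError`, so a caller can catch that one exception and treat the reply as low confidence.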

Three things that will burn you

1. The body is a StreamingBody, not a string. Always call .read() before json.loads(). Forget this once and you'll spend 20 minutes confused.

# Wrong
body = json.loads(response["body"])

# Right
body = json.loads(response["body"].read())

2. Payload and prompt size limits. Lambda caps synchronous invocation payloads at 6 MB for the request and 6 MB for the response. Bedrock responses are usually tiny, but large documents in your prompt inflate token costs and can exceed the model's context window, so chunk them first. I cap prompts at 4,000 characters in the Reader agent.

3. Bedrock is regional. Not all models are available in all regions. ap-south-1 (Mumbai) supports Haiku and Sonnet. If you get a ResourceNotFoundException, check model availability in your region before debugging your code.
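For item 2, "chunk them first" can be as simple as splitting on paragraph breaks under a character budget. A sketch, assuming plain-text input and a 4,000-character default to match the Reader agent's cap; `chunk_text` is hypothetical, and characters are only a rough proxy for tokens:

```python
from typing import List


def chunk_text(text: str, max_chars: int = 4000) -> List[str]:
    """Split text into chunks of at most max_chars, preferring paragraph
    breaks and hard-cutting any paragraph longer than the limit."""
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Hard-cut an oversized paragraph into max_chars slices
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        # Append the paragraph to the current chunk if it still fits
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Each chunk then goes through the wrapper as its own call, which keeps every request well under both the Lambda payload limit and the model's context window.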
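For item 3, you can check availability from code before debugging anything else. A sketch that takes a boto3 `bedrock` control-plane client (not `bedrock-runtime`) and compares your model IDs against the region's `list_foundation_models()` catalogue; `models_missing_in_region` is a hypothetical helper, IDs must match exactly (including any version suffix), and the catalogue does not tell you whether model access has been granted to your account:

```python
from typing import Any, Iterable, List


def models_missing_in_region(bedrock_client: Any, wanted: Iterable[str]) -> List[str]:
    """Return the model IDs in `wanted` that the region's foundation-model
    catalogue does not list (hypothetical helper)."""
    summaries = bedrock_client.list_foundation_models()["modelSummaries"]
    available = {s["modelId"] for s in summaries}
    return [m for m in wanted if m not in available]
```

Usage: `models_missing_in_region(boto3.client("bedrock", region_name="ap-south-1"), [HAIKU, SONNET])` returns an empty list when every ID is listed in that region.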

Cost reality check

For warrantyAI's workload (roughly 50 documents/day):

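| Model | Calls/day | Avg tokens/call | Cost/call | Daily cost |
|---|---|---|---|---|
| Haiku (every document) | 50 | ~800 | ~$0.0004 | ~$0.020 |
| Sonnet (~15% fallback) | ~7.5 | ~800 | ~$0.006 | ~$0.045 |

Haiku runs on every document because the classifier always tries it first; Sonnet runs only on the ~15% that fall back. Total Bedrock cost: roughly $0.065/day for this workload, about 5% of my $1.30/day budget. The rest goes to Textract, SNS, and S3.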
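The daily figures are just calls/day × cost/call, so writing that out makes it easy to rerun when document volume or pricing changes. A sketch using the per-call estimates from the table (not live Bedrock pricing), and assuming Haiku runs on every document while Sonnet runs only on the fallback share; `daily_bedrock_cost` is a hypothetical helper:

```python
from typing import Dict


def daily_bedrock_cost(docs_per_day: int, fallback_rate: float,
                       haiku_cost_per_call: float, sonnet_cost_per_call: float) -> Dict[str, float]:
    """Estimate daily Bedrock spend for a Haiku-first, Sonnet-fallback flow."""
    # Every document gets a Haiku attempt
    haiku = docs_per_day * haiku_cost_per_call
    # Only the low-confidence share is retried on Sonnet
    sonnet = docs_per_day * fallback_rate * sonnet_cost_per_call
    return {"haiku": haiku, "sonnet": sonnet, "total": haiku + sonnet}
```

Usage: `daily_bedrock_cost(50, 0.15, 0.0004, 0.006)`.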

What's next

This pattern is the foundation for the entire warrantyAI pipeline. Next Sunday I'll cover how I wired these invocations into a LangGraph StateGraph — three agents, one shared state dict, no message queues.

Follow along if you're building serverless AI on AWS. I publish every Sunday on LinkedIn.

This is part of the Serverless Meets AI series — practical AWS patterns from building warrantyAI.
