How to Use the Gemini Deep Research API in Production

title: How to Use the Gemini Deep Research API in Production
published: true
date: 2026-03-04 16:08:05 UTC
tags: googlecloudrun,deepresearch,pubsub,asynchronousprogramming
canonical_url: https://medium.com/google-cloud/how-to-use-the-gemini-deep-research-api-in-production-978055873a39

How many of us have gone down the research rabbit hole? Way too many tabs, links, and notes in the pursuit of knowledge? It’s all useful stuff, but time-consuming and distracting.

Since I discovered the Gemini Deep Research Agent, I haven’t looked back. Best of all, it has a powerful and straightforward API for kicking off research programmatically. Let’s explore how to use it, along with patterns for fitting it into a production architecture.

Async changes everything

A single research task can trigger dozens of search queries and take several minutes to complete, so you don’t hold a connection open waiting for it. Instead, the Interactions API runs the agent asynchronously: you must pass background=True when you create the interaction, then poll it to check on progress.

If you’ve ever worked with a Pub/Sub pipeline or job queue, this will feel familiar.

Meet the Interactions API

The Interactions API is a newer, unified interface for working with Gemini models and agents. It replaces the older generateContent pattern for scenarios that need state management, tool orchestration, or background execution.

You create an interaction, point it at the deep research agent, and tell it to run in the background:

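import os

from google import genai

# Assumes the API key is exported as the GEMINI_API_KEY environment variable
GEMINI_API_KEY = os.environ["GEMINI_API_KEY"]
client = genai.Client(api_key=GEMINI_API_KEY)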

# Launch the research agent in the background
interaction = client.interactions.create(
    input="Research the history and future of Solid State Batteries.",
    agent='deep-research-pro-preview-12-2025',
    background=True
)

That call returns immediately with an interaction ID. The agent is now off doing its thing, autonomously planning search queries, reading pages, and iterating on its analysis. Your application is free to do whatever it needs to do in the meantime.

Polling for results

Now you need a way to check whether the agent has finished. The status field tells you everything you need to know:

import time

while True:
    interaction = client.interactions.get(interaction.id)

    if interaction.status == "completed":
        # The full research report is ready
        print(interaction.outputs[-1].text)
        break
    elif interaction.status == "failed":
        print(f"Research failed: {interaction.error}")
        break

    # Still working. Check again in 10 seconds.
    time.sleep(10)
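
The loop above never gives up and always waits a fixed 10 seconds. Here is a minimal sketch of a bounded variant with exponential backoff. The helper name wait_for_interaction and its parameters are my own, not part of the SDK, and it assumes any status other than "completed" or "failed" means the agent is still working.

```python
import time
from typing import Any, Callable


def wait_for_interaction(
    fetch: Callable[[], Any],
    timeout_s: float = 1800.0,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll fetch() until the interaction completes, fails, or times out.

    fetch is any zero-argument callable returning an object with a
    .status attribute (hypothetical wrapper around the SDK's get call).
    """
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        interaction = fetch()
        if interaction.status == "completed":
            return interaction
        if interaction.status == "failed":
            error = getattr(interaction, "error", None)
            raise RuntimeError(f"Research failed: {error}")
        # Assumption: every other status means "still running".
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Research still running after {timeout_s} seconds")
        sleep(delay)
        # Double the wait each round, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)
```

With the client from earlier, usage would look like wait_for_interaction(lambda: client.interactions.get(interaction.id)). Passing sleep as a parameter keeps the helper easy to test without real waiting.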

Taking it to production with Cloud Run

In a notebook, a while True loop gets the job done. In production, you want something that scales, recovers from failures, and doesn’t burn compute while waiting. Cloud Run offers three compute models (services, jobs, and worker pools), and each maps to a different integration pattern with the Deep Research agent.

Cloud Run service: webhook-triggered research

A Cloud Run service works when you want to trigger research from an HTTP request. The service accepts the request, kicks off the agent, stores the interaction ID, and returns immediately. A separate mechanism (a Cloud Scheduler cron job, a Workflows workflow, or a callback) checks for the results later.

from fastapi import FastAPI
from pydantic import BaseModel
from google import genai

app = FastAPI()
client = genai.Client()

class ResearchRequest(BaseModel):
    topic: str

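@app.post("/research")
def start_research(req: ResearchRequest):
    # Declared with plain `def` so FastAPI runs this handler in its threadpool;
    # the blocking SDK call below would otherwise stall the event loop.
    interaction = client.interactions.create(
        input=req.topic,
        agent="deep-research-pro-preview-12-2025",
        background=True,
    )

    # Store the ID for later retrieval (e.g., in Firestore or Cloud SQL).
    # save_interaction_id is a placeholder for your own persistence code.
    save_interaction_id(interaction.id, req.topic)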

    return {"interaction_id": interaction.id, "status": "started"}
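
The "separate mechanism" that collects results could be as simple as a second endpoint that Cloud Scheduler hits every few minutes. The sketch below shows only the checking logic. The names check_pending, fetch, and on_complete are hypothetical, and a plain dict stands in for whatever store (Firestore, Cloud SQL) holds the pending IDs.

```python
from typing import Any, Callable, Dict


def check_pending(
    pending: Dict[str, str],
    fetch: Callable[[str], Any],
    on_complete: Callable[[str, str, str], None],
) -> Dict[str, str]:
    """Check every stored interaction once and return the ones still running.

    pending maps interaction ID -> topic.
    fetch(interaction_id) returns an object with .status and .outputs,
    for example a thin wrapper around client.interactions.get.
    on_complete(interaction_id, topic, report_text) persists a finished report.
    """
    still_pending: Dict[str, str] = {}
    for interaction_id, topic in pending.items():
        interaction = fetch(interaction_id)
        if interaction.status == "completed":
            on_complete(interaction_id, topic, interaction.outputs[-1].text)
        elif interaction.status == "failed":
            # Dropped from the pending set; a real system would record the error.
            error = getattr(interaction, "error", None)
            print(f"Research failed for {topic!r}: {error}")
        else:
            # Assumption: any other status means the agent is still working.
            still_pending[interaction_id] = topic
    return still_pending
```

Each scheduled run does one quick pass and exits, so the service never sits idle in a polling loop.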

Cloud Run job: batch research tasks

A Cloud Run job is a natural fit for one-shot or scheduled research. Jobs execute code and stop, which maps cleanly to “launch, poll, write, exit.” If you have a batch of research topics, you can fan them out as parallel job tasks.

import os
import time

from google import genai
from google.cloud import storage

client = genai.Client()

def run_research_job():
    topic = os.environ.get("RESEARCH_TOPIC", "Default research topic")

    interaction = client.interactions.create(
        input=topic,
        agent="deep-research-pro-preview-12-2025",
        background=True,
    )

    # Poll until done
    while True:
        result = client.interactions.get(interaction.id)
        if result.status == "completed":
            # Write the report to Cloud Storage and exit
            bucket = storage.Client().bucket("my-research-reports")
            bucket.blob(f"{interaction.id}.md").upload_from_string(
                result.outputs[-1].text
            )
            return
        elif result.status == "failed":
            raise RuntimeError(f"Research failed: {result.error}")
        time.sleep(10)

run_research_job()
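
To fan a batch out across parallel tasks, each task needs to know which topics are its own. Cloud Run jobs set the CLOUD_RUN_TASK_INDEX and CLOUD_RUN_TASK_COUNT environment variables in every task, so one rough way to split the work looks like this. The function name and the idea of a shared topic list are assumptions for illustration.

```python
import os
from typing import List, Mapping


def topics_for_this_task(
    topics: List[str],
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the subset of topics this job task should research."""
    # Both variables are set by Cloud Run jobs; the defaults cover local runs.
    index = int(env.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(env.get("CLOUD_RUN_TASK_COUNT", "1"))
    # Task i takes topics i, i + count, i + 2 * count, and so on.
    return topics[index::count]
```

Every task loads the same list, takes its own slice, and runs the launch, poll, write loop for each topic in that slice.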

Cloud Run worker pool: continuous research dispatcher

The most interesting option for a production pipeline is a Cloud Run worker pool. Worker pools are designed for continuous, non-HTTP, pull-based background processing. They don’t need a public endpoint, they don’t autoscale by default (you bring your own logic), and they cost up to 40% less than instance-billed services.

If you’re building a system that continuously pulls research requests from a Pub/Sub subscription, dispatches them to the agent, and writes completed reports to storage, a worker pool is purpose-built for that pattern.

import time

from google import genai
from google.cloud import pubsub_v1, storage

client = genai.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription_path = "projects/my-project/subscriptions/research-requests"

def handle_message(message):
    topic = message.data.decode("utf-8")

    interaction = client.interactions.create(
        input=topic,
        agent="deep-research-pro-preview-12-2025",
        background=True,
    )

    # Poll until done, then write results
    while True:
        result = client.interactions.get(interaction.id)
        if result.status == "completed":
            bucket = storage.Client().bucket("my-research-reports")
            bucket.blob(f"{interaction.id}.md").upload_from_string(
                result.outputs[-1].text
            )
            message.ack()
            return
        elif result.status == "failed":
            message.nack() # Retry later
            return
        time.sleep(10)

# Pull messages continuously (worker pool stays alive)
streaming_pull = subscriber.subscribe(subscription_path, callback=handle_message)
streaming_pull.result()
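
One thing to watch: Pub/Sub delivers at least once by default, and a nack triggers redelivery, so the same topic can reach the handler more than once. A cheap guard is to name the report after the topic itself rather than the interaction ID and skip work when that object already exists. This is a sketch of that idea, not part of the original pipeline; report_name and should_start_research are hypothetical helpers.

```python
import hashlib
from typing import Callable


def report_name(topic: str) -> str:
    """Derive a stable object name from the topic text."""
    digest = hashlib.sha256(topic.strip().encode("utf-8")).hexdigest()
    return f"reports/{digest[:16]}.md"


def should_start_research(topic: str, exists: Callable[[str], bool]) -> bool:
    """Return False when a report for this topic has already been written.

    exists(name) reports whether an object with that name is present,
    for example lambda name: bucket.blob(name).exists().
    """
    return not exists(report_name(topic))
```

Inside handle_message, you would call should_start_research before creating the interaction, and ack the message right away when it returns False.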

Grounding with your own data

Web research is powerful, but sometimes you need the agent to work with private data or internal documents. The Deep Research agent supports a file search tool for exactly this. Think of it as RAG, but orchestrated automatically by the agent rather than wired up manually.

interaction = client.interactions.create(
    input="Compare our 2025 fiscal year report against current public web news.",
    agent='deep-research-pro-preview-12-2025',
    background=True,
    tools=[{
        "type": "file_search",
        "file_search_store_names": [FILE_SEARCH_STORE_NAME]
    }]
)

This is where the architecture gets interesting for enterprise use cases. The agent can combine internet research with grounded analysis of your internal documents, all within a single research task.
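
If some requests need private data and others don't, it can help to build the request arguments in one place. A small sketch, assuming the same agent name and tool shape used in the snippets in this post; build_research_request is a hypothetical helper, not an SDK function.

```python
from typing import Any, Dict, Optional, Sequence

DEEP_RESEARCH_AGENT = "deep-research-pro-preview-12-2025"


def build_research_request(
    topic: str,
    file_search_stores: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for client.interactions.create."""
    request: Dict[str, Any] = {
        "input": topic,
        "agent": DEEP_RESEARCH_AGENT,
        "background": True,
    }
    # Attach the file search tool only when stores are supplied.
    if file_search_stores:
        request["tools"] = [{
            "type": "file_search",
            "file_search_store_names": list(file_search_stores),
        }]
    return request
```

You would then call client.interactions.create(**build_research_request(topic, [FILE_SEARCH_STORE_NAME])), and the service, job, and worker pool variants can all share the helper.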

Stateful follow-ups

After a research task completes, you can ask follow-up questions that reference the original research context without re-running the entire workflow:

follow_up = client.interactions.create(
    input="Can you elaborate on the key findings?",
    model="gemini-3.1-pro-preview",
    previous_interaction_id=interaction.id
)

print(follow_up.outputs[-1].text)

Getting started

This Deep Research notebook walks you through the entire flow, from setting up the client to launching research tasks. For pricing details, check the Gemini API pricing page.

Ready to stop Googling and start delegating? Grab the notebook and run your first deep research task. I’d love to hear what you build with it. Come find me on LinkedIn, X, or Bluesky and share what research tasks you’re automating.
