Build a Streaming Gemini Chat in Angular with Signals — Then Ship It on Cloud Run

If you have built a chat UI for a large language model in the last two years, you probably reached for RxJS, an `OnPush` component, an `async` pipe, and a `BehaviorSubject` per piece of state. It worked, but it was a lot of plumbing for what is fundamentally a very simple shape: *one string that grows over time*.

Angular Signals collapse that plumbing into a single primitive. And it turns out that streaming Gemini responses with Signals is one of the cleanest, most satisfying pieces of code you can write in modern Angular today.

In this tutorial we will build a working Google AI chat component, in roughly one hundred lines, that streams tokens from Gemini in real time, supports a stop button, and feels native on desktop and mobile. Then we will ship it safely on Cloud Run with a thin proxy, so you can drop a live, embedded demo into your post.

## Why Signals are a perfect fit for streaming AI

A streaming LLM response is, mechanically, a sequence of small text deltas arriving over a fetch stream. Old-school Angular handled this with `Subject`s, async pipes, and a lot of trust that change detection would do the right thing.

Signals reframe the problem. A `signal<string>('')` is just a value that you call `.update()` on. Each update notifies only the views that read that signal, and Angular 20 with zoneless change detection skips the whole-tree dirty check entirely. That means you can call `.update()` thirty times a second from inside a `for await` loop and your UI will not break a sweat.

There is also a smaller, ergonomic win. With Signals the rendering rule is "whatever the signal is at this instant." Streaming chat is a value that is *visibly mid-update*, and Signals give you the perfect vocabulary for that: the in-flight token buffer is just another signal, alongside the committed message history.

## What we are building

A single-page Angular app with one component. You type a question, hit send, and watch Gemini's answer stream in word by word.
There is a stop button that cancels the stream, a running history of messages, and that is it.

We will use Angular 20 standalone components, Signals, the new control flow (`@for`, `@if`), and the official `@google/genai` SDK. You can find the finished repo on GitHub at the link at the bottom of this post.

## Prerequisites

You will need Node 20 or newer, the Angular CLI (`npm i -g @angular/cli`), and a Gemini API key from [Google AI Studio](https://aistudio.google.com/app/apikey). The free tier is more than enough to follow along.

A note on the API key, because this matters: in the local version we read the key from an environment file that gets bundled into the client. **That is fine for local exploration. It is not fine for production.** Anything in your bundle is visible to anyone who opens DevTools. We will fix this in the deploy section by adding a small proxy on Cloud Run: the key stays on the server, and the Angular code barely changes.

## Project setup

Spin up a new Angular project with the CLI:

```bash
ng new gemini-stream --standalone --routing=false --style=css --skip-tests
cd gemini-stream
npm i @google/genai
```

Open `src/environments/environment.ts` (create it if the CLI did not) and add your key:

```ts
export const environment = {
  geminiApiKey: 'YOUR_AI_STUDIO_KEY_HERE',
};
```

Add the same file under `environment.development.ts` if you use a separate dev environment, and make sure `.gitignore` keeps these out of source control if you put a real key in.

In `src/app/app.config.ts`, opt into zoneless change detection. The provider shipped as a developer preview in Angular 20.0 and became stable in 20.2, and it gives you the per-signal update path that makes streaming feel snappy:

```ts
import { ApplicationConfig, provideZonelessChangeDetection } from '@angular/core';

export const appConfig: ApplicationConfig = {
  providers: [provideZonelessChangeDetection()],
};
```

That is the entire setup. On to the interesting bits.

## The Gemini service

Create `src/app/gemini.service.ts`.
The job of this service is small: take a chat history, return an async iterable of text deltas, and let the caller stop early.

```ts
import { Injectable } from '@angular/core';
import { GoogleGenAI } from '@google/genai';
import { environment } from '../environments/environment';

export type ChatRole = 'user' | 'model';

export interface ChatMessage {
  role: ChatRole;
  content: string;
}

@Injectable({ providedIn: 'root' })
export class GeminiService {
  private ai = new GoogleGenAI({ apiKey: environment.geminiApiKey });

  async *stream(
    history: ChatMessage[],
    shouldStop: () => boolean = () => false,
  ): AsyncGenerator<string> {
    const response = await this.ai.models.generateContentStream({
      model: 'gemini-2.5-flash',
      contents: history.map((m) => ({
        role: m.role,
        parts: [{ text: m.content }],
      })),
    });

    for await (const chunk of response) {
      // Check the predicate before yielding so Stop takes effect on the next chunk.
      if (shouldStop()) return;
      const text = chunk.text;
      if (text) yield text;
    }
  }
}
```

Three things worth pointing out here.

First, `generateContentStream` returns an async iterable of chunks. Each chunk has a `text` getter that gives you the new tokens for that step. That is all the SDK asks of you.

Second, we accept a `shouldStop` predicate instead of an `AbortController`. This keeps cancellation logic on our side, where it composes nicely with Signals: the predicate is going to read a signal, and the moment the user clicks Stop, the next iteration of the loop bails out.

Third, the service yields strings, not chunks. By the time anything else in the app sees a delta, it is already plain text. That keeps our chat component free of any SDK-specific types.

## Signals-based chat state

Now the chat component. Create `src/app/chat.component.ts` and start with the state. The whole point of this article is in this section, so read it slowly.
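Before the full listing, it may help to see the core loop on its own. What follows is a minimal, framework-free sketch, not the article's component: `fakeStream`, `runTurn`, and `stopAfter` are hypothetical names invented for illustration, plain variables stand in for signals, and a fixed array of deltas stands in for Gemini. It shows the shape that matters: a buffer grows with each delta, a stop predicate ends the loop early, and whatever was buffered is committed to history in `finally`.

```typescript
interface Msg {
  role: 'user' | 'model';
  content: string;
}

// Hypothetical stand-in for the service's stream(): yields fixed deltas
// and checks the stop predicate before each one.
async function* fakeStream(
  deltas: string[],
  shouldStop: () => boolean,
): AsyncGenerator<string> {
  for (const delta of deltas) {
    if (shouldStop()) return;
    yield delta;
  }
}

// Runs one assistant turn. `stopAfter` simulates the user clicking Stop
// once that many deltas have been received (an assumption for this demo).
async function runTurn(
  history: Msg[],
  deltas: string[],
  stopAfter: number,
): Promise<Msg[]> {
  let buffer = '';          // plays the role of the in-flight buffer
  let received = 0;
  let stopRequested = false; // plays the role of the stop flag
  let next = history;

  try {
    for await (const delta of fakeStream(deltas, () => stopRequested)) {
      buffer += delta;
      received++;
      if (received >= stopAfter) stopRequested = true;
    }
  } finally {
    // Commit whatever arrived, even if the stream was cut short.
    if (buffer) next = [...history, { role: 'model', content: buffer }];
  }
  return next;
}
```

With deltas `['Hel', 'lo', ' world']` and `stopAfter` set to 2, the committed reply is `Hello`: the predicate flips before the generator resumes, so the third delta is never yielded. The component applies the same shape with signals in place of the local variables.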
```ts
import {
  ChangeDetectionStrategy,
  Component,
  computed,
  effect,
  inject,
  signal,
  viewChild,
  ElementRef,
} from '@angular/core';
import { GeminiService, ChatMessage } from './gemini.service';

@Component({
  selector: 'app-chat',
  standalone: true,
  changeDetection: ChangeDetectionStrategy.OnPush,
  template: ``,
  styles: [`/* coming up next */`],
})
export class ChatComponent {
  private gemini = inject(GeminiService);

  readonly messages = signal<ChatMessage[]>([]);
  readonly draft = signal('');
  readonly streaming = signal('');
  readonly isStreaming = signal(false);
  readonly stopRequested = signal(false);

  readonly canSend = computed(
    () => this.draft().trim().length > 0 && !this.isStreaming(),
  );

  private scroller = viewChild<ElementRef<HTMLDivElement>>('scroller');

  constructor() {
    effect(() => {
      // Read the streaming buffer and message count to re-trigger on every update,
      // then scroll to the bottom on the next animation frame.
      this.streaming();
      this.messages().length;
      const el = this.scroller()?.nativeElement;
      if (el) requestAnimationFrame(() => (el.scrollTop = el.scrollHeight));
    });
  }

  async send() {
    if (!this.canSend()) return;

    const userMessage: ChatMessage = { role: 'user', content: this.draft().trim() };
    this.messages.update((m) => [...m, userMessage]);
    this.draft.set('');
    this.streaming.set('');
    this.isStreaming.set(true);
    this.stopRequested.set(false);

    try {
      for await (const delta of this.gemini.stream(
        this.messages(),
        () => this.stopRequested(),
      )) {
        this.streaming.update((s) => s + delta);
      }
    } catch (err) {
      this.streaming.update((s) => s + `\n\n_Error: ${(err as Error).message}_`);
    } finally {
      // Commit whatever was buffered, then reset the in-flight state.
      const final = this.streaming();
      if (final) {
        this.messages.update((m) => [...m, { role: 'model', content: final }]);
      }
      this.streaming.set('');
      this.isStreaming.set(false);
    }
  }

  stop() {
    this.stopRequested.set(true);
  }
}
```

Five signals carry the entire state of the chat. `messages` is the committed history. `draft` is what is in the textarea.
`streaming` is the buffer for the in-flight assistant reply, separate from the history so we can render it differently. `isStreaming` and `stopRequested` are the control flags.

Notice that `canSend` is a `computed`. We never write to it, we never subscribe to it; we just read it from the template and Angular figures out when it changes. That single line replaces the form-validation observable boilerplate you might be used to.

The `effect` is doing the auto-scroll. By reading `streaming()` and `messages().length` inside the effect, we tell Angular: "rerun me whenever either of these changes." Then we scroll the chat container to the bottom on the next frame. This is the kind of small DOM concern that used to require `AfterViewChecked` and a flag; here it is six lines.

The `send` method is where streaming meets state. We push the user message, clear the buffer, then iterate over the service's async generator and call `.update()` on the streaming signal for each delta. When the loop ends (or the user hits Stop, which makes `shouldStop` return true on the next iteration), we commit whatever was in the buffer to the message history and reset.

## The template

Replace the placeholder template and styles in the same file:

```ts
template: `
  <div class="shell">
    <div class="scroller" #scroller>
      @for (m of messages(); track $index) {
        <div class="msg {{ m.role }}">{{ m.content }}</div>
      }
      @if (isStreaming() && streaming()) {
        <div class="msg model streaming">{{ streaming() }}<span class="cursor"></span></div>
      }
    </div>

    <form class="composer" (submit)="$event.preventDefault(); send()">
      <textarea
        rows="2"
        placeholder="Ask Gemini something..."
        [value]="draft()"
        (input)="draft.set($any($event.target).value)"
        (keydown.enter)="$event.preventDefault(); send()"
      ></textarea>
      @if (isStreaming()) {
        <button type="button" (click)="stop()">Stop</button>
      } @else {
        <button type="submit" [disabled]="!canSend()">Send</button>
      }
    </form>
  </div>
`,
styles: [`
  .shell { display: flex; flex-direction: column; height: 100dvh; max-width: 720px; margin: 0 auto; font-family: system-ui, sans-serif; }
  .scroller { flex: 1; overflow-y: auto; padding: 1rem; display: flex; flex-direction: column; gap: 0.75rem; }
  .msg { padding: 0.75rem 1rem; border-radius: 12px; white-space: pre-wrap; line-height: 1.5; max-width: 85%; }
  .msg.user { align-self: flex-end; background: #4285f4; color: white; }
  .msg.model { align-self: flex-start; background: #f1f3f4; color: #202124; }
  /* The span is empty, so it needs an explicit height to be visible. */
  .cursor { display: inline-block; width: 0.5ch; height: 1em; vertical-align: text-bottom; background: currentColor; margin-left: 2px; animation: blink 1s steps(1) infinite; }
  @keyframes blink { 50% { opacity: 0; } }
  .composer { display: flex; gap: 0.5rem; padding: 1rem; border-top: 1px solid #eee; }
  textarea { flex: 1; resize: none; padding: 0.75rem; border-radius: 12px; border: 1px solid #ddd; font: inherit; }
  button { padding: 0 1.25rem; border-radius: 12px; border: none; background: #4285f4; color: white; font-weight: 600; cursor: pointer; }
  button:disabled { opacity: 0.5; cursor: not-allowed; }
`]
```

The new control flow (`@for`, `@if`, `@else`) makes this template read like a small story: render every committed message, then render the in-flight reply if there is one, then show Send or Stop based on whether we are mid-stream. The blinking cursor on the streaming bubble is a tiny detail that makes the whole thing feel alive.

Wire the component into `src/app/app.component.ts` as the only thing rendered, run `ng serve`, and you should have a working streaming chat at `http://localhost:4200`.

## Shipping it on Cloud Run

The local app calls Gemini directly with a key in the bundle.
To ship it safely we need two small moves: a tiny server proxy that holds the key, and Cloud Run to host both the proxy and the static Angular build.

The proxy needs `express` as a dependency (`npm i express`), and the Dockerfile further down assumes a `server/tsconfig.json` that compiles into `server/dist`. Create `server/index.ts` at the project root:

```ts
import express from 'express';
import { GoogleGenAI } from '@google/genai';

const app = express();
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

app.use(express.json({ limit: '4mb' }));
// Serve the compiled Angular app from the same container.
app.use(express.static('dist/gemini-stream/browser'));

app.post('/api/stream', async (req, res) => {
  res.setHeader('Content-Type', 'text/plain; charset=utf-8');
  res.setHeader('Transfer-Encoding', 'chunked');

  const stream = await ai.models.generateContentStream({
    model: 'gemini-2.5-flash',
    contents: req.body.contents,
  });

  // Forward each text delta to the browser as soon as it arrives.
  for await (const chunk of stream) {
    if (chunk.text) res.write(chunk.text);
  }
  res.end();
});

app.listen(process.env.PORT || 8080);
```

Update `gemini.service.ts` to read from the proxy with `fetch` instead of calling the SDK in the browser. The SDK and the API key never leave the server:

```ts
async *stream(history: ChatMessage[], shouldStop = () => false) {
  const res = await fetch('/api/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      contents: history.map((m) => ({ role: m.role, parts: [{ text: m.content }] })),
    }),
  });
  // Surface proxy failures instead of trying to read a missing body.
  if (!res.ok || !res.body) throw new Error(`Proxy request failed: ${res.status}`);

  const reader = res.body.pipeThrough(new TextDecoderStream()).getReader();
  while (true) {
    if (shouldStop()) {
      reader.cancel();
      return;
    }
    const { value, done } = await reader.read();
    if (done) return;
    if (value) yield value;
  }
}
```

This is the part I love about the Signals architecture: the component code does not change at all. The signals do not care that the bytes are coming from a Cloud Run service now instead of the SDK. Same loop, same `streaming.update()` call.

Add a `Dockerfile` at the project root:

```dockerfile
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build && npx tsc -p server

FROM node:20-alpine
WORKDIR /app
COPY --from=build /app/dist ./dist
COPY --from=build /app/server/dist ./server
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package*.json ./
ENV NODE_ENV=production
CMD ["node", "server/index.js"]
```

Then ship it with one command. Cloud Run will build the container from source for you:

```bash
gcloud run deploy gemini-stream \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=YOUR_AI_STUDIO_KEY
```

You will get back a URL like `https://gemini-stream-xxxxxx.us-central1.run.app`. Test it in the browser, confirm the chat works end to end, and you are done.

The fun part: dev.to has a first-class Cloud Run embed, so here you go:

{% embed https://gemini-stream-1070943699730.us-central1.run.app %}

## What you actually built

The whole thing (service, component, template, styles) comes in just over a hundred lines. Compare that to an equivalent app two years ago and you will notice what is *missing*: there is no `Subject`, no `BehaviorSubject`, no `async` pipe, no `OnPush` boilerplate that you have to think about, no manual subscription cleanup. Signals plus the new control flow plus zoneless change detection is genuinely a different programming model, and streaming AI is the application that shows it off best.

A couple of small things to try next, in roughly increasing order of effort:

Add a `systemInstruction` to the `generateContentStream` call to give your model a persona. On the proxy side, the `@google/genai` SDK accepts it inside the `config` object, next to `model` and `contents`.

Switch from text-only input to multimodal: drop an image into the chat and forward it from the proxy as a `parts` entry of `{ inlineData: { mimeType, data } }`. Gemini handles the rest.

Prefer Firebase to Cloud Run? Firebase AI Logic gives you the same proxy pattern with less infra: install `firebase`, import from `firebase/ai`, and the SDK shape stays almost identical.
You give up the dev.to Cloud Run embed, but the Angular code is unchanged.

Try the same UI against [Chrome's Built-in AI](https://developer.chrome.com/docs/ai/built-in) (Gemini Nano running on-device, no key, no network). The Prompt API has its own streaming primitive that drops into the same Signal-based shell with almost no changes, and you get an offline-capable chat for free.

## Wrap-up

If you take one thing away from this post, let it be that *Signals were designed for values that change a lot*, and an LLM stream is the canonical example of a value that changes a lot. The pieces fit so cleanly that the resulting code reads more like a description of the UI than like a program.

Repo: [https://github.com/TomWebwalker/gemini-stream-angular](https://github.com/)

If you build something with this, drop a link in the comments. I would love to see what people make of it.
