Hacking with multimodal Gemma 4 in AI Studio

We’re in an incredibly fun era for building. The friction between "I have a weird idea" and "I have a working prototype" is basically zero, especially with the release of **[Gemma 4](https://ai.google.dev/gemma/docs/core/model_card_4)**, which is now available via the Gemini API and Google AI Studio. Whether you want to deeply inspect model reasoning or you're just trying to build a pipeline to auto-caption an archive of historical web comics and obscure wiki trivia, you can now hit open-weights models directly from your code without needing to provision a massive GPU rig first. Here’s a look at the architecture, how to use it, and how to go from the UI to production code in one click. ### The Models: Apache 2.0, MoE, and 256k Context Before we look at the API, the biggest detail about [Gemma 4](https://ai.google.dev/gemma/docs/core) is the license: it's released under **Apache 2.0**. This means total developer flexibility and commercial permissiveness. You can prototype with the Gemini API, and eventually run it anywhere from a local rig to your own cloud infrastructure. The benchmarks are also genuinely impressive. The 31B model is currently sitting at #3 on the Arena AI text leaderboard, out-competing models massively larger than it. When you drop into [Google AI Studio](https://ai.dev), you'll see two primary models in the picker: * **Gemma 4 31B IT:** The flagship dense model. It has a massive 256K context window — perfect for dumping in entire codebases, massive log files, or huge JSON datasets. * **Gemma 4 26B A4B IT:** A Mixture-of-Experts (MoE) architecture. It's highly efficient, only activating roughly 4 billion parameters per inference. High throughput, lower cost. *(Note: There are also E2B and E4B "Edge" models meant for local on-device deployment that feature native audio input, but we're focusing on the AI Studio API today. I recommend that you go download and test the smaller models locally, though!)* ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hbdajipmhlqk4r7hugcq.png) ### Multimodal Inputs + Chain of Thought Text is great, but Gemma 4 is natively multimodal. Let's say you want to build a pipeline to reverse-engineer prompts from a folder of distinct images. In AI Studio, you can drop images directly into the playground alongside your prompt. **The Prompt:** > *"Generate descriptions of each of these images, and a prompt that I could give to an image generation model to replicate each one."* ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dvo6oercxm0kkl9pu6mw.png) Because the Gemma models support advanced reasoning, after you click `Run`, you can click the **Thoughts** toggle to literally step through the model's chain-of-thought process *before* it generates its final output. If you love understanding the "why" behind model logic, or you're trying to debug why an agent went off the rails, this level of transparency is incredibly useful. ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pai468mw2i2n9eofy34c.png) ### Shipping the code The bridge between "playing around in a UI" and "writing a script" should be exactly one click. Once you have your prompt, your images, and your reasoning configuration dialed in perfectly, click the **Get Code** button in the top right corner. You can grab the exact payload required for `TypeScript`, `Python`, `Go`, or standard `cURL`. Best of all, if you toggle "Include prompt/history", it automatically handles the base64 encoding of your images and explicitly sets the `thinkingConfig` parameters in the code for you. ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sjmpnx5b33fatifq0z1e.png) Here's what the TypeScript output looks like when you want to use Gemma 4's reasoning capabilities via the SDK: ```typescript import { GoogleGenAI } from '@google/genai'; // Initialize the client const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); // Configure Gemma 4 reasoning logic const config = { thinkingConfig: { thinkingLevel: 'HIGH', } }; const response = await ai.models.generateContent({ model: 'gemma-4-31b-it', contents: 'Tell me a fascinating, obscure story from internet history.', config: config }); console.log(response.text); ``` ### Go build open-source things! Having Apache 2.0 open-weights models accessible via a fast API completely changes the calculus for weekend projects. Whether you're building a script to summarize deeply technical whitepapers, analyze visual data natively, or wire up autonomous multi-step code generation agents—the friction is basically gone. I can't wait to see what you build! Let me know in the comments what rabbit hole you're pointing Gemma at first. Happy hacking this weekend. :)

Hacking with multimodal Gemma 4 in AI Studio

Tags

Comments

More Blog

How I'm using ASTs and Gemini to solve the "Codebase Onboarding" problem 🧠

Local AI Will Save Us All (The Math Says So, Trust Me)

Lost in the AI Hype, I Started Small

Building a Replay-Tested Interactive Brokers Client in Go

Playwright in Pictures: Fully Parallel Mode

Designing a CLI for Both Humans and Agents