Hacking with multimodal Gemma 4 in AI Studio — DeepSeek Blog | Neura Market
    Neura MarketNeura Market/DeepSeek
    ChatGPTChatGPTClaudeClaudeGeminiGeminiCursorCursorGrokGrokPerplexityPerplexityDeepSeekDeepSeek
    CoPilotCoPilotStable DiffusionStable DiffusionMidjourneyMidjourney
    View All Directories
    OverviewRulesPromptsMCPsAgentsBlogVideosGuidesCoursesCommunityTrendingGenerate
    DeepSeekBlogHacking with multimodal Gemma 4 in AI Studio
    Back to Blog
    Hacking with multimodal Gemma 4 in AI Studio
    webdev

    Hacking with multimodal Gemma 4 in AI Studio

    Paige Bailey April 4, 2026
    0 views

    We’re in an incredibly fun era for building. The friction between "I have a weird idea" and "I have a...

    We’re in an incredibly fun era for building. The friction between "I have a weird idea" and "I have a working prototype" is basically zero, especially with the release of **[Gemma 4](https://ai.google.dev/gemma/docs/core/model_card_4)**, which is now available via the Gemini API and Google AI Studio. Whether you want to deeply inspect model reasoning or you're just trying to build a pipeline to auto-caption an archive of historical web comics and obscure wiki trivia, you can now hit open-weights models directly from your code without needing to provision a massive GPU rig first. Here’s a look at the architecture, how to use it, and how to go from the UI to production code in one click. ### The Models: Apache 2.0, MoE, and 256k Context Before we look at the API, the biggest detail about [Gemma 4](https://ai.google.dev/gemma/docs/core) is the license: it's released under **Apache 2.0**. This means total developer flexibility and commercial permissiveness. You can prototype with the Gemini API, and eventually run it anywhere from a local rig to your own cloud infrastructure. The benchmarks are also genuinely impressive. The 31B model is currently sitting at #3 on the Arena AI text leaderboard, out-competing models massively larger than it. When you drop into [Google AI Studio](https://ai.dev), you'll see two primary models in the picker: * **Gemma 4 31B IT:** The flagship dense model. It has a massive 256K context window — perfect for dumping in entire codebases, massive log files, or huge JSON datasets. * **Gemma 4 26B A4B IT:** A Mixture-of-Experts (MoE) architecture. It's highly efficient, only activating roughly 4 billion parameters per inference. High throughput, lower cost. *(Note: There are also E2B and E4B "Edge" models meant for local on-device deployment that feature native audio input, but we're focusing on the AI Studio API today. I recommend that you go download and test the smaller models locally, though!)* ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hbdajipmhlqk4r7hugcq.png) ### Multimodal Inputs + Chain of Thought Text is great, but Gemma 4 is natively multimodal. Let's say you want to build a pipeline to reverse-engineer prompts from a folder of distinct images. In AI Studio, you can drop images directly into the playground alongside your prompt. **The Prompt:** > *"Generate descriptions of each of these images, and a prompt that I could give to an image generation model to replicate each one."* ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dvo6oercxm0kkl9pu6mw.png) Because the Gemma models support advanced reasoning, after you click `Run`, you can click the **Thoughts** toggle to literally step through the model's chain-of-thought process *before* it generates its final output. If you love understanding the "why" behind model logic, or you're trying to debug why an agent went off the rails, this level of transparency is incredibly useful. ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pai468mw2i2n9eofy34c.png) ### Shipping the code The bridge between "playing around in a UI" and "writing a script" should be exactly one click. Once you have your prompt, your images, and your reasoning configuration dialed in perfectly, click the **Get Code** button in the top right corner. You can grab the exact payload required for `TypeScript`, `Python`, `Go`, or standard `cURL`. Best of all, if you toggle "Include prompt/history", it automatically handles the base64 encoding of your images and explicitly sets the `thinkingConfig` parameters in the code for you. ![Image description](https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sjmpnx5b33fatifq0z1e.png) Here's what the TypeScript output looks like when you want to use Gemma 4's reasoning capabilities via the SDK: ```typescript import { GoogleGenAI } from '@google/genai'; // Initialize the client const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY }); // Configure Gemma 4 reasoning logic const config = { thinkingConfig: { thinkingLevel: 'HIGH', } }; const response = await ai.models.generateContent({ model: 'gemma-4-31b-it', contents: 'Tell me a fascinating, obscure story from internet history.', config: config }); console.log(response.text); ``` ### Go build open-source things! Having Apache 2.0 open-weights models accessible via a fast API completely changes the calculus for weekend projects. Whether you're building a script to summarize deeply technical whitepapers, analyze visual data natively, or wire up autonomous multi-step code generation agents—the friction is basically gone. I can't wait to see what you build! Let me know in the comments what rabbit hole you're pointing Gemma at first. Happy hacking this weekend. :)

    Tags

    webdevaimachinelearninggemini

    Comments

    More Blog

    View all
    How I'm using ASTs and Gemini to solve the "Codebase Onboarding" problem 🧠ai

    How I'm using ASTs and Gemini to solve the "Codebase Onboarding" problem 🧠

    Hi everyone! 👋 I’m Tara, a Senior Software Engineer and Consultant. Over the years, I've jumped...

    T
    tworrell
    Local AI Will Save Us All (The Math Says So, Trust Me)ai

    Local AI Will Save Us All (The Math Says So, Trust Me)

    Every few weeks a take goes viral in tech circles making the case for ditching cloud AI and running...

    S
    Sebastian Schürmann
    Lost in the AI Hype, I Started Smallai

    Lost in the AI Hype, I Started Small

    And it helped me get back into tech without drowning TL;DR at the end Coming back to...

    R
    Rohini Gaonkar
    Building a Replay-Tested Interactive Brokers Client in Gogo

    Building a Replay-Tested Interactive Brokers Client in Go

    I wanted an IBKR library that felt like Go and had testing I could trust. So I wrote one.

    T
    Thomas Marcelis
    Playwright in Pictures: Fully Parallel Modeplaywright

    Playwright in Pictures: Fully Parallel Mode

    Playwright’s fullyParallel mode is often treated as a simple performance switch. In practice, it...

    V
    Vitaliy Potapov
    Designing a CLI for Both Humans and Agentscli

    Designing a CLI for Both Humans and Agents

    Learn how Alpic designed its CLI for both human developers and AI agents — covering tradeoffs like polling, context windows, interactivity, and statelessness.

    J
    Julien Vallini

    Stay up to date

    Get the latest DeepSeek prompts, rules, and resources delivered to your inbox weekly.

    Neura Market LogoNeura Market

    Discover the best AI prompts, plugins, and resources for DeepSeek and more.

    Content Types

    • Rules
    • Prompts
    • MCPs
    • Agents
    • Guides

    Platforms

    • ChatGPT Directory
    • Claude Directory
    • Gemini Directory
    • Cursor Directory
    • Grok Directory
    • Perplexity Directory
    • DeepSeek Directory
    • CoPilot Directory
    • Stable Diffusion Directory
    • Midjourney Directory
    • All Directories

    Resources

    • Blog
    • Documentation
    • Help Center
    • Marketplace

    Legal

    • Privacy Policy
    • Terms of Service

    © 2026 Neura Market. All rights reserved.

    |

    Not affiliated with any AI platform vendors.