# I Ran AI Models Directly in the Browser and Measured What It Did to Core Web Vitals
Srikar Phani Kumar Marti, May 17, 2026
Everyone is shipping AI features: sentiment analysis on user input, speech recognition without sending audio to a server, image classification that never leaves the device. The privacy pitch is real, and so is the latency pitch. But nobody's asking the obvious question:

**What does running a neural network in the browser actually cost the user?**

I decided to find out. I built a benchmark harness, ran four quantized models in Chrome stable, and measured the impact on Core Web Vitals, specifically INP, the interactivity metric in the set Google uses as a page experience signal for Search. Here's what I found.

---

## The Setup

The test uses [Transformers.js](https://huggingface.co/docs/transformers.js), the library that lets you run Hugging Face models directly in the browser via WebAssembly. All models were loaded in INT8 quantized format (q8) to reflect realistic production conditions.

Four models, chosen to cover different architectures and modalities:

| Model | Params | Task | Architecture |
|---|---|---|---|
| DistilBERT | 66M | Sentiment analysis | Encoder (6 layers) |
| BERT-base | 110M | Feature extraction | Encoder (12 layers) |
| Whisper Tiny | 39M | Speech recognition | Encoder-Decoder |
| MobileViT-S | 5.7M | Image classification | Vision Transformer |

The benchmark harness is live at **[benchmark.mspk.me](https://benchmark.mspk.me)** and open source at **[github.com/srikarphanikumar/cwv-ai-benchmark](https://github.com/srikarphanikumar/cwv-ai-benchmark)**. Run it yourself.

---

## What Is INP and Why Does It Matter?

INP (Interaction to Next Paint) replaced First Input Delay as Google's interactivity metric in March 2024. It measures how long the browser takes to respond to a user interaction (a click, a tap, a keypress) and paint the result.

Google's thresholds:

- ✅ **Good**: 200ms or less
- ⚠️ **Needs Improvement**: over 200ms, up to 500ms
- ❌ **Poor**: over 500ms

As a Core Web Vital, INP can affect your search ranking. More importantly, it affects whether users feel your app is responsive or broken.
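To make the thresholds concrete, here is a minimal sketch that is not part of the benchmark harness. `classifyInp` and `observeSlowInteractions` are hypothetical helper names. The observer relies on the browser's Event Timing API and only flags individual slow interactions, which approximates INP but is not the full INP calculation.

```javascript
// Thresholds from Google's published INP guidance, in milliseconds.
const INP_GOOD_MAX_MS = 200;
const INP_POOR_MIN_MS = 500;

// Hypothetical helper: maps an INP value to Google's three buckets.
function classifyInp(inpMs) {
  if (inpMs <= INP_GOOD_MAX_MS) return 'good';
  if (inpMs <= INP_POOR_MIN_MS) return 'needs-improvement';
  return 'poor';
}

// Hypothetical helper: reports interactions slower than thresholdMs.
// Browser-only; returns null where 'event' entries are unsupported.
function observeSlowInteractions(onSlow, thresholdMs = INP_GOOD_MAX_MS) {
  const supported = typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes || []).includes('event');
  if (!supported) return null;

  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      // interactionId is non-zero only for discrete user interactions.
      if (entry.interactionId && entry.duration > thresholdMs) onSlow(entry);
    }
  });
  observer.observe({ type: 'event', durationThreshold: 40, buffered: true });
  return observer;
}

console.log(classifyInp(27.8));  // a fast interaction
console.log(classifyInp(540.3)); // a slow interaction
```

In a page you would call `observeSlowInteractions((e) => console.warn(e.name, e.duration))` once at startup and watch the console while the model runs.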
When you run neural network inference on the browser's main thread, you're blocking it. If a user clicks something while inference is running, the click isn't processed until the model finishes. That delay is your INP.

---

## The Results

Here's the full table from Chrome stable on an Apple M-series MacBook Pro with 16GB RAM:

| Model | Load Time | Avg Inference | INP | INP Class | Mem Δ | Mem Pressure |
|---|---|---|---|---|---|---|
| DistilBERT | 7.85s | 25.1ms ±0.5 | **27.8ms** | ✅ Good | +59.6MB | 2.5% |
| BERT-base | 6.07s | 83.3ms ±1.5 | **85.0ms** | ⚠️ Needs Improvement | +65.3MB | 4.1% |
| Whisper Tiny | 6.71s | 496.9ms ±6.2 | **540.3ms** | ❌ Poor | +123.9MB | 7.1% |
| MobileViT-S | 1.15s | 66.7ms ±1.0 | **75.6ms** | ⚠️ Needs Improvement | +37.0MB | 8.0% |

The INP Class column is the rating the harness reported. Taken in isolation, 85.0ms and 75.6ms both sit under Google's 200ms "Good" threshold, so read "Needs Improvement" on those rows as a headroom warning: stacked with other main-thread work, they can push an interaction past 200ms.

---

## The Surprising Findings

### 1. Parameter count doesn't predict INP

Whisper Tiny has only 39M parameters, fewer than DistilBERT (66M) or BERT-base (110M). It also produces the worst INP at 540.3ms, more than 19x DistilBERT's 27.8ms.

The culprit is architecture, not size. Whisper is an encoder-decoder model. It doesn't produce its output in a single forward pass; it runs an **autoregressive decode loop**, generating output tokens one at a time. Each iteration blocks the main thread, and the blocking time accumulates with every generated token. Quantization makes each step cheaper, but it doesn't remove the loop.

This means **quantization alone won't fix Whisper's INP on the main thread**. It's an architectural constraint, not a tuning problem.

### 2. MobileViT-S loads about 6x faster but still misses "Good"

MobileViT-S loads in 1.15s compared to 6–8 seconds for the other three models. That's a huge UX win for initial load.

But its INP of 75.6ms is still rated "Needs Improvement" despite the model having only 5.7M parameters. That is about 2.7x DistilBERT's INP with less than a tenth of the parameters. In this test, vision transformer inference carried a disproportionate cost relative to parameter count in a WASM environment. Something to watch if you're building image classification features.

### 3. Memory pressure ≠ memory delta

MobileViT-S has the lowest absolute memory delta (+37.0MB) but the **highest memory pressure at 8.0%**. That 37MB represents a larger fraction of the available JS heap than you'd expect, with implications for mid-range Android devices where heap limits are much tighter.

---

## What This Means for Your Architecture

**If you're building with encoder-only text models (DistilBERT class):**
On hardware like the test machine, you're fine on the main thread. 27.8ms INP is negligible. Trigger inference directly on user interactions without worrying about CWV degradation.

**If you're using larger encoder models (BERT-base class):**
Don't trigger inference synchronously on interactions. At 85ms, stacking this with other main thread work risks crossing 200ms. Move it to a post-interaction background step: run inference after you've already painted the response.

**If you're using any encoder-decoder model (Whisper, T5, BART, etc.):**
You **must** offload to a Web Worker. This isn't an optimization, it's a requirement. On the main thread, the page will be blocked for hundreds of milliseconds no matter how you tune the model.

Transformers.js runs inside a Web Worker. The pattern its docs use is a worker script that creates the pipeline, with the page talking to it via `postMessage`, rather than a flag passed to `pipeline()`:

```javascript
// worker.js: load with new Worker(new URL('./worker.js', import.meta.url), { type: 'module' })
import { pipeline } from '@xenova/transformers';

// Created inside the worker, so model loading and inference stay off the main thread
const transcriber = pipeline('automatic-speech-recognition', 'Xenova/whisper-tiny');

// Expects audio samples (e.g. a Float32Array) and posts back the transcription result
self.onmessage = async ({ data }) => self.postMessage(await (await transcriber)(data));
```

**If you're using vision transformers:**
Test on actual mobile hardware before shipping. The memory pressure numbers on an M-series Mac will look very different on a mid-range Android.

---

## Limitations to Know

**TBT couldn't be captured in the deployed environment.** The Long Tasks API wasn't available to the harness in its cross-origin deployed context, only in locally-served or Chrome DevTools Protocol environments. The INP measurements are real, but the full main thread blocking profile requires a different setup to measure properly.
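For readers who can capture long tasks locally, here is a rough sketch of how a TBT-style number could be derived. It rests on the standard definition that only the portion of a task beyond 50ms counts as blocking. `totalBlockingTime` and `collectLongTasks` are hypothetical names, and treating each average inference time as a single long task is an assumption made for illustration, not something the harness measured.

```javascript
// Tasks longer than 50ms count as "long"; only the excess is blocking time.
const LONG_TASK_THRESHOLD_MS = 50;

// Hypothetical helper: sums the blocking portion of each task duration (ms).
function totalBlockingTime(taskDurationsMs) {
  return taskDurationsMs.reduce(
    (sum, d) => sum + Math.max(0, d - LONG_TASK_THRESHOLD_MS),
    0
  );
}

// Hypothetical helper: collects long-task durations where the API is exposed.
// Returns null when 'longtask' entries are unsupported in this context.
function collectLongTasks(durations = []) {
  const supported = typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes || []).includes('longtask');
  if (!supported) return null;

  const observer = new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) durations.push(entry.duration);
  });
  observer.observe({ type: 'longtask', buffered: true });
  return { observer, durations };
}

// Illustration only: one task per model, using the average inference times
// from the results table. DistilBERT (25.1ms) contributes nothing.
const approxBlockingMs = totalBlockingTime([25.1, 83.3, 496.9, 66.7]);
console.log(approxBlockingMs.toFixed(1) + 'ms of blocking time');
```

Almost all of the blocking time in this illustration comes from the single Whisper Tiny task, which matches the INP ordering in the results.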
**All numbers are from high-end hardware.** An Apple M-series Mac is not the median global web user's device. I'd expect INP values on mid-range Android to be significantly higher, potentially 3–5x. The relative ordering of models should hold, but don't use these absolute numbers as production thresholds for mobile.

---

## Try It Yourself

The benchmark is live and open source. Run it on your device, your network conditions, your hardware profile. Export the results as JSON or CSV.

- **Live benchmark**: [benchmark.mspk.me](https://benchmark.mspk.me)
- **Source code**: [github.com/srikarphanikumar/cwv-ai-benchmark](https://github.com/srikarphanikumar/cwv-ai-benchmark)
- **Full paper**: arXiv link coming soon

If you run it on a mid-range Android or a low-end device and want to share the numbers, I'd love to see them. That's exactly the follow-on data this research needs.

---

## TL;DR

- DistilBERT is the only model the benchmark rated "Good" for INP on the main thread
- Whisper Tiny is "Poor" despite having fewer parameters than either BERT-family model: architecture matters more than quantization
- Encoder-decoder models require Web Worker offloading, no exceptions
- Parameter count is a bad proxy for browser inference cost
- Memory pressure on mobile is a separate concern from memory consumption

The era of client-side AI is here. Now we need to measure what it actually costs.
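On the last TL;DR point, a small sketch shows why the same allocation weighs differently across devices. It assumes "memory pressure" means used JS heap divided by the JS heap limit, which is one plausible reading and not necessarily how the harness computes it. `memoryPressurePercent` and `readHeapSnapshot` are hypothetical names, `performance.memory` is a non-standard Chrome API, and the heap limits below are made-up illustrative values, not measurements.

```javascript
// Assumption: "memory pressure" = used JS heap / JS heap limit, as a percentage.
function memoryPressurePercent(usedBytes, limitBytes) {
  if (!(limitBytes > 0)) throw new RangeError('limitBytes must be positive');
  return (usedBytes / limitBytes) * 100;
}

// Reads Chrome's non-standard performance.memory when present, else null.
function readHeapSnapshot() {
  const mem = typeof performance !== 'undefined' ? performance.memory : undefined;
  if (!mem) return null;
  return { usedBytes: mem.usedJSHeapSize, limitBytes: mem.jsHeapSizeLimit };
}

// The same 37MB against two illustrative heap limits (not measured values).
const MB = 1024 * 1024;
const roomyHeap = memoryPressurePercent(37 * MB, 4096 * MB);
const tightHeap = memoryPressurePercent(37 * MB, 512 * MB);
console.log(roomyHeap.toFixed(1) + '% vs ' + tightHeap.toFixed(1) + '%');
```

The absolute delta is identical in both cases; only the denominator changes, which is why a low-memory device can hit a high reading from a small model.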
