About a year ago, I turned my gaming PC into a local AI Lab. And yes, the most important word in that sentence is **LOCAL**. Let me tell you the story of how I sacrificed my gaming hours to build several tools, and now I'm going to tell you about this one that I use every single day.
## The Problem: Token bankruptcy
Day to day, all of us developers who work with Artificial Intelligence share the same headache: tokens and *rate limits*. We're all victims of the high prices that come with constantly running inference with AI agents like Claude Code, Codex, or Gemini CLI (yeah, I love working from the terminal, I LOVE CLIs).
While I was building AI systems (agent orchestration, LLM *fine-tuning*), I was burning through way too many tokens. I tried tweaking the *prompts* and cleaning up the junk in my context, but the real devourer of my quota showed up when I had to learn a new tool.
I was implementing solutions in QGIS (QGIS is a free, open-source Geographic Information System (GIS) software that allows users to create, edit, visualize, analyze, and publish geospatial data on maps) for a project and I didn't know the interface 100%. Like any dev facing something new, I leaned on AI agents: I'd take a screenshot, send it over, and ask for explanations.
**Here's an important fact that hurt my wallet:**
* A *screenshot* on my MacBook (Full HD resolution of 1920x1080) burns about 258 tokens per tile on models like Claude.
* That adds up to roughly **1,548 tokens per image** (sounds like a lot, and yeah my friend, it is way too much when we're talking about context).
* Now imagine sending dozens of these images a month trying to understand a complex interface as a 2x dev (99x, I'd say, in this new AI era).

I was eating through my hourly Claude allowance just doing visual queries, leaving me with no quota left to generate the actual code I really needed for my development.
## The Epiphany (and the Hardware)
One day, during a forced break thanks to a Claude *rate limit*, I looked over at my Gaming PC. I realized that instead of complaining about cloud costs, I could save tokens by running local models for visual extraction tasks.
My main work machine is my MacBook because it's so easy to move around with. But the Gaming PC had an extra 1 TB SSD and was running Pop!_OS, a distro where the NVIDIA drivers always stayed stable. So I decided to stop gaming and put it to work.

## The Risk: The 12GB VRAM challenge
Setting everything up in an AI *homelab* was a challenge.
1. **Private Network:** I installed Tailscale to manage the server securely from anywhere.
2. **The Local Ecosystem:** I started exploring Ollama and llama.cpp.
3. **The Bottleneck:** My GPU is an RTX 4070 with 12GB of VRAM. In the AI world, that doesn't get you very far, so I had to go into budget mode and chase extreme efficiency.

I needed a service I could send a screenshot to and get the context back. A traditional OCR extracts pure text at the code level, but that's useless when you need to understand an interface. The answer was in the **VLMs (Vision Language Models)**, which thanks to their pre-training don't just read, they *understand* the image.
## The Result: An 8-second API
I rolled up my sleeves and found the perfect model for my precious 12GB of VRAM: `qwen2.5-vl:7b`. (Yes, with just 7B parameters you can get incredible results).
I built a small API that queries Ollama. Now I just paste the screenshot, the VLM parses the image, and another agent interprets the context. This whole process hands me back an accurate answer in about **8 seconds**, depending on the image, all private with no data leaving my LOCAL network.

## The Next Level
Sacrificing a bit of *gaming* to put together my own *homelab* with pure code has been completely worth it. It's a simple solution, but it represents direct savings in money and technical resources.
This local infrastructure no longer just reads *screenshots*. In fact, I'm currently using this same ecosystem (my homelab) for a plant identification project on a farm, processing images captured from drone flights. *(If you're interested in how to orchestrate and do computer vision by training LLMs to analyze drone images, drop it in the comments and I'll put together the next post).*
---
*Building all the way from the friction of rate limits to having a local computer vision API is exactly the kind of challenge I enjoy solving.*

Here's the repository where I built the VLM API to get the parsing and context of my screenshots →
{% embed https://github.com/wizsebastian/VLM-local-parser %}.
A big hug, your dev friend Luis Sebastian Vasquez, use AI responsibly and safely.
{% cta https://www.linkedin.com/in/luissebastianvasquez/ %}
Connect with me on LinkedIn!
{% endcta %}