
# deep dive: the ai models i use

Philip Hern April 3, 2026


a personal inventory of the models i route to in cursor: what each one is for, where it shines, where it costs me time, and links to vendor documentation

i spend most of my working day using an ai assistant in cursor. the part public write-ups tend to skip is the simple stuff: which model name maps to which vendor, what each one is trying to be good at, and where i should not pretend it is interchangeable with the others. this post is that roster for me, written as of friday mar 20, 2026. i am not running benchmarks here. i am writing down how these models behave in my hands, with links so you can read the official specs if you want to explore further on your own. for why i treat multi-model routing as a production-era default, see [from prototype to production: my early adopter view of ai](https://philliant.com/posts/from-prototype-to-production-ai/).

## quick answer

six models are in my rotation right now: **composer 2** when i want cursor-native agentic work, **gpt-5.3 codex xhigh** when i need serious implementation muscle, **claude 4.6 opus max** when the problem is genuinely hard and i want anthropic thinking, **gemini 3.1 pro** when the input is big or visual, **grok 4.20** when i am stuck and want a fresh perspective, and **kimi k2.5** when i want strong tool use from outside the usual three vendors.

## who this is for

- anyone already using cursor (or something similar) who wants to know what models are out there
- engineers who do not want to watch an hour of launch videos to get a vendor map
- future me, six months from now, when half of these names have changed and i need to remember what i was actually using

## comparison table

the table is the quick reference. the sections below are where i get honest about what each model is actually like to use.
| model (as shown in my router) | maker | speciality / intended use | pro / con | documentation |
| --- | --- | --- | --- | --- |
| `composer-2` | cursor | agentic coding inside cursor: edits, terminal-shaped workflows, tool use | pro: built for the editor; strong on long-horizon tasks with summarization training. con: not a portable api model in my mental model; i think of it as an environment capability, not a generic llm | [composer 2 model page](https://cursor.com/docs/models/cursor-composer-2) |
| `gpt-5.3-codex-xhigh` | openai | agentic coding via the codex line; the `xhigh` suffix is how my router encodes a higher reasoning effort preset on top of the codex family | pro: excellent when i want careful refactors and api-shaped thinking. con: slower and more expensive than "just answer fast" tiers; easy to overuse on trivia | [gpt-5-codex model](https://developers.openai.com/api/docs/models/gpt-5-codex), [codex product hub](https://developers.openai.com/codex) |
| `claude-4.6-opus-max` | anthropic | maximum-depth opus-tier reasoning when latency is a fair price | pro: best anthropic option in my rotation for subtle bugs, spec ambiguity, and multi-file coherence. con: the cost and latency are real; i save it for work that deserves the tax | [claude models overview](https://docs.anthropic.com/en/docs/about-claude/models) |
| `gemini-3.1-pro` | google | flagship gemini tier for long context and strong multimodal reasoning in the gemini stack | pro: great when i am dragging in screenshots, pdf-shaped context, or very wide file sets. con: vendor-specific quirks still matter; i verify critical logic instead of trusting the vibe | [gemini models](https://ai.google.dev/gemini-api/docs/models) |
| `grok-4-20` | xai | grok 4 family reasoning with the 4.20 snapshot naming xai uses in api surfaces | pro: useful second opinion when i feel anchored to one vendor's "house style". con: i treat cutting-edge models as higher variance until i have personal calibration data | [xai api introduction](https://docs.x.ai/docs/introduction/) |
| `kimi-k2.5` | moonshot ai | kimi k2 line tuned for coding, math-style reasoning, and tool calling on moonshot's platform | pro: strong when i want an efficient mixture-of-experts model with a different training prior than the usual us trio. con: operational details (regions, billing, rate limits) mean another console to respect | [kimi api quickstart](https://platform.moonshot.ai/docs/guide/start-using-kimi-api) |

## `composer-2` (cursor)

composer 2 is cursor's house model for agentic work such as file edits, tool calls, and terminal workflows. it does not feel like chatting with an llm; it feels like the editor itself got smarter. i use it when the task lives in the repo: multi-step refactors, searching across the workspace, and long sessions where i do not want to re-explain context every ten minutes. i do not think of it as an api model i happen to access through cursor. it is closer to a feature of the environment. the official docs say it is tuned for tool use and long horizons, and that matches what i see.
## `gpt-5.3-codex-xhigh` (openai)

this is my "i need the ai to really think about this" slot on the openai side. the public docs call the family gpt-5-codex; the `5.3` and `xhigh` parts are how my router encodes the version and reasoning effort, so your account might show a different string. i use it when the work is code-heavy and i want the model to show its reasoning, not just spit out an answer. it shines when the change touches contracts, apis, types, migrations, or anything where a wrong assumption quietly spreads. the downside is obvious: it is slower and more expensive, and it tempts me into using a sledgehammer on a thumbtack.

## `claude-4.6-opus-max` (anthropic)

this is my only anthropic route right now, and i save it for the hard stuff: security-sensitive code, tricky concurrency, specs that contradict themselves, and problems where i want the model to slow down and really chew on it. the trade-off is cost and patience. opus is not "better" at everything. it is better at the things where i would otherwise redo the work three times trying to get it right with a faster model. i check anthropic's model pages periodically because vendors bump versions quietly, and my router's behavior changes without telling me.

## `gemini-3.1-pro` (google)

gemini is where i go when the input is not just code: screenshots, long mixed documents, and big file sets. that is where the pro tier earns its keep for me. the same review standard applies, though. if the answer involves auth, money, or data integrity, the model writes drafts; it does not make decisions. i sign off. always.

## `grok-4-20` (xai)

grok is my "break the pattern" model. when i have been staring at the same bug through two other model families and getting nowhere, throwing it at a third set of priors sometimes finds the thing i missed faster than another hour of printf debugging. i keep my expectations honest, though. it does not compete with the flagship models above, but sometimes i find value even in what it gets wrong, because that prompts a better question to ask one of the stronger models. it is kind of like using microsoft edge to download google chrome.

## `kimi-k2.5` (moonshot ai)

kimi k2.5 is my pick when i want strong coding and tool calling from outside the usual us vendor trio. moonshot makes it easy to try because its endpoints are openai-compatible, so i do not have to rewire everything to test it. honestly, i mostly keep it in rotation so that i do not **always** fall back on the same two or three models; otherwise it would just collect dust.

## how i actually pick (it is not scientific)

1. lots of files, lots of tool calls → composer 2
2. hard code problem, and i want to see the reasoning → codex xhigh or opus max, depending on whether i want openai-flavored or anthropic-flavored thinking
3. big context window or images involved → gemini 3.1 pro
4. i have been going in circles for an hour → grok or kimi for a fresh set of eyes

## faq

### do you run all six every day?

no. most days it is 90% gpt-5-codex. the full roster is there for when i need it, and over time i have built up a mental map of which model tends to do well on which kind of task.

### should i copy this exact list?

please do not. if you are not living inside an agentic editor all day, half of this will not make sense for your workflow. honestly, one fast model and one deep model will cover most people. add a third only if you keep running into the same wall.
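as a rough illustration, the four rules in "how i actually pick" are loose enough to sketch as a tiny router function. this is a hypothetical python sketch, not anything cursor exposes: the `Task` flags, the `pick_models` name, and the fallback to the codex route (inferred from the faq answer about 90% gpt-5-codex days) are all assumptions. the rules are checked in the order the list gives them, which is itself an assumed precedence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """hypothetical task description; these flags are illustrative, not a cursor api."""
    many_files_or_tool_calls: bool = False
    hard_problem: bool = False
    prefer_anthropic: bool = False
    big_context_or_images: bool = False
    going_in_circles: bool = False


# assumed default when no rule fires, based on the "mostly gpt-5-codex" faq answer
DEFAULT_MODEL = "gpt-5.3-codex-xhigh"


def pick_models(task: Task) -> list[str]:
    """return candidate router strings, checking the rules in list order."""
    # rule 1: repo-shaped, tool-heavy work stays with the editor's own model
    if task.many_files_or_tool_calls:
        return ["composer-2"]
    # rule 2: hard code problem where the reasoning should be visible
    if task.hard_problem:
        return ["claude-4.6-opus-max"] if task.prefer_anthropic else ["gpt-5.3-codex-xhigh"]
    # rule 3: large context or image input
    if task.big_context_or_images:
        return ["gemini-3.1-pro"]
    # rule 4: stuck, so try a different set of priors
    if task.going_in_circles:
        return ["grok-4-20", "kimi-k2.5"]
    return [DEFAULT_MODEL]


if __name__ == "__main__":
    print(pick_models(Task(hard_problem=True, prefer_anthropic=True)))  # ['claude-4.6-opus-max']
    print(pick_models(Task(going_in_circles=True)))  # ['grok-4-20', 'kimi-k2.5']
```

in this sketch a stuck task that also touches lots of files returns `composer-2`, because rule 1 wins. checking the "going in circles" rule first would be an equally defensible choice.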
## references

- [cursor documentation home](https://cursor.com/docs/)
- [openai developer documentation](https://developers.openai.com/api/docs/overview)
- [anthropic claude documentation](https://docs.anthropic.com/en/home)
- [google ai for developers (gemini)](https://ai.google.dev/gemini-api/docs)
- [xai developer documentation](https://docs.x.ai/docs/introduction/)
- [moonshot kimi api platform](https://platform.moonshot.ai/docs)

## related reading

- [from prototype to production: my early adopter view of ai](../20260318-from-prototype-to-production-ai/index.md)
- [my cursor setup](../20260313-my-cursor-setup/index.md)
- [a practical ai workflow: jira, github, and mcp](../20260319-practical-ai-workflow-jira-github-mcp/index.md)
- [ai series](https://philliant.com/series/ai/), the rest of the arc in one place
