Compare AI model performance across 48+ benchmarks. Our composite index aggregates coding, math, reasoning, and language scores into a single intelligence ranking.
At-a-glance rankings across the three dimensions that matter most.
Neura Intelligence Index; Higher is better
Max input tokens; Higher is better
USD per 1M output tokens; Lower is better
Ranked by the Neura Intelligence Index, a weighted composite of 48+ benchmarks across 8 categories.
Find the sweet spot: models in the top-left quadrant (high index score, low price) offer the best value.
Top performers in each benchmark category.
Who leads on each individual benchmark — click any card to see full results.
Different tasks need different strengths. These indices re-weight our benchmarks for specific workflows.
Best for software development, code generation, and debugging
Best for scientific research, data analysis, and complex reasoning
Best for writing, editing, summarization, and creative tasks
Highest intelligence per dollar — the cost-efficiency sweet spot
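As a rough illustration of how a task-specific index could re-weight the same category scores, here is a minimal sketch. The weight profiles, the sample scores, and the function names `profile_index` and `value_score` are all hypothetical; the actual weights and the exact value formula are not stated on this page. The value ratio assumes "intelligence per dollar" means index score divided by USD per 1M output tokens.

```python
# Hypothetical category scores (0-100) for one model; illustrative only, not real data.
category_scores = {"coding": 80.0, "math": 70.0, "reasoning": 75.0, "language": 90.0}

# Hypothetical weight profiles; the real per-workflow weights are an assumption here.
PROFILES = {
    "coding": {"coding": 0.6, "reasoning": 0.3, "math": 0.1},
    "writing": {"language": 0.7, "reasoning": 0.3},
}


def profile_index(scores, weights):
    """Weighted mean of category scores under one weight profile."""
    total = sum(weights.values())
    return sum(scores[cat] * w for cat, w in weights.items()) / total


def value_score(index, usd_per_1m_output):
    """Assumed value metric: index points per dollar of output-token price."""
    if usd_per_1m_output <= 0:
        raise ValueError("price must be positive")
    return index / usd_per_1m_output


coding_index = profile_index(category_scores, PROFILES["coding"])
writing_index = profile_index(category_scores, PROFILES["writing"])
print(coding_index, writing_index, value_score(coding_index, 10.0))
```

The same model can rank differently under each profile because only the weights change, not the underlying benchmark scores.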
The Neura Intelligence Index is a composite score (0-100) computed from 48+ individual benchmarks spanning 8 categories: coding, math, reasoning, general knowledge, language, multimodal, safety, and agentic tasks.
Not all models have scores on all benchmarks. The confidence indicator reflects benchmark coverage: high (>70% of benchmarks), medium (40-70%), or low (<40%). Weights are renormalized across available categories so models aren't penalized for missing data.
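The renormalization and confidence rules described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the site's actual implementation: the function names, the use of `None` for a missing category, and the choice to treat exactly 70% coverage as "medium" are all assumptions.

```python
def composite_index(category_scores, category_weights):
    """Weighted mean over categories that have a score.

    Weights of missing categories are dropped and the remaining weights
    are renormalized, so a missing category does not count as zero.
    """
    available = {
        cat: w
        for cat, w in category_weights.items()
        if category_scores.get(cat) is not None
    }
    total = sum(available.values())
    if total == 0:
        return None  # no scored categories, so no index can be computed
    return sum(category_scores[cat] * w for cat, w in available.items()) / total


def confidence(n_scored, n_total):
    """Map benchmark coverage to the high / medium / low indicator."""
    coverage = n_scored / n_total
    if coverage > 0.70:
        return "high"
    if coverage >= 0.40:
        return "medium"
    return "low"


# Hypothetical weights and scores; "math" is missing for this model.
weights = {"coding": 1.0, "math": 1.0, "reasoning": 2.0}
scores = {"coding": 80.0, "math": None, "reasoning": 60.0}
print(composite_index(scores, weights), confidence(24, 48))
```

Without renormalization, the missing "math" score would be treated as 0 and pull the index down; with it, the index is the weighted mean of the two categories that do have data.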
Scores are aggregated from official model cards, Papers With Code, HuggingFace Open LLM Leaderboard, LiveBench, and LMSYS Chatbot Arena. Each score includes a verification status (official, self-reported, or aggregated).