SciEvalKit

Name: SciEvalKit
Author: InternScience

InternScience December 3, 2025


A unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of large language and vision–language models across the full research workflow.

<h1 align="center"><img src="assets/icon/opencompass.png" alt="OpenCompass" height="50" style="vertical-align:middle;" /> SciEval ToolKit</h1> <p align="center"><strong> A unified evaluation toolkit and leaderboard for rigorously assessing the scientific intelligence of large language and vision–language models across the full research workflow. </strong></p> <hr style="width:100%;margin:16px 0;border:0;border-top:0.1px solid #d0d7de;" /> <div align="center">

<img src="assets/icon/welcome.png" alt="welcome" height="24" style="vertical-align:middle;" />  Welcome to the official repository of <strong>SciEval</strong>! <div align="center"> <img src="assets/SciEvalKit.png" alt="SciEval capability radar" width="90%"> </div> </div>

<img src="assets/icon/why.png" alt="why" height="28" style="vertical-align:middle;" /> Why SciEval?

SciEval is an open‑source evaluation framework and leaderboard aimed at measuring the scientific intelligence of large language and vision–language models.
Although modern frontier models often achieve ~90 on general‑purpose benchmarks, their performance drops sharply on rigorous, domain‑specific scientific tasks, revealing a persistent general‑versus‑scientific gap that motivates SciEval. Its design is shaped by the following core ideas:

**Beyond general‑
