All Documents

Judging Rubric

**AI for Social Good Hackathon – SUST 2026**

ai, eval, workflow

rudra496

The goal of a Qualifying Exam ("qual") is for a student to *effectively demonstrate that they have the knowledge and skills that will be needed to conduct meaningful research in their chosen subfield*. There are a number of key phrases in this sentence:

ai, eval

adsarwate

EGG Rubric: Corporate Sustainability Evaluation Framework

The **EGG (Environmental, Governance & Goals) Rubric** is a comprehensive evaluation framework for assessing corporate sustainability performance across five critical sustainability themes. This rubric employs a multi-dimensional scoring approach that evaluates both the **quantity** and **quality** of corporate commitments, as well as their **specificity** and **temporal evolution**.

ai, eval

sc22112350-creator

RAG.md

Code Span Semantic Chunking Executive Summary

LLMC’s retrieval system must balance context relevance with token limitations, especially for large code

vmlinuzx

Instructions for Claude Code: n8n Meal Feedback LLM Evaluation Workflow

Create a plan to build an n8n workflow that evaluates multiple LLM prompts for generating meal feedback using a **thinking model to generate ground truth** for comparison.

B-vR

GLOSSARY.md

Glossary

*[Deutsche Version](GLOSSARY_DE.md)*

hanasobi

Agent Evaluation Reference Guide

Complete documentation for the `agent-eval` CLI, metrics, data formats, and customization.

ai, agent, eval

danielazamorah

RAG.md

RAG Evaluation

title: RAG Evaluation

nitin27may

GOLDEN_SET.md

Thesis Falsifier

A tool to aid researchers in assessing whether research papers adhere to scientific best practices. This application uses AI to automatically generate falsification forms, helping researchers verify the scientific robustness of their work across disciplines including social sciences and natural sciences.

fobert789

RAG Evaluation Guide

This guide explains how to evaluate the RAG (Retrieval-Augmented Generation) performance of the Clarity and Rigor agents using different retriever configurations.

ai, agent, rag

cfcarnabiitkgp

SKILL.md

Prompt Testing Skill

description: Comprehensive prompt testing and LLM output evaluation skill covering hallucination detection, response quality scoring, regression testing for prompts, A/B testing, and building evaluation pipelines for AI-powered applications.

ai, agent, llm

PramodDutta

PRD-010 — Evaluation Framework

title: Evaluation Framework

ai, agent, llm

HardMax71

GOLDEN_SET.md

Retrieval Augmented Generation (RAG): Trade-offs & Evaluation Strategy

* **Rapid Time to Market:** Easier to implement than fine-tuning a model from scratch.

TanKaizokuO

Using Performance Metrics to Evaluate RAG Systems

title: "Data-Driven RAG Evaluation: Testing Qdrant Apps with Relari AI"

qdrant

Criteria 1: Quality of Exploratory Data Analysis (20%) [20]

module_title: Data Science and Machine Learning

iftikharafridi

rawrubric

This rubric defines a **standardised metric** for evaluating how well a software repository implements core **kernel** and **operating‑system (OS)** primitives. It is based on the function manifest and status report from the Echo.Kern project and draws on general operating‑system principles ([Wikipedia: Kernel](https://en.wikipedia.org/wiki/Kernel_(operating_system))). The goal is to provide a repeatable method

ai, eval

9cog