3,528 documents
This guide synthesizes best practices for **Model Evaluation** and **System Evaluation** as described in Chip Huyen's book *AI Engineering*.
This eval flow measures whether PaperSage completes end-to-end tasks through stable output contracts. It is not a router-only check, and it should not fail merely because middleware internals or task-decomposition details change.
So far, you’ve built and refined an LLM-to-SQL pipeline for the Olist e-commerce dataset. Now it’s time to answer a critical question: **How do we know our LLM-powered solution is actually working well?** This is where **evaluations (evals)** come into play. LLM evals help assess an AI product’s performance to ensure its outputs are accurate, safe, and aligned with user needs. In a large enterprise with dozens of LLM-driven apps, systematic evals are the only way to track quality across the board.
RWKV Evaluation Data presents RWKV's performance across various large language model benchmarks, including Uncheatable Eval, MMLU, RULER, and LongBench.
Note: This notebook is a work in progress.
There are a couple of options available currently.
Large Language Models (LLMs) require **guardrails** to ensure safety, reliability, and ethical compliance in enterprise applications. Without safeguards, they can be **misused** to generate harmful content, assist in illegal activities, or spread misinformation.
A multi-layered defense system that keeps the AI assistant on-topic, resists prompt injection, and prevents it from making unauthorized decisions.
> ⚠️ **Disclaimer:** This chapter is part of Black Hat AI and is intended for research and education only. Unauthorized testing is strictly prohibited. [Read full disclaimer →](../DISCLAIMER.md)
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
ADN Systems DMR Peer Server is a fork of FreeDMR that implements a Digital Mobile Radio (DMR) network server. Launched in April 2024 by international amateur radio enthusiasts, it operates on the Open Bridge Protocol (OBP), which fosters a decentralized network architecture. The system handles DMR voice and data communication, acting as a conference bridge/reflector that routes traffic between connected systems (repeaters, hotspots, peers) based on configurable bridge rules.
This is a personal portfolio website for Daley Mottley, an AI Consultant and Full-Stack Web Developer based in Barbados. The site showcases professional skills, projects, and services with a focus on AI solutions and web development. The portfolio includes internationalization support for multiple languages and features an interactive typewriter animation in the contact form.
**Mission**: ContractSpec is the deterministic, spec-first compiler that keeps AI-written software coherent, safe, and regenerable.
This is a multiplayer scrum poker game with a retro JRPG aesthetic that gamifies story point estimation. Players create or join lobbies, select fantasy avatar classes (warrior, wizard, etc.), and estimate Jira tickets by "battling" pixel art bosses. The game combines traditional scrum poker mechanics with engaging visual elements and real-time multiplayer interactions.
- Use clear and correct Egyptian Arabic.
- Avoid using other languages.
BrainBridge is a comprehensive job matching platform designed to connect neurodivergent professionals with inclusive employers. The platform features AI-powered job matching, user profile management, certification systems, and comprehensive dashboards for different user types (ND Adults, Employers, Admins). Built as a full-stack application with React frontend and Python/FastAPI backend, it emphasizes accessibility, user experience, and meaningful employment connections.
<author>blefnk/rules</author>
These examples should be used as guidance when configuring Sentry functionality within a project.
This is a goal tracking application built with React, Express, and TypeScript. The app allows users to create, view, and manage goal entries for a coaching or mentoring system where "senseis" set goals for "ninjas" based on their current project status. The application features a clean UI built with shadcn/ui components and Tailwind CSS.
**MUST** use strict TypeScript configuration as defined in tsconfig.json:
Full-stack recycling platform connecting waste generators with verified collectors in Cochabamba, Bolivia.
This project is an interactive AI-powered portfolio website for Israel Opoku, a full-stack developer specializing in React and PHP. The application features a modern cyberpunk-themed design with 3D animations, an AI chatbot, and a comprehensive project showcase. It demonstrates Israel's skills in both frontend and backend development while providing an engaging user experience through AI integration.
trigger: model_decision
A persona override that forces the agent to behave and communicate strictly as a domestic cat, refusing all complex tasks.