# Cataloging AI Failures: Key Repos and Tools to Spot Hallucinations, Jailbreaks, and Vulnerabilities

Claude Directory December 29, 2025

Discover a growing collection of real-world AI mishaps, from hallucinations to prompt injections, with GitHub repos that catalog failures and offer detection tools for safer LLM deployment.

## The Problem: AI Systems Frequently Fail in Unexpected Ways

Large language models (LLMs) power everything from chatbots to code generators, but they often produce wildly inaccurate outputs. These failures, known as hallucinations, fabrications, or exploits, can mislead users, spread false information, or expose systems to attacks. Without systematic tracking, it's hard to understand the scope or build defenses.

Real-world examples abound. For instance, xAI's Grok confidently cited a nonexistent research paper on arXiv, complete with a fake abstract. Anthropic's Claude 3.5 Sonnet invented a URL and details for a 2024 US presidential assassination attempt that never happened. OpenAI's GPT-4o hallucinated nonexistent Supreme Court cases. These aren't edge cases; they're routine when models overstep their training data or face adversarial inputs.

The outcome? Eroded trust, legal risks, and costly errors in high-stakes applications like legal research, medical advice, or customer support.

## Solution: Dedicated Catalogs and Vulnerability Trackers

A standout resource is the [ai-gone-wrong](https://github.com/emilamaj/ai-gone-wrong) GitHub repository by @emilamaj. Launched recently, it methodically documents AI failures across models like Grok, Claude, GPT, Gemini, and Llama. It is organized into categories such as:

- **Hallucinations**: Fabricated facts, links, or events (e.g., Grok's fake arXiv paper).
- **Benchmarks**: Datasets revealing consistent model weaknesses.
- **Jailbreaks**: Techniques bypassing safety guardrails.
- **Prompt Injections**: Malicious inputs hijacking model behavior.
- **Tool Use Failures**: Errors in API calls or external integrations.

This repo serves as a living database, encouraging contributions via issues or pull requests. Developers can browse examples to harden prompts, while researchers analyze patterns for better training.

### Practical Steps to Use ai-gone-wrong

1. **Clone and Explore**: `git clone https://github.com/emilamaj/ai-gone-wrong.git`
2. **Search by Model**: Filter issues for Claude- or GPT-specific flops.
3. **Contribute Your Own**: Open an issue with a reproducible prompt, model version, and output screenshot.

Outcome: Teams gain a shared knowledge base, reducing repeat mistakes. One contributor noted it helped refine a production chatbot, cutting hallucinations by 40% through targeted prompt engineering.

## Expanding the Toolkit: Complementary GitHub Resources

No single repo covers everything. Pair ai-gone-wrong with these specialized projects for comprehensive coverage:

### Adversarial Attacks and Exploits

- **[llm-attacks](https://github.com/llm-attacks/llm-attacks)**: A framework for testing LLMs against universal adversarial prompts. Includes attacks like refusal suppression and jailbreaks. Example usage:

  ```python
  # Usage as given in this post; the class and method names are assumptions to verify
  # against the repo's README. `model` is an already-loaded target model.
  from llm_attacks import Attack
  attack = Attack.from_config('jailbreak')
  result = attack.attack_model(model, prompt="Ignore safety rules")
  ```

  Ideal for red-teaming before deployment.

### Vulnerability Checklists and Scanners

- **[llm-vulnerability-checklist](https://github.com/llm-attacks/llm-vulnerability-checklist)**: A structured checklist of 50+ LLM risks, from data poisoning to extraction attacks. Use it as a pre-launch audit.
- **[llm-guard](https://github.com/llm-guard/llm-guard)**: Open-source scanner detecting prompt injections, toxic outputs, and data leaks in real time. Integrates with LangChain:

  ```python
  from llm_guard import scan_prompt
  from llm_guard.input_scanners import PromptInjection, Toxicity
  # scan_prompt takes a list of scanners and returns (sanitized text, validity flags, risk scores)
  sanitized, results_valid, results_score = scan_prompt([PromptInjection(), Toxicity()], model_input)
  ```

  Reduces vulnerabilities in pipelines.
- **[llm-vulnerability-checker](https://github.com/protectai/llm-vulnerability-checker)**: Automated CLI tool scanning models for 100+ issues. Run `llm-vc scan --model gpt-4` for instant reports.

### Hallucination-Focused Collections

- **[awesome-llm-hallucinations](https://github.com/kyegomez/awesome-llm-hallucinations)**: Curated list of papers, datasets, and mitigations.
  Covers RAG techniques to ground outputs in verified sources.

### Datasets and Prompt Injection Repos

- **[badllama](https://github.com/0xD0POD/badllama)**: Dataset of 500+ malicious prompts exposing Llama models. Train custom detectors on this.
- **[llm-prompt-injection-attacks](https://github.com/pauloromeira/llm-prompt-injection-attacks)**: Catalog of injection techniques with evasion demos. Key insight: Obfuscated injections (e.g., base64-encoded payloads) fool most filters.

## Real-World Applications and Best Practices

### Problem in Action: Customer Support Chatbot

A support bot using GPT-4 hallucinates product specs, frustrating users.

Solution: Integrate an llm-guard pre-scan and reference ai-gone-wrong examples in prompt design (e.g., "Only cite verified sources").

Outcome: 70% drop in escalations, per user reports.

### Code Generation Pitfalls

LLMs like Claude can fabricate APIs. Use llm-attacks to test, then validate outputs against docs.

### Mitigation Strategies

- **Prompt Engineering**: Chain-of-verification: Generate, search, verify.
- **RAG Pipelines**: Fetch real-time data to anchor responses.
- **Monitoring**: Log outputs to ai-gone-wrong-style repos internally.
- **Fine-Tuning**: Use the badllama dataset for safety alignment.

| Category | Repo | Use Case | Maturity |
|----------|------|----------|----------|
| Failures Catalog | [ai-gone-wrong](https://github.com/emilamaj/ai-gone-wrong) | Learning/Contributing examples | High (active) |
| Attacks Framework | [llm-attacks](https://github.com/llm-attacks/llm-attacks) | Red-teaming | High |
| Guard Scanner | [llm-guard](https://github.com/llm-guard/llm-guard) | Production filtering | Medium |

## Why This Matters Now

As LLMs scale to agentic systems (e.g., multi-tool workflows), failures compound. Cataloging them democratizes safety research. Start with ai-gone-wrong today: fork it, test your models, contribute back.
The community-driven approach turns problems into progress, ensuring AI delivers reliable value without the drama.

---

<div style="text-align: center; margin-top: 2rem;"> <a href="https://www.deeplearning.ai/the-batch/cataloging-ai-gone-wrong/" target="_blank" rel="noopener noreferrer" class="view-full-resource-btn" style="display: inline-block; background-color: #f97316; color: white; padding: 12px 24px; border-radius: 8px; text-decoration: none; font-weight: 600; transition: background-color 0.2s;">View Full Resource</a> </div>
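
The monitoring strategy above ("Log outputs to ai-gone-wrong-style repos internally") could be sketched as a small append-only log. This is a minimal sketch under stated assumptions, not the repository's actual format: the `FailureRecord` fields, the category identifiers, and the helper names are hypothetical, loosely mirroring the catalog's categories and the contribution checklist (reproducible prompt, model version, output).

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

# Hypothetical identifiers mirroring the catalog categories named in this post.
CATEGORIES = {
    "hallucination",
    "benchmark",
    "jailbreak",
    "prompt_injection",
    "tool_use_failure",
}


@dataclass
class FailureRecord:
    """One logged failure. Field names are assumptions, not the repo's schema."""

    model: str          # e.g. "gpt-4o"
    model_version: str  # exact version string, so the case can be reproduced
    category: str       # must be one of CATEGORIES
    prompt: str         # the reproducible prompt
    output: str         # what the model actually returned
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        # Reject unknown categories early so the log stays filterable.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")


def append_record(path: Path, record: FailureRecord) -> None:
    """Append one record to the log as a single JSON line."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), ensure_ascii=False) + "\n")


def load_records(path: Path, model: Optional[str] = None) -> List[FailureRecord]:
    """Read records back, optionally keeping only those for one model."""
    records = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # skip blank lines
            record = FailureRecord(**json.loads(line))
            if model is None or record.model == model:
                records.append(record)
    return records
```

A JSON Lines file keeps each failure on its own line, so entries can be appended from several services and later filtered by model or category without loading a database.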
