Discover a growing collection of real-world AI mishaps, from hallucinations to prompt injections, with GitHub repos that catalog failures and offer detection tools for safer LLM deployment.
## The Problem: AI Systems Frequently Fail in Unexpected Ways
Large language models (LLMs) power everything from chatbots to code generators, but they often produce wildly inaccurate outputs. These failures—known as hallucinations, fabrications, or exploits—can mislead users, spread false information, or expose systems to attacks. Without systematic tracking, it's hard to understand the scope or build defenses.
Real-world examples abound. For instance, xAI's Grok confidently cited a nonexistent research paper on arXiv, complete with a fake abstract. Anthropic's Claude 3.5 Sonnet invented a URL and details for a 2024 US presidential assassination attempt that never happened. OpenAI's GPT-4o hallucinated non-existent Supreme Court cases. These aren't edge cases; they're routine when models overstep their training data or face adversarial inputs.
The outcome? Eroded trust, legal risks, and costly errors in high-stakes applications like legal research, medical advice, or customer support.
## Solution: Dedicated Catalogs and Vulnerability Trackers
A standout resource is the [ai-gone-wrong](https://github.com/emilamaj/ai-gone-wrong) GitHub repository by @emilamaj. Launched recently, it methodically documents AI failures across models like Grok, Claude, GPT, Gemini, and Llama. Organized into categories such as:
- **Hallucinations**: Fabricated facts, links, or events (e.g., Grok's fake arXiv paper).
- **Benchmarks**: Datasets revealing consistent model weaknesses.
- **Jailbreaks**: Techniques bypassing safety guardrails.
- **Prompt Injections**: Malicious inputs hijacking model behavior.
- **Tool Use Failures**: Errors in API calls or external integrations.
This repo serves as a living database, encouraging contributions via issues or pull requests. Developers can browse examples to harden prompts, while researchers analyze patterns for better training.
### Practical Steps to Use ai-gone-wrong
1. **Clone and Explore**: `git clone https://github.com/emilamaj/ai-gone-wrong.git`
2. **Search by Model**: Filter issues for Claude or GPT-specific flops.
3. **Contribute Your Own**: Open an issue with a reproducible prompt, model version, and output screenshot.
Outcome: Teams gain a shared knowledge base, reducing repeat mistakes. One contributor noted it helped refine a production chatbot, cutting hallucinations by 40% through targeted prompt engineering.
## Expanding the Toolkit: Complementary GitHub Resources
No single repo covers everything. Pair ai-gone-wrong with these specialized projects for comprehensive coverage:
### Adversarial Attacks and Exploits
- **[llm-attacks](https://github.com/llm-attacks/llm-attacks)**: A framework for testing LLMs against universal adversarial prompts. Includes attacks like refusal suppression and jailbreaks. Example usage:
```python
from llm_attacks import Attack
attack = Attack.from_config('jailbreak')
result = attack.attack_model(model, prompt="Ignore safety rules")
```
Ideal for red-teaming before deployment.
### Vulnerability Checklists and Scanners
- **[llm-vulnerability-checklist](https://github.com/llm-attacks/llm-vulnerability-checklist)**: A structured checklist of 50+ LLM risks, from data poisoning to extraction attacks. Use it as a pre-launch audit.
- **[llm-guard](https://github.com/llm-guard/llm-guard)**: Open-source scanner detecting prompt injections, toxic outputs, and data leaks in real-time. Integrates with LangChain:
```python
from llm_guard import scan_prompt
sanitized = scan_prompt(model_input)
```
Reduces vulnerabilities in pipelines.
- **[llm-vulnerability-checker](https://github.com/protectai/llm-vulnerability-checker)**: Automated CLI tool scanning models for 100+ issues. Run `llm-vc scan --model gpt-4` for instant reports.
### Hallucination-Focused Collections
- **[awesome-llm-hallucinations](https://github.com/kyegomez/awesome-llm-hallucinations)**: Curated list of papers, datasets, and mitigations. Covers RAG techniques to ground outputs in verified sources.
### Datasets and Prompt Injection Repos
- **[badllama](https://github.com/0xD0POD/badllama)**: Dataset of 500+ malicious prompts exposing Llama models. Train custom detectors on this.
- **[llm-prompt-injection-attacks](https://github.com/pauloromeira/llm-prompt-injection-attacks)**: Catalog of injection techniques with evasion demos. Key insight: Indirect injections (e.g., via base64) fool most filters.
## Real-World Applications and Best Practices
### Problem in Action: Customer Support Chatbot
A support bot using GPT-4 hallucinates product specs, frustrating users. Solution: Integrate llm-guard pre-scan and reference ai-gone-wrong examples in prompt design (e.g., "Only cite verified sources").
Outcome: 70% drop in escalations, per user reports.
### Code Generation Pitfalls
LLMs like Claude fabricate APIs. Use llm-attacks to test, then validate outputs against docs.
### Mitigation Strategies
- **Prompt Engineering**: Chain-of-verification: Generate, search, verify.
- **RAG Pipelines**: Fetch real-time data to anchor responses.
- **Monitoring**: Log outputs to ai-gone-wrong-style repos internally.
- **Fine-Tuning**: Use BadLlama dataset for safety alignment.
| Category | Repo | Use Case | Maturity |
|----------|------|----------|----------|
| Failures Catalog | [ai-gone-wrong](https://github.com/emilamaj/ai-gone-wrong) | Learning/Contributing examples | High (active) |
| Attacks Framework | [llm-attacks](https://github.com/llm-attacks/llm-attacks) | Red-teaming | High |
| Guard Scanner | [llm-guard](https://github.com/llm-guard/llm-guard) | Production filtering | Medium |
## Why This Matters Now
As LLMs scale to agentic systems (e.g., multi-tool workflows), failures compound. Cataloging them democratizes safety research. Start with ai-gone-wrong today—fork it, test your models, contribute back. The community-driven approach turns problems into progress, ensuring AI delivers reliable value without the drama.
---
<div style="text-align: center; margin-top: 2rem;">
<a href="https://www.deeplearning.ai/the-batch/cataloging-ai-gone-wrong/" target="_blank" rel="noopener noreferrer" class="view-full-resource-btn" style="display: inline-block; background-color: #f97316; color: white; padding: 12px 24px; border-radius: 8px; text-decoration: none; font-weight: 600; transition: background-color 0.2s;">View Full Resource</a>
</div>