oreilly-agi

Name: oreilly-agi
Author: sinanuozdemir

sinanuozdemir March 14, 2025


Explore the evolution of AGI through historical context, reasoning models, and agent systems, while gaining hands-on experience with cutting-edge models like Claude 4, DeepSeek-R1, and OpenAI's o3. Learn to critically evaluate AGI benchmarks, understand their limitations, and identify where current models excel or struggle in reasoning tasks.


![oreilly-logo](images/oreilly.png)

# Artificial General Intelligence (AGI) Demystified


This repository contains the code for the live session and video of the [O'Reilly Course on Artificial General Intelligence (AGI) Demystified](https://www.oreilly.com/live-events/artificial-general-intelligence-agi-demystified/0642572174033).

This course offers an exploration of the current approaches toward Artificial General Intelligence (AGI), focusing on state-of-the-art reasoning models and agent architectures. Participants will learn about the evolution of AGI research, understand key benchmarks used to measure progress, and gain practical knowledge in working with advanced models like Claude 3.7, DeepSeek-R1, and OpenAI's o3. Through hands-on exercises and case studies, attendees will develop the skills needed to evaluate these models' capabilities, understand their limitations, and apply them effectively to complex tasks.

## Notebooks

- **Using Reasoning Models**

	- [An Introduction to Reasoning Models](notebooks/intro_to_reasoning_models.ipynb) - Using 6 reasoning models across 5 providers
	
	- [Short Term Memory with OpenAI Agents](notebooks/OpenAI%20Agents.ipynb) - OpenAI Agents doesn't implement short-term memory by default; this notebook adds the concept to the agent

	- [Computer Use with reasoning models](notebooks/computer_use_reasoning.ipynb) - Letting a reasoning model control our laptop (caution advised when running this code)

	
- **Benchmarking**

	- [Benchmarking Reasoning Models](notebooks/benchmarking_reasoning_models.ipynb) - Running questions from [Humanity's Last Exam](https://huggingface.co/datasets/cais/hle/discussions) on reasoning models

	- [Benchmarking Llama 3.2 Instruct on MMLU and Embedders on MTEB](https://colab.research.google.com/drive/1zDCqXc7vHoZilHVe3y2lYyTmSUSe6bh3?usp=sharingb)

		- [Follow-up: Evaluating Llama 3.2 non-instruct on MMLU](https://colab.research.google.com/drive/1aMy19Ikyody9CGyn42K3E_DQwLScL0Ek?usp=sharing)

		- [Evalua
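
As a rough illustration of the short-term memory idea mentioned in the notebooks list above, here is a minimal sketch in plain Python. It is not the notebook's code and does not use the OpenAI Agents SDK: `ShortTermMemory`, `chat_turn`, and `echo_model` are hypothetical names, and the "model" is a stand-in function. The assumption is that short-term memory means keeping the most recent messages and passing them back in on every turn.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Message = Dict[str, str]


class ShortTermMemory:
    """Hypothetical helper: keeps only the most recent `max_messages` messages."""

    def __init__(self, max_messages: int = 6) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._messages: Deque[Message] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def as_messages(self) -> List[Message]:
        return list(self._messages)


def chat_turn(
    memory: ShortTermMemory,
    user_input: str,
    model: Callable[[List[Message]], str],
) -> str:
    """Run one turn: record the user message, call the model with the
    remembered history, then record the model's reply."""
    memory.add("user", user_input)
    reply = model(memory.as_messages())
    memory.add("assistant", reply)
    return reply


def echo_model(messages: List[Message]) -> str:
    """Stand-in for a real model call (an assumption for this sketch):
    reports how many user messages it was shown."""
    user_turns = [m for m in messages if m["role"] == "user"]
    return f"I can see {len(user_turns)} user message(s)."


if __name__ == "__main__":
    memory = ShortTermMemory(max_messages=4)
    for text in ["Hi", "What did I just say?", "And before that?"]:
        print(chat_turn(memory, text, echo_model))
```

A fixed-size window is the simplest choice; summarizing older turns instead of dropping them is a common alternative when earlier context still matters.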
