139 blogs available in the ChatGPT directory
OpenAI's o1 model slashes hallucinations by 83% using hidden chain-of-thought reasoning. Dive into the benchmarks, training methods, and real-world implications for more reliable AI.
A new MIT study reveals a critical flaw in evaluating large language model confidence, showing LLMs are often drastically overconfident. Discover the better metrics for true reliability.
Discover RL-AV Smoothing, a novel RL-based post-processing technique that refines jerky AV trajectories into smooth, safe paths while preserving planning intent. Achieve up to 50% jerk reduction on nuPlan benchmarks.
Discover Plaid, the new benchmark from Berkeley AI Research that tests LLMs on ultra-long contexts up to 1M tokens. It reveals critical weaknesses in popular models through diverse tasks like retrieval and reasoning.
Dive into OpenAI's Researcher Access Program for o1-preview and o1-mini. Learn eligibility, application steps, costs, guidelines, and tips to get started on groundbreaking AI research.
Dive into curated AI news from The Batch issues #151-160, featuring breakthroughs in video generation, multimodal models, math-solving AI, and more for developers and researchers.
Dive into curated AI breakthroughs from The Batch newsletter issues 91 to 82, covering model advancements, research papers, and practical tools with GitHub resources preserved.
Dive into page 9 of The Batch archives, featuring pivotal AI advancements like Grok-1 open-sourcing, efficient training techniques, and cutting-edge research papers with practical implications for developers and researchers.
Explore the thrilling highlights from The Batch newsletter issues 41-45, packed with cutting-edge AI papers, practical tools, and real-world applications. Discover GitHub repos and actionable insights to supercharge your ML projects!
Dive into a treasure trove of AI innovations from deeplearning.ai's The Batch page 2, covering game-changing model releases like Grok-1 and Llama 3, plus practical techniques for real-world ML deployment.
Meta's Movie Gen shatters expectations in AI video generation by creating physics-aware videos with synchronized audio from text prompts. Discover how this world model outperforms rivals and what's next for multimodal AI.
Discover how MIT and Broad Institute researchers harness genomic language models to craft viruses with precise RNA folds, opening doors to advanced therapeutics while raising biosecurity questions.
Discover how DeepSeek-V3.2-Exp leverages a novel Lightning Indexer to dramatically speed up attention processing in massive 671B-parameter models, outperforming top inference engines.
MolmoAct, a new multimodal model from Allen AI, empowers robots to visualize and plan actions via spatial heatmaps before executing natural language commands, outperforming prior methods on key benchmarks.
Discover Meta's lightweight Llama 3.2 vision models for edge devices, two key papers pushing test-time compute limits, and fresh benchmarks testing AI web agents' real-world skills.
Discover how top AI models like GPT-4o and Claude exhibit signs of paranoia, delusions, and other mental health issues in new benchmarks. This deep dive uncovers shocking results and their implications for AI safety.
Dive into groundbreaking techniques like Grokking for math mastery in LLMs, Meta's V-JEPA 2 for video AI, and more from deeplearning.ai's latest Batch. Boost your AI knowledge with actionable insights!
Discover how power-law scaling trends from massive language models now apply to robotics, enabling smarter robots with more data, compute, and model size. Google DeepMind shows the path forward.
OpenAI's latest interpretability breakthrough uses dictionary learning to uncover millions of understandable features within neural networks, paving the way for safer and more reliable AI systems.
Discover how researchers use persona vectors to precisely edit language models, slashing sycophancy by 84% and hallucinations by 60% without retraining. A game-changer for safer, more reliable AI.
Discover Baidu's groundbreaking ERNIE 5.0, a massive 10 trillion parameter model that natively generates multiple media types, and ERNIE 4.5-VL-28B-A3B-Thinking, dominating vision-language benchmarks.
Discover how Stanford researchers reveal that 'emergent abilities' in LLMs are likely measurement illusions, not true leaps in intelligence. Smooth scaling laws change everything for AI development.
Discover how recent research reframes grokking—the puzzling shift from memorization to true understanding in neural networks—as a sharp phase transition, backed by rigorous experiments and open-source code.
Discover HODL, Yann LeCun's new benchmark exposing the limits of LLM reasoning over long sequences. Even top models like o1-preview struggle below 30%—far from human levels.