Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Categories: Technology, Science, Mathematics, Business

Update frequency: every 15 days
Average duration: 36 minutes
Episodes: 55
Years Active: 2023 - 2025

Stan Miasnikov, Distinguished Engineer, AI/ML Architecture, Consumer Experience at Verizon Walks Us Through His New Paper

This episode dives into "Category-Theoretic Analysis of Inter-Agent Communication and Mutual Understanding Metric in Recursive Consciousness." The paper presents an extension of the Recursive Conscio…

00:48:11 | Sat 06 Sep 2025

Small Language Models are the Future of Agentic AI

We had the privilege of hosting Peter Belcak – an AI Researcher working on the reliability and efficiency of agentic systems at NVIDIA – who walked us through his new paper making the rounds in AI ci…

00:31:15 | Fri 05 Sep 2025

Watermarking for LLMs and Image Models

In this AI research paper reading, we dive into "A Watermark for Large Language Models" with the paper's author John Kirchenbauer.

This paper is a timely exploration of techniques for embedding invis…

00:42:56 | Wed 30 Jul 2025

Self-Adapting Language Models: Paper Authors Discuss Implications

The authors of the new paper *Self-Adapting Language Models (SEAL)* shared a behind-the-scenes look at their work, motivations, results, and future directions.

The paper introduces a novel method for …

00:31:26 | Tue 08 Jul 2025

The Illusion of Thinking: What the Apple AI Paper Says About LLM Reasoning

This week we discuss The Illusion of Thinking, a new paper from researchers at Apple that challenges today’s evaluation methods and introduces a new benchmark: synthetic puzzles with controllable com…

00:30:35 | Fri 20 Jun 2025

Accurate KV Cache Quantization with Outlier Tokens Tracing

We discuss Accurate KV Cache Quantization with Outlier Tokens Tracing, a deep dive into improving the efficiency of LLM inference. The authors enhance KV Cache quantization, a technique for reducing …

00:25:11 | Wed 04 Jun 2025

Scalable Chain of Thoughts via Elastic Reasoning

In this week's episode, we talk about Elastic Reasoning, a novel framework designed to enhance the efficiency and scalability of large reasoning models by explicitly separating the reasoning process …

00:28:54 | Fri 16 May 2025

Sleep-time Compute: Beyond Inference Scaling at Test-time

What if your LLM could think ahead—preparing answers before questions are even asked?

In this week's paper read, we dive into a groundbreaking new paper from researchers at Letta, introducing sleep-ti…

00:30:24 | Fri 02 May 2025

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

For this week's paper read, we dive into our own research.

We wanted to create a replicable, evolving dataset that can keep pace with model training so that you always know you're testing with data yo…

00:27:19 | Fri 18 Apr 2025

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam

This week we talk about modern AI benchmarks, taking a close look at Google's recent Gemini 2.5 release and its performance on key evaluations, notably Humanity's Last Exam (HLE). In the session we …

00:26:11 | Fri 04 Apr 2025

Model Context Protocol (MCP)

We cover Anthropic’s groundbreaking Model Context Protocol (MCP). Though it was released in November 2024, we've been seeing a lot of hype around it lately, and thought it was well worth digging into…

00:15:03 | Tue 25 Mar 2025

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs

This week, we're mixing things up a little bit. Instead of diving deep into a single research paper, we cover the biggest AI developments from the past few weeks.

We break down key announcements, incl…

00:30:23 | Sat 01 Mar 2025

How DeepSeek is Pushing the Boundaries of AI Development

Disclaimer: The podcast and artwork embedded on this page are the property of Arize AI. This content is not affiliated with or endorsed by eachpod.com.

Multiagent Finetuning: A Conversation with Researcher Yilun Du

Training Large Language Models to Reason in Continuous Latent Space

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

Agent-as-a-Judge: Evaluate Agents with Agents

Introduction to OpenAI's Realtime API

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems