Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning.

Categories: Technology, Science, Mathematics, Business

Update frequency: every 15 days
Average duration: 36 minutes
Episodes: 55
Years Active: 2023 - 2025

KV Cache Explained

In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but unde…

00:04:19 | Thu 24 Oct 2024

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler.

This innovative entropy-based sampling technique--nicknamed the 'Shrek Sampler'--is transform…

00:03:31 | Wed 16 Oct 2024

Google's NotebookLM and the Future of AI-Generated Audio

This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive…

00:43:28 | Tue 15 Oct 2024

Exploring OpenAI's o1-preview and o1-mini

OpenAI recently released its o1-preview, which they claim outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and handle complex tasks better than t…

00:42:02 | Fri 27 Sep 2024

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the r…

00:26:54 | Thu 19 Sep 2024

Composable Interventions for Language Models

This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new …

00:42:35 | Wed 11 Sep 2024

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of L…

00:39:05 | Fri 16 Aug 2024

Breaking Down Meta's Llama 3 Herd of Models

Meta just released Llama 3.1 405B--according to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerabi…

00:44:40 | Tue 06 Aug 2024

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.”

The paper this …

00:33:57 | Tue 23 Jul 2024

RAFT: Adapting Language Model to Domain Specific RAG

Where adapting LLMs to specialized domains is essential (e.g., recent news, enterprise private documents), we discuss a paper that asks how we adapt pre-trained LLMs for RAG in specialized domains. S…

00:44:01 | Fri 28 Jun 2024

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

It’s been an exciting couple of weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We’re excited to chat about this significant step forward in understanding how LLMs …

00:44:00 | Fri 14 Jun 2024

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment

We break down the paper--Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment.

Ensuring alignment (aka: making models behave in accordance with human intentions) ha…

00:48:07 | Thu 30 May 2024

Breaking Down EvalGen: Who Validates the Validators?

Disclaimer: The podcast and artwork embedded on this page are the property of Arize AI. This content is not affiliated with or endorsed by eachpod.com.

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

Demystifying Chronos: Learning the Language of Time Series

Anthropic Claude 3

Reinforcement Learning in the Era of LLMs

Sora: OpenAI’s Text-to-Video Generation Model

RAG vs Fine-Tuning

Phi-2 Model