
Deep Papers - Podcast

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research. Hosted by Arize AI founders and engineers, each episode profiles the people and techniques behind cutting-edge breakthroughs in machine learning. 

Categories: Technology, Science, Mathematics, Business
Update frequency: every 15 days
Average duration: 36 minutes
Episodes: 55
Years active: 2023 - 2025
KV Cache Explained

In this episode, we dive into the intriguing mechanics behind why chat experiences with models like GPT often start slow but then rapidly pick up speed. The key? The KV cache. This essential but unde…

00:04:19  |   Thu 24 Oct 2024
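The mechanism behind that speed-up can be sketched in a few lines (a toy illustration of the general idea, not code from the episode): during decoding, each new token's attention needs the keys and values of every earlier token, so caching them means each step adds one new K/V pair instead of recomputing the whole prefix.

```python
import numpy as np

def attend(q, K, V):
    """Single-head scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(q.shape[0])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

class KVCache:
    """Append-only cache: each decoding step contributes one key/value
    pair instead of recomputing K and V for the entire prefix."""
    def __init__(self, d):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def step(self, k, v, q):
        self.K = np.vstack([self.K, k])   # O(1) new projection work per token...
        self.V = np.vstack([self.V, v])
        return attend(q, self.K, self.V)  # ...rather than re-projecting n tokens

rng = np.random.default_rng(0)
d = 4
cache = KVCache(d)
outs = [cache.step(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d))
        for _ in range(3)]
```

The first step still has to fill the cache for the whole prompt, which is why the start of a chat feels slow and subsequent tokens stream quickly.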
The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

In this byte-sized podcast, Harrison Chu, Director of Engineering at Arize, breaks down the Shrek Sampler. 

This innovative Entropy-Based Sampling technique--nicknamed the 'Shrek Sampler'--is transform…

00:03:31  |   Wed 16 Oct 2024
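The general family of techniques discussed here can be sketched as follows (a minimal illustration of entropy-based sampling; the thresholds and branching rules are hypothetical, not the Shrek Sampler's actual recipe): measure the entropy of the model's next-token distribution and adapt the sampling strategy to it.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy_based_sample(logits, rng, low=0.5, high=2.5):
    """Choose a decoding strategy from the entropy of the next-token
    distribution: low entropy -> model is confident, take the argmax;
    high entropy -> model is uncertain, sample with a hotter temperature.
    Thresholds here are illustrative placeholders."""
    p = softmax(logits)
    h = -np.sum(p * np.log(p + 1e-12))   # Shannon entropy in nats
    if h < low:
        return int(np.argmax(p))          # confident: greedy
    temp = 1.0 if h < high else 1.5       # uncertain: explore more
    return int(rng.choice(len(p), p=softmax(logits, temp)))

rng = np.random.default_rng(0)
confident = np.array([10.0, 0.0, 0.0, 0.0])
tok = entropy_based_sample(confident, rng)  # near-zero entropy -> argmax -> 0
```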
Google's NotebookLM and the Future of AI-Generated Audio

This week, Aman Khan and Harrison Chu explore NotebookLM’s unique features, including its ability to generate realistic-sounding podcast episodes from text (but this podcast is very real!). They dive…

00:43:28  |   Tue 15 Oct 2024
Exploring OpenAI's o1-preview and o1-mini

OpenAI recently released its o1-preview, which they claim outperforms GPT-4o on a number of benchmarks. These models are designed to think more before answering and handle complex tasks better than t…

00:42:02  |   Fri 27 Sep 2024
Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

A recent announcement on X boasted a tuned model with pretty outstanding performance, and claimed these results were achieved through Reflection Tuning. However, people were unable to reproduce the r…

00:26:54  |   Thu 19 Sep 2024
Composable Interventions for Language Models

This week, we're excited to be joined by Kyle O'Brien, Applied Scientist at Microsoft, to discuss his most recent paper, Composable Interventions for Language Models. Kyle and his team present a new …

00:42:35  |   Wed 11 Sep 2024
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

This week’s paper presents a comprehensive study of the performance of various LLMs acting as judges. The researchers leverage TriviaQA as a benchmark for assessing objective knowledge reasoning of L…

00:39:05  |   Fri 16 Aug 2024
Breaking Down Meta's Llama 3 Herd of Models

Meta just released Llama 3.1 405B–according to them, it’s “the first openly available model that rivals the top AI models when it comes to state-of-the-art capabilities in general knowledge, steerabi…

00:44:40  |   Tue 06 Aug 2024
DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

Chaining language model (LM) calls as composable modules is fueling a new way of programming, but ensuring LMs adhere to important constraints requires heuristic “prompt engineering.” 

The paper this …

00:33:57  |   Tue 23 Jul 2024
RAFT: Adapting Language Model to Domain Specific RAG

Adapting LLMs to specialized domains (e.g., recent news, enterprise private documents) is essential in many applications, and we discuss a paper that asks how to adapt pre-trained LLMs for RAG in those domains. S…

00:44:01  |   Fri 28 Jun 2024
LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

It’s been an exciting couple of weeks for GenAI! Join us as we discuss the latest research from OpenAI and Anthropic. We’re excited to chat about this significant step forward in understanding how LLMs …

00:44:00  |   Fri 14 Jun 2024
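The sparse autoencoder idea at the center of this research can be sketched in miniature (a toy forward pass, not the papers' implementation or training setup): model activations are mapped to an overcomplete, mostly-zero feature vector that is trained to reconstruct the original activation, with an L1 penalty enforcing sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat = 8, 32                       # overcomplete: more features than dims
W_enc = rng.normal(0, 0.1, (d_model, d_feat))
W_dec = rng.normal(0, 0.1, (d_feat, d_model))
b_enc = np.zeros(d_feat)

def sae_forward(x):
    """Encode an activation vector into sparse features, then decode."""
    f = np.maximum(0.0, x @ W_enc + b_enc)    # ReLU -> nonnegative features
    x_hat = f @ W_dec                         # linear decoder reconstructs x
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    f, x_hat = sae_forward(x)
    recon = np.mean((x - x_hat) ** 2)         # reconstruction error
    sparsity = l1_coeff * np.abs(f).sum()     # L1 pushes most features to zero
    return recon + sparsity

x = rng.normal(size=d_model)                  # stand-in for a model activation
f, x_hat = sae_forward(x)
```

After training, individual feature directions often correspond to human-interpretable concepts, which is what makes the technique attractive for interpretability.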
Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment

We break down the paper--Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment.

Ensuring alignment (aka: making models behave in accordance with human intentions) ha…

00:48:07  |   Thu 30 May 2024
Breaking Down EvalGen: Who Validates the Validators?

Due to the cumbersome nature of human evaluation and limitations of code-based evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in evaluating LLM outputs. Yet LLM…

00:44:47  |   Mon 13 May 2024
Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

This week we explore ReAct, an approach that enhances the reasoning and decision-making capabilities of LLMs by combining step-by-step reasoning with the ability to take actions and gather informatio…

00:45:07  |   Fri 26 Apr 2024
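The reasoning-plus-acting loop described above can be sketched roughly as follows (a minimal mock with a scripted stand-in for the LLM and a hypothetical `lookup` tool, not the paper's implementation): the model alternates between emitting a thought, taking an action, and reading the resulting observation back into its context.

```python
def llm(prompt):
    # A real ReAct agent would call a language model here; this stub just
    # scripts two turns to demonstrate the thought -> action -> observation cycle.
    if "Observation" not in prompt:
        return "Thought: I should look up the population.\nAction: lookup[France]"
    return "Thought: I now know the answer.\nAction: finish[68 million]"

TOOLS = {"lookup": lambda q: f"The population of {q} is 68 million."}

def react(question, max_steps=5):
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        out = llm(prompt)
        action = out.split("Action: ")[1]       # parse "name[argument]"
        name, arg = action.split("[", 1)
        arg = arg.rstrip("]")
        if name == "finish":                    # agent decides it is done
            return arg
        obs = TOOLS[name](arg)                  # act in the environment...
        prompt += f"\n{out}\nObservation: {obs}"  # ...and observe the result

answer = react("What is the population of France?")
```

Interleaving observations with reasoning is what lets the model ground its chain of thought in external information instead of hallucinating it.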
Demystifying Chronos: Learning the Language of Time Series

This week, we’re covering Amazon’s time series model: Chronos. Developing accurate machine-learning-based forecasting models has traditionally required substantial dataset-specific tuning and model c…

00:44:40  |   Thu 04 Apr 2024
Anthropic Claude 3

This week we dive into the latest buzz in the AI world – the arrival of Claude 3. Claude 3 is the newest family of models in the LLM space, and Claude 3 Opus (Anthropic's "most intelligent" Claude m…

00:43:01  |   Mon 25 Mar 2024
Reinforcement Learning in the Era of LLMs

We’re exploring Reinforcement Learning in the Era of LLMs this week with Claire Longo, Arize’s Head of Customer Success. Recent advancements in Large Language Models (LLMs) have garnered wide attenti…

00:44:49  |   Fri 15 Mar 2024
Sora: OpenAI’s Text-to-Video Generation Model

This week, we discuss the implications of Text-to-Video Generation and speculate as to the possibilities (and limitations) of this incredible technology with some hot takes. Dat Ngo, ML Solutions Eng…

00:45:08  |   Fri 01 Mar 2024
RAG vs Fine-Tuning

This week, we’re discussing "RAG vs Fine-Tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture." This paper explores a pipeline for fine-tuning and RAG, and presents the tradeoffs of both for …

00:39:49  |   Thu 08 Feb 2024
Phi-2 Model

We dive into Phi-2 and some of the major differences and use cases for a small language model (SLM) versus an LLM.

With only 2.7 billion parameters, Phi-2 surpasses the performance of Mistral and Llam…

00:44:29  |   Fri 02 Feb 2024
Disclaimer: The podcast and artwork embedded on this page are the property of Arize AI. This content is not affiliated with or endorsed by eachpod.com.