AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are diff…
Sometimes, people talk about making AI systems safe by taking examples where they fail and training them to do well on those. But how can we actually do this well, especially when we can't use a comp…
Many people in the AI alignment space have heard of AI safety via debate - check out AXRP episode 6 (axrp.net/episode/2021/04/08/episode-6-debate-beth-barnes.html) if you need a primer. But how do we…
Why does anybody care about natural abstractions? Do they somehow relate to math, or value learning? How do E. coli bacteria find sources of sugar? All these questions and more will be answered in th…
Late last year, Vanessa Kosoy and Alexander Appel published some research under the heading of "Infra-Bayesian physicalism". But wait - what was infra-Bayesianism again? Why should we care? And what …
How should we think about artificial general intelligence (AGI), and the risks it might pose? What constraints exist on technical solutions to the problem of aligning superhuman AI systems with human…
Why would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so danger…
Many scary stories about AI involve an AI system deceiving and subjugating humans in order to gain the ability to achieve its goals without us stopping it. This episode's guest, Alex Turner, will tel…
When trying to ensure that AI does not cause an existential catastrophe, it's likely important to understand how AI will develop in the future, and why exactly it might or might not cause…
Being an agent can get loopy quickly. For instance, imagine that we're playing chess and I'm trying to decide what move to make. Your next move influences the outcome of the game, and my guess of tha…
How should we think about the technical problem of building smarter-than-human AI that does what we want? When and how should AI systems defer to us? Should they have their own goals, and how should …
If you want to shape the development and forecast the consequences of powerful AI technology, it's important to know when it might appear. In this episode, I talk to Ajeya Cotra about her draft repor…
One way of thinking about how AI might pose an existential threat is that it might take drastic actions to maximize its achievement of some objective function, such as taking control of the power supply or th…
One proposal for training AIs that can be useful is to have ML models debate each other about the answer to a human-provided question, where the human judges which side has won. In this episode, I talk w…
The theory of sequential decision-making has a problem: how can we deal with situations where we have some hypotheses about the environment we're acting in, but its exact form might be outside the ra…
In machine learning, optimization is typically used to produce a model that performs well according to some metric. Today's episode features Evan Hubinger talking about what happens when the learned …
In this episode, I talk with Andrew Critch about negotiable reinforcement learning: what happens when two people (or organizations, or what have you) who have different beliefs and preferences jointl…
One approach to creating useful AI systems is to watch humans doing a task, infer what they're trying to do, and then try to do that well. The simplest way to infer what the humans are trying to do i…
In this episode, Adam Gleave and I talk about adversarial policies. Basically, in current reinforcement learning, people train agents that act in some kind of environment, sometimes an environment th…