AXRP (pronounced axe-urp) is the AI X-risk Research Podcast where I, Daniel Filan, have conversations with researchers about their papers. We discuss the paper, and hopefully get a sense of why it's been written and how it might reduce the risk of AI causing an existential catastrophe: that is, permanently and drastically curtailing humanity's future potential. You can visit the website and read transcripts at axrp.net.
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit those locations to change the beliefs? Also, how a…
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barne…
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this…
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the A…
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompas…
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don'…
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier,…
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In …
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world - or in other words, ensuring that they're aligned. In this episode, I…
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and…
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs …
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers attempting to solve alignment for superintel…
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research…
Very brief survey: bit.ly/axrpsurvey2023
Store is closing in a week! Link: store.axrp.net/
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or …
Lots of people in the field of machine learning study 'interpretability', developing tools that they say give us useful information about neural networks. But how do we know if meaningful progress is…
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make pr…
How good are we at understanding the internal computation of advanced machine learning models, and do we have any hope of getting better? In this episode, Neel Nanda talks about the sub-field of mechan…
I have a new podcast, where I interview whoever I want about whatever I want. It's called "The Filan Cabinet", and you can find it wherever you listen to podcasts. The first three episodes are about …