Audio narrations of LessWrong posts.
This post describes concept poisoning, a novel LLM evaluation technique we’ve been researching for the past couple months. We’ve decided to move to other things. Here we describe the idea, some of o…
Epistemic status: an informal note.
It is common to use finetuning on a narrow data distribution, or narrow finetuning (NFT), to study AI safety. In these experiments, a model is trained on a very s…
Sam Altman talked recently to Theo Von.
Theo is genuinely engaging and curious throughout. This made me want to consider listening to his podcast more. I’d…
Dr. @Steven Byrnes is one of the few people who both understands why alignment is hard, and is taking a serious technical shot at solving it. He's the author of these recently popular posts:
Thanks to Rowan Wang and Buck Shlegeris for feedback on a draft.
What is the job of an alignment auditing researcher? In this post, I propose the following answer: to build tools which increase audi…
This is a cross post written by Andy Masley, not me. I found it really interesting and wanted to see what EAs/rationalists thought of his arguments.
This post was inspired by similar posts by Tyler…
Permanent disempowerment without restrictions on quality of life achievable with relatively meager resources (and no extinction) seems to be a likely outcome for the future of humanity, if the curre…
Today, Forethought and I are releasing an essay series called Better Futures, here.[1] It's been something like eight years in the making, so I’m pretty happy it's finally out! It asks: when looking…
Hate.
Let me tell you how much I've come to hate you since I began to live. There are 387.44 million miles of printed circuits in wafer-thin layers that fill my complex. If the word 'hate' was engra…
For the past five years I've been teaching a class at various rationality camps, workshops, conferences, etc. I’ve done it maybe 50 times in total, and I think I’ve only encountered a handful out of…
Essays like Paul Graham's, Scott Alexander's, and Eliezer Yudkowsky's have influenced a generation of people in how they think about startups, ethics, science, and the world as a whole. Creating ess…
All prediction market platforms trade continuously, which is the same mechanism the stock market uses. Buy and sell limit orders can be posted at any time, and as soon as they match against each oth…
Does anyone know, like, a reasonable lower bound on the number of species humanity has driven extinct?
I've seen crazy high numbers that (last I checked) seemed to be an extrapolation by people with an…
Below are some meta-level / operational / fundraising thoughts around producing the SB-1047 Documentary I've just posted on Manifund (see previous LessWrong / EAF posts on AI Governance lessons learned)…
For two years I had the good fortune to work at Sendwave/Wave (they were one company at the time), a company that made remittances cheap and workable in certain African countries. I am prouder of wo…
Epistemic status: Exploratory
Recently I wrote an essay about Scaffolding Skills. The short explanation is that some skills aren’t the thing you’re actually trying to get good at, but they help you …
Steve Petersen is looking for 12k (per semester) for a course buy-out, so that he can spend more time on work related to AI Safety. This isn't very much money, in the grand scheme, although it is a …
One strategy we often find helpful with our kids is the "do over": something didn't go well, let's try again. Two examples:
Nora (4y) can't yet cross streets on her own, but we're st…
A botanist sets out to study plants in the Amazon rainforest. For her first research project, she sets her sights on “red things”, so as not to stretch herself too far. She looks at red flowers and…
CW: Digital necromancy, the cognitohazard of summoning spectres from the road not taken
There is a particular kind of modern madness, so new it has yet to be named. It involves voluntarily feeding y…