EachPod

“From SLT to AIT: NN generalisation out-of-distribution” by Lucius Bushnaq

Author
LessWrong
Published
Thu 04 Sep 2025
Episode Link
https://www.lesswrong.com/posts/2MX2bXreTtntB85Zy/from-slt-to-ait-nn-generalisation-out-of-distribution

Audio note: this article contains 288 uses of LaTeX notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

TL;DR: This post derives an upper bound on the generalization error for Bayesian learning on neural networks. Unlike the bound from vanilla Singular Learning Theory (SLT), this bound also holds for out-of-distribution generalization, not just for in-distribution generalization. Along the way, it shows some connections between SLT and Algorithmic Information Theory (AIT).

Written at Goodfire AI.

Introduction

Singular Learning Theory (SLT) describes Bayesian learning on neural networks, but it currently has some limitations. One of these is that it assumes the model's training data are drawn independently and identically distributed (IID) from some distribution, making it difficult to use SLT to describe out-of-distribution (OOD) generalization. If we train a model to classify pictures of animals taken outdoors, vanilla SLT [...]
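
For context (standard SLT background, not quoted from the excerpt above): the vanilla, in-distribution result referenced here is Watanabe's asymptotic for the expected Bayes generalization error under IID sampling from the true distribution q,

\[
  \mathbb{E}[G_n] \;=\; \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
  \qquad
  G_n \;=\; \mathbb{E}_{x \sim q}\!\left[\log \frac{q(x)}{p(x \mid D_n)}\right],
\]

where n is the number of training samples, p(x | D_n) is the Bayes predictive distribution, and \lambda is the learning coefficient (real log canonical threshold). The IID assumption built into this statement is exactly what the post relaxes to obtain an OOD bound.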

---

Outline:

(00:52) Introduction

(04:05) Prediction error bounds for a computationally bounded Solomonoff induction

(04:11) Claim 1: We can import Solomonoff induction into the learning-theoretic setting

(06:50) Claim

(08:16) High-level proof summary

(09:34) Claim 2: A bounded induction still efficiently predicts efficiently predictable data

(09:42) Setup

(10:54) Claim

(13:04) Claim 3: The bounded induction is still somewhat invariant under our choice of UTM

(15:12) Prediction error bound for Bayesian learning on neural networks

(15:17) Claim 4: We can obtain a similar generalization bound for Bayesian learning on neural networks

(15:32) Setup

(16:59) Claim

(18:36) High-level proof summary

(19:14) Comments

(20:21) Relating the volume to SLT quantities

(23:10) Open problems and questions

(23:28) How do the priors actually relate to each other?

(24:08) Conjecture 1

(24:58) Conjecture 2 (Likely false for arbitrary NN architectures)

(27:20) What does _C(w^*,\epsilon,f)_ look like in practice?

(28:24) Acknowledgments

The original text contained 16 footnotes which were omitted from this narration.

---


Narrated by TYPE III AUDIO.
