Link to our arXiv paper: https://arxiv.org/abs/2507.15886
TL;DR: We study how to efficiently combine multiple monitors with different performance and cost profiles into a single protocol. When the number of monitors is small (2-4), one should simply brute-force search over all monitor invocation strategies (i.e., which monitor to call first, and when to call an additional monitor). For every strategy, apply the Neyman-Pearson lemma and audit the regions with the highest likelihood ratios $\ell(m) = \frac{P(M = m \mid \text{Output is misaligned})}{P(M = m \mid \text{Output is aligned})}$. Measure the recall of each strategy and pick the best one.
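To make the "audit the highest likelihood ratios" step concrete, here is a minimal Python sketch for a single monitor whose scores have been discretized into bins. It is an illustration under simplifying assumptions, not the paper's implementation: the function name `neyman_pearson_audit`, the bin labels, and the probabilities are all hypothetical, the budget is expressed only as the fraction of aligned outputs we can afford to audit (monitor call costs are ignored), and the randomized tie-breaking that the exact Neyman-Pearson construction uses at the budget boundary is omitted.

```python
def neyman_pearson_audit(p_misaligned, p_aligned, audit_budget):
    """Choose score bins to audit, highest likelihood ratio first.

    p_misaligned[m] and p_aligned[m] are the probabilities of observing
    monitor score bin m given a misaligned / aligned output.
    audit_budget is the largest fraction of aligned outputs we may audit.
    Returns (audited_bins, recall, budget_used).
    """
    def likelihood_ratio(m):
        # A bin that never occurs for aligned outputs costs nothing to audit.
        if p_aligned[m] == 0:
            return float("inf")
        return p_misaligned[m] / p_aligned[m]

    ranked = sorted(p_aligned, key=likelihood_ratio, reverse=True)
    audited, recall, used = [], 0.0, 0.0
    for m in ranked:
        # Stop at the first bin that would exceed the budget (no randomization).
        if used + p_aligned[m] > audit_budget + 1e-12:
            break
        audited.append(m)
        used += p_aligned[m]
        recall += p_misaligned[m]
    return audited, recall, used


# Hypothetical score distributions, for illustration only.
p_aligned = {"low": 0.70, "mid": 0.25, "high": 0.05}
p_misaligned = {"low": 0.10, "mid": 0.30, "high": 0.60}

audited, recall, used = neyman_pearson_audit(p_misaligned, p_aligned, audit_budget=0.05)
print(audited, recall, used)
```

In the brute-force search, a routine like this would be run once per candidate invocation strategy, and the strategy with the highest recall would be kept.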
Example of an optimal way of combining monitors given some costs and budget. In this example, you call your black box monitor only if the probe monitor score is between -0.01 and 2.18, and you audit everywhere with a log likelihood ratio above -0.32 (above the green line). This set of parameters achieves the [...]
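The decision rule described in this example can be sketched in Python as follows. Only the three thresholds come from the caption. Everything else is an assumption for illustration: the names `decide`, `call_black_box`, and `toy_llr` are hypothetical, the interval endpoints are treated as inclusive, and the toy log likelihood ratio is a placeholder standing in for one estimated from real monitor score distributions.

```python
# Thresholds taken from the example above.
CALL_LOW, CALL_HIGH = -0.01, 2.18   # probe-score range in which the black box monitor is called
AUDIT_LLR_THRESHOLD = -0.32         # audit when the log likelihood ratio exceeds this


def decide(probe_score, call_black_box, log_likelihood_ratio):
    """Apply the two-monitor policy to one output.

    call_black_box: zero-argument callable returning the black box monitor score.
    log_likelihood_ratio: callable (probe_score, black_box_score_or_None) -> float.
    """
    black_box_score = None
    if CALL_LOW <= probe_score <= CALL_HIGH:
        # Pay for the second monitor only when the probe score is in this range.
        black_box_score = call_black_box()
    llr = log_likelihood_ratio(probe_score, black_box_score)
    return {
        "called_black_box": black_box_score is not None,
        "audit": llr > AUDIT_LLR_THRESHOLD,
    }


def toy_llr(probe_score, black_box_score):
    # Placeholder only: a real ratio would be fitted to aligned and misaligned data.
    extra = 0.0 if black_box_score is None else black_box_score
    return probe_score + extra - 1.0


print(decide(1.0, lambda: 0.5, toy_llr))   # probe in range: black box monitor is called
print(decide(-2.0, lambda: 0.5, toy_llr))  # probe out of range: decision uses the probe alone
```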
---
Outline:
(01:16) Introduction
(02:49) Two example monitors
(04:26) Optimization problem with two monitors
(05:21) Picking where to audit given monitor calls
(08:59) Just brute force where to call monitors
(10:27) Summary
(11:14) Some more examples
(12:10) Pareto Frontiers with two monitors vs. only one monitor
---
First published:
July 27th, 2025
---
Narrated by TYPE III AUDIO.
---