EachPod

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Author: Mechanical Dirk
Published: Fri 20 Dec 2024