Mechanical Dreams
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Author: Mechanical Dirk
Published: Fri 20 Dec 2024