#grokking 23/04/2025
Microsoft's Muon Optimizer Dramatically Speeds Up Grokking in Transformers
Microsoft researchers show that the Muon optimizer substantially accelerates grokking in Transformer models: models trained with Muon move from memorization to generalization sooner than those trained with AdamW.