#FoX
25/04/2025
Introducing Forgetting Transformer (FoX): Revolutionizing Long-Context Language Modeling with Efficient Memory Control
Researchers at Mila and Université de Montréal introduce FoX, a Transformer variant whose learnable forget gates are reported to improve both the efficiency and the accuracy of long-context language modeling without computational trade-offs.