Checkpoints for the main experiments in "Forgetting Transformer: Softmax Attention with a Forget Gate" (https://arxiv.org/abs/2503.02130).
All four checkpoints are 760M-parameter text-generation models trained on 48B tokens of LongCrawl64:

- zhixuan-lin/fox-pro-760m-longcrawl64-48b (FoX, Pro architecture)
- zhixuan-lin/transformer-pro-760m-longcrawl64-48b (baseline Transformer, Pro architecture)
- zhixuan-lin/fox-llama-760m-longcrawl64-48b (FoX, LLaMA architecture)
- zhixuan-lin/transformer-llama-760m-longcrawl64-48b (baseline Transformer, LLaMA architecture)