Exposing Attention Glitches with Flip-Flop Language Modeling Paper • 2306.00946 • Published Jun 1, 2023
TinyGSM: achieving >80% on GSM8k with small language models Paper • 2312.09241 • Published Dec 14, 2023
Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression Paper • 2306.00788 • Published Jun 1, 2023