VQ-Logits: Compressing the Output Bottleneck of Large Language Models via Vector Quantized Logits • Paper • 2505.10202 • Published May 15
Power-Law Decay Loss for Large Language Model Finetuning: A Theory Perspective • Paper • 2505.16900 • Published May 22
ComplexFormer: Disruptively Advancing Transformer Inference Ability via Head-Specific Complex Vector Attention • Paper • 2505.10222 • Published May 15
Towards Analyzing and Understanding the Limitations of VAPO: A Theoretical Perspective • Paper • 2505.17997 • Published May 23