ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published Jun 25 • 8
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization Paper • 2503.04598 • Published Mar 6 • 20 • 8
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More Paper • 2502.07490 • Published Feb 11 • 9
Article How to generate text: using different decoding methods for language generation with Transformers By patrickvonplaten • Mar 1, 2020 • 243
Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 355
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23 • 49
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published Sep 9, 2024 • 15
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs Paper • 2409.05152 • Published Sep 8, 2024 • 33