Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts Paper • 2506.05229 • Published Jun 5, 2025 • 37
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack Paper • 2406.10149 • Published Jun 14, 2024 • 53
Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information Paper • 2311.01326 • Published Nov 2, 2023 • 2
Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language Paper • 1905.07213 • Published May 17, 2019
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary Paper • 2205.02340 • Published May 4, 2022
In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss Paper • 2402.10790 • Published Feb 16, 2024 • 43