LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs • Paper • 2506.14429 • Published 8 days ago • 43
Beyond Homogeneous Attention: Memory-Efficient LLMs via Fourier-Approximated KV Cache • Paper • 2506.11886 • Published 12 days ago • 20
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training • Paper • 2506.10952 • Published 12 days ago • 23