view article Article No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL By toslali-ibm and 5 others • 7 days ago • 40
view article Article Bamba: Inference-Efficient Hybrid Mamba2 Model By rganti and 28 others • Dec 18, 2024 • 55