MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models Paper • 2508.17467 • Published Aug 24
PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference Paper • 2509.04377 • Published Sep 4
LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference Paper • 2509.02753 • Published Sep 2
ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models Paper • 2510.01582 • Published 18 days ago • 1
FP8-Block Quantized Models Collection Collection of State-of-the-art FP8 Block Quantized Models • 7 items • Updated 7 days ago