Collection of Quantized Models for MoE
Krishna Teja Chitty-Venkata
krishnateja95
AI & ML interests
LLM Optimization, Neural Architecture Search, Quantization, Pruning
Recent Activity
Updated a model 3 days ago: nm-testing/Llama-3.1-8B-Instruct-FP8-block
Authored a paper 5 days ago: MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models
Authored a paper 5 days ago: PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference