
Krishna Teja Chitty-Venkata

krishnateja95

https://krishnateja95.github.io/

AI & ML interests

LLM Optimization, Neural Architecture Search, Quantization, Pruning

Recent Activity

updated a model 3 days ago

nm-testing/Llama-3.1-8B-Instruct-FP8-block

authored a paper 5 days ago

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models

authored a paper 5 days ago

PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference



updated a model 3 days ago

nm-testing/Llama-3.1-8B-Instruct-FP8-block

Text Generation • Updated 3 days ago • 53

authored 4 papers 5 days ago

MoE-Inference-Bench: Performance Evaluation of Mixture of Expert Large Language and Vision Models

Paper • 2508.17467 • Published Aug 24

PagedEviction: Structured Block-wise KV Cache Pruning for Efficient Large Language Model Inference

Paper • 2509.04377 • Published Sep 4

LExI: Layer-Adaptive Active Experts for Efficient MoE Model Inference

Paper • 2509.02753 • Published Sep 2

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models

Paper • 2510.01582 • Published 18 days ago • 1

updated a model 6 days ago

nm-testing/Qwen3-8B-FP8-block

Text Generation • Updated 6 days ago

upvoted a paper 6 days ago

ImageNet-Think-250K: A Large-Scale Synthetic Dataset for Multimodal Reasoning for Vision Language Models

Paper • 2510.01582 • Published 18 days ago • 1

updated 4 models 7 days ago

updated a collection 8 days ago

FP8-Block Quantized Models

Collection

Collection of State-of-the-art FP8 Block Quantized Models • 7 items • Updated 7 days ago

published a model 8 days ago

nm-testing/Qwen3-VL-235B-A22B-Instruct-FP8-BLOCK

Updated 8 days ago • 35