LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20, 2025 • 4
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration Paper • 2306.00978 • Published Jun 1, 2023 • 9
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving Paper • 2405.04532 • Published May 7, 2024
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 51
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer Paper • 2410.10812 • Published Oct 14, 2024 • 17
Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models Paper • 2410.10733 • Published Oct 14, 2024 • 3
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads Paper • 2410.10819 • Published Oct 14, 2024 • 7
NVILA: Efficient Frontier Visual Language Models Paper • 2412.04468 • Published Dec 5, 2024 • 58