RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference Paper • 2505.02922 • Published May 5, 2025 • 27
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention Paper • 2504.16083 • Published Apr 22, 2025 • 9
Optimizing Large Language Model Training Using FP4 Quantization Paper • 2501.17116 • Published Jan 28, 2025 • 38
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models Paper • 2501.13629 • Published Jan 23, 2025 • 48
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8, 2025 • 280
SCBench: A KV Cache-Centric Analysis of Long-Context Methods Paper • 2412.10319 • Published Dec 13, 2024 • 10
Multimodal Latent Language Modeling with Next-Token Diffusion Paper • 2412.08635 • Published Dec 11, 2024 • 46
Article Fine-tuning LLMs to 1.58bit: extreme quantization made easy By medmekk and 5 others • Sep 18, 2024 • 246
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval Paper • 2409.10516 • Published Sep 16, 2024 • 44
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding Paper • 2408.11049 • Published Aug 20, 2024 • 13
Article A failed experiment: Infini-Attention, and why we should keep trying? By neuralink and 2 others • Aug 14, 2024 • 64
Article RegMix: Data Mixture as Regression for Language Model Pre-training By SivilTaram • Jul 11, 2024 • 12
Article MInference 1.0: 10x Faster Million Context Inference with a Single GPU By liyucheng • Jul 11, 2024 • 13
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention Paper • 2407.02490 • Published Jul 2, 2024 • 26
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Paper • 2403.12968 • Published Mar 19, 2024 • 26
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 618
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning Paper • 2402.06619 • Published Feb 9, 2024 • 57