SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention Paper • 2509.24006 • Published Sep 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures Paper • 2505.09343 • Published May 14, 2025
SageAttention2++: A More Efficient Implementation of SageAttention2 Paper • 2505.21136 • Published May 27, 2025
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16, 2025
Identifying Sensitive Weights via Post-quantization Integral Paper • 2503.01901 • Published Feb 28, 2025
SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference Paper • 2502.18137 • Published Feb 25, 2025
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration Paper • 2411.10958 • Published Nov 17, 2024