DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving • arXiv:2401.09670 • Published Jan 18, 2024
Mnemosyne: Parallelization Strategies for Efficiently Serving Multi-Million Context Length LLM Inference Requests Without Approximations • arXiv:2409.17264 • Published Sep 25, 2024
Efficiently Serving LLM Reasoning Programs with Certaindex • arXiv:2412.20993 • Published Dec 30, 2024
LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers • arXiv:2310.03294 • Published Oct 5, 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena • arXiv:2306.05685 • Published Jun 9, 2023
Evaluating the Robustness of Text-to-image Diffusion Models against Real-world Attacks • arXiv:2306.13103 • Published Jun 16, 2023
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding • arXiv:2402.02057 • Published Feb 3, 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention • arXiv:2309.06180 • Published Sep 12, 2023
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference • arXiv:2403.04132 • Published Mar 7, 2024
Toward Inference-optimal Mixture-of-Expert Large Language Models • arXiv:2404.02852 • Published Apr 3, 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length • arXiv:2404.08801 • Published Apr 12, 2024