julien-c's Collections
Research projects on top of vLLM

updated Jul 29, 2024

Papers cited in https://blog.vllm.ai/2024/07/25/lfai-perf.html

  • Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

    Paper • 2407.00079 • Published Jun 24, 2024 • 5

  • Llumnix: Dynamic Scheduling for Large Language Model Serving

    Paper • 2406.03243 • Published Jun 5, 2024

  • CacheGen: Fast Context Loading for Language Model Applications

    Paper • 2310.07240 • Published Oct 11, 2023 • 1

  • vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

    Paper • 2405.04437 • Published May 7, 2024 • 3

  • Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services

    Paper • 2404.16283 • Published Apr 25, 2024

  • Efficiently Programming Large Language Models using SGLang

    Paper • 2312.07104 • Published Dec 12, 2023 • 7