Just tried LitServe from the good folks at @LightningAI!

Between llama.cpp and vLLM there is a small gap where a few large models are not easily deployable, and that's where LitServe comes in. LitServe is a high-throughput serving engine for AI models built on FastAPI. Yes, built on FastAPI: that's where both the advantage and the issue lie. It's extremely flexible and supports multi-modality and a variety of models out of the box, but in my testing it lags far behind vLLM in speed. Also, no OpenAI API-compatible endpoint is available as of now.

Still, as we move toward multi-modal models and agents, this is a good starting point. It just has to get faster.

GitHub: https://github.com/Lightning-AI/LitServe
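To give a sense of the flexibility mentioned above, here is a minimal sketch of a LitServe server, based on the `LitAPI`/`LitServer` interface from the project's documentation. The `EchoAPI` class and its trivial "model" are hypothetical stand-ins for illustration, not part of the original post.

```python
import litserve as ls


class EchoAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model once per worker; a real deployment would load
        # weights here. This identity function is a hypothetical stand-in.
        self.model = lambda x: x

    def decode_request(self, request):
        # Pull the payload out of the incoming JSON request.
        return request["input"]

    def predict(self, x):
        # Run inference on the decoded input.
        return self.model(x)

    def encode_response(self, output):
        # Wrap the raw prediction into a JSON-serializable response.
        return {"output": output}


if __name__ == "__main__":
    # FastAPI/uvicorn run under the hood, which is where the flexibility
    # (and, per the post, the throughput gap versus vLLM) comes from.
    server = ls.LitServer(EchoAPI(), accelerator="auto")
    server.run(port=8000)
```

Because the request/response hooks are plain Python methods, swapping in a vision or audio model only means changing `setup`, `decode_request`, and `encode_response`, which is what makes the multi-modal story straightforward compared to engines specialized for text.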