- Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference — by mfuntowicz and 1 other, Jan 16
- Hugging Face on AMD Instinct MI300 GPU — by mfuntowicz and 3 others, May 21, 2024
- CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG — by peterizsak and 5 others, Mar 15, 2024
- Accelerating SD Turbo and SDXL Turbo Inference with ONNX Runtime and Olive — by sschoenmeyer and 2 others, Jan 15, 2024
- AMD + 🤗: Large Language Models Out-of-the-Box Acceleration with AMD GPU — Dec 5, 2023
- Optimum-NVIDIA — Unlock blazingly fast LLM inference in just 1 line of code — by laikh-nvidia and 1 other, Dec 5, 2023
- Accelerating over 130,000 Hugging Face models with ONNX Runtime — by sschoenmeyer and 1 other, Oct 4, 2023
- Case Study: Millisecond Latency using Hugging Face Infinity and modern CPUs — by philschmid and 2 others, Jan 13, 2022
- Scaling up BERT-like model Inference on modern CPU — Part 2 — by mfuntowicz and 3 others, Nov 4, 2021
- Introducing Optimum: The Optimization Toolkit for Transformers at Scale — by mfuntowicz and 3 others, Sep 14, 2021