Article: The New and Fresh analytics in Inference Endpoints • by erikkaum and 4 others • Mar 21
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Mar 12
Space (Running): The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
Article: From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • Feb 12
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • Jan 16
Article: Train 400x faster Static Embedding Models with Sentence Transformers • Jan 15
Post: A while ago I started experimenting with compiling the Python interpreter to WASM to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2 ms startup times
Hack away: https://github.com/ErikKaum/runner
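A minimal sketch of what calling such a sandbox over HTTP could look like, in Python. The local URL, the /run path, and the {"code": ...} request schema are assumptions for illustration only; see the repository above for the actual API.

```python
# Minimal sketch: submitting LLM-generated Python code to a WASM sandbox over HTTP.
# The endpoint address and JSON schema below are assumptions, not the documented API
# of https://github.com/ErikKaum/runner.
import json
import urllib.request

SANDBOX_URL = "http://localhost:8080/run"  # assumed local address of the runner


def run_in_sandbox(code: str) -> str:
    """POST a Python snippet to the sandbox and return its raw response body."""
    payload = json.dumps({"code": code}).encode("utf-8")
    request = urllib.request.Request(
        SANDBOX_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # The snippet runs inside the sandboxed interpreter, isolated from the host.
    print(run_in_sandbox("print(sum(range(10)))"))
```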
Space (Running): Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 • Evaluate multilingual models using FineTasks
Article: Releasing Outlines-core 0.1.0: structured generation in Rust and Python • by erikkaum and 6 others • Oct 22, 2024
Post: This week in Inference Endpoints - thanks @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog