Article: The New and Fresh analytics in Inference Endpoints • by erikkaum and 4 others • Mar 21
Article: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • Mar 12
Space (Running): The Ultra-Scale Playbook 🌌 • The ultimate guide to training LLMs on large GPU clusters
Article: From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub • Feb 12
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • Jan 16
Article: Train 400x faster Static Embedding Models with Sentence Transformers • Jan 15
Post: A while ago I started experimenting with compiling the Python interpreter to WASM to build a secure, fast, and lightweight sandbox for code execution, ideal for running LLM-generated Python code.
- Send code simply as a POST request
- 1-2 ms startup times
Hack away: https://github.com/ErikKaum/runner
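A minimal sketch of what calling such a sandbox over HTTP could look like, in Python. The local URL, the /run path, and the {"code": ...} request schema are assumptions for illustration only; see the repository above for the actual API.

```python
# Minimal sketch: submitting LLM-generated Python code to a WASM sandbox over HTTP.
# The endpoint address and JSON schema below are assumptions, not the documented API
# of https://github.com/ErikKaum/runner.
import json
import urllib.request

SANDBOX_URL = "http://localhost:8080/run"  # assumed local address of the runner


def run_in_sandbox(code: str) -> str:
    """POST a Python snippet to the sandbox and return its raw response body."""
    payload = json.dumps({"code": code}).encode("utf-8")
    request = urllib.request.Request(
        SANDBOX_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # The snippet runs inside the sandboxed interpreter, isolated from the host.
    print(run_in_sandbox("print(sum(range(10)))"))
```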
Space (Running): Scaling FineWeb to 1000+ languages: Step 1: finding signal in 100s of evaluation tasks 📝 • Evaluate multilingual models using FineTasks
Article: Releasing Outlines-core 0.1.0: structured generation in Rust and Python • by erikkaum and 6 others • Oct 22, 2024
Post: This week in Inference Endpoints - thanks @erikkaum for the update! 👀 https://huggingface.co/blog/erikkaum/endpoints-changelog