Article: Accelerating LLM Inference with TGI on Intel Gaudi • By baptistecolle and 4 others • Mar 28
Article: Universal Assisted Generation: Faster Decoding with Any Assistant Model • By danielkorat and 7 others • Oct 29, 2024
Article: Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques • By jmamou and 8 others • Mar 24
Article: Welcome to Inference Providers on the Hub 🔥 • By julien-c and 6 others • Jan 28
Article: Timm ❤️ Transformers: Use any timm model with transformers • By ariG23498 and 4 others • Jan 16
Article: Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference • By mfuntowicz and 1 other • Jan 16
Article: The 5 Most Under-Rated Tools on Hugging Face • By derek-thomas • Aug 22, 2024
Paper: Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation • arXiv:2406.06525 • Published Jun 10, 2024
Article: CPU Optimized Embeddings with 🤗 Optimum Intel and fastRAG • By peterizsak and 5 others • Mar 15, 2024
Collection: Fast-RAG Inference Endpoints (an easy-to-deploy RAG pipeline using Inference Endpoints) • 3 items • Updated Jun 3, 2024
Article: Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval • By aamirshakir and 2 others • Mar 22, 2024
Collection: Neural Network Compression & Quantization (tracks papers and links about neural network compression and quantization techniques) • 4 items • Updated Sep 22, 2023
Paper: BitNet: Scaling 1-bit Transformers for Large Language Models • arXiv:2310.11453 • Published Oct 17, 2023
Paper: LeanDojo: Theorem Proving with Retrieval-Augmented Language Models • arXiv:2306.15626 • Published Jun 27, 2023