Ofir Zafrir

ofirzaf

AI & ML interests

Sparsity, Quantization, Model Compression

ofirzaf's activity

upvoted an article about 2 months ago

Introducing HELMET

Article • and 6 others • Apr 16 • 30

upvoted an article 2 months ago

Speeding Up LLM Decoding with Advanced Universal Assisted Generation Techniques

Article • and 8 others • Mar 24 • 18

upvoted a paper 4 months ago

SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

Paper • 2502.09390 • Published Feb 13 • 16

upvoted a collection 4 months ago

Speculative Decoding Draft Models

Collection

Collection of OpenVINO optimized efficient draft models for speculative decoding • 3 items • Updated 30 days ago • 9

liked a model 4 months ago

OpenVINO/Llama-3.1-8B-Instruct-FastDraft-150M-int8-ov

Updated Dec 16, 2024 • 2.36k • 6

upvoted an article 4 months ago

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Article • and 5 others • Mar 20, 2024 • 6

authored 2 papers 7 months ago

Q8BERT: Quantized 8Bit BERT

Paper • 1910.06188 • Published Oct 14, 2019 • 2

FastDraft: How to Train Your Draft

Paper • 2411.11055 • Published Nov 17, 2024 • 11

upvoted a paper 7 months ago

FastDraft: How to Train Your Draft

Paper • 2411.11055 • Published Nov 17, 2024 • 11

upvoted a paper 10 months ago

RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation

Paper • 2408.02545 • Published Aug 5, 2024 • 38

published an article about 1 year ago

A Chatbot on your Laptop: Phi-2 on Intel Meteor Lake

Article • and 5 others • Mar 20, 2024 • 6

published an article over 1 year ago

Accelerate StarCoder with 🤗 Optimum Intel on Xeon: Q8/Q4 and Speculative Decoding

Article • and 10 others • Jan 30, 2024 • 9

authored a paper almost 2 years ago

An Efficient Sparse Inference Software Accelerator for Transformer-based Language Models on CPUs

Paper • 2306.16601 • Published Jun 28, 2023 • 4

liked a Space about 2 years ago

🏆 Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots • 13.1k

updated a model over 2 years ago

Intel/distilbert-base-uncased-squadv1.1-sparse-80-1x4-block-pruneofa

Question Answering • Updated Sep 20, 2022 • 17

updated 4 models almost 3 years ago

updated a model about 3 years ago

ofirzaf/bert-large-uncased-mnli

Text Classification • Updated May 9, 2022 • 11