AutoRound has demonstrated strong results even at 2-bit precision for vision-language models such as Qwen2-VL-72B. Check it out here: OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc.
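A minimal loading sketch for this checkpoint, assuming auto-round is installed (importing AutoRoundConfig registers the auto-round quantization format with transformers); the rest follows the usual Qwen2-VL flow:

```python
from auto_round import AutoRoundConfig  # noqa: F401 -- registers the auto-round format
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

model_id = "OPEA/Qwen2-VL-72B-Instruct-int2-sym-inc"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",  # a 72B model still needs substantial GPU memory, even at INT2
)
processor = AutoProcessor.from_pretrained(model_id)
```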
This week, OPEA Space released several new INT4 models, including nvidia/Llama-3.1-Nemotron-70B-Instruct-HF, allenai/OLMo-2-1124-13B-Instruct, THUDM/glm-4v-9b, AIDC-AI/Marco-o1, and several others. Let us know which models you'd like prioritized for quantization, and we'll do our best to make it happen!
The OPEA space has just released nearly 20 INT4 models, for example QwQ-32B-Preview, Llama-3.2-11B-Vision-Instruct, Qwen2.5, and Llama 3.1. Check out https://huggingface.co/OPEA
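The text-only checkpoints load through the standard transformers API. A minimal sketch, assuming auto-round is installed; the repo id below is hypothetical, so substitute any actual model from the OPEA page:

```python
from auto_round import AutoRoundConfig  # noqa: F401 -- registers the auto-round format
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id for illustration; pick a real one from https://huggingface.co/OPEA
model_id = "OPEA/QwQ-32B-Preview-int4-sym-inc"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=50)[0]))
```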
- Pre-training code with nanotron
- Evaluation suite with lighteval
- Synthetic data generation using distilabel (powers our new SFT dataset HuggingFaceTB/smoltalk; see the loading sketch below)
- Post-training scripts with TRL & the alignment handbook
- On-device tools with llama.cpp for summarization, rewriting & agents
Apache 2.0 licensed. V2 pre-training data mix coming soon!
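For anyone who wants to poke at the SFT data, a minimal sketch with the datasets library; the "all" config name is an assumption, so check the dataset card for the available subsets:

```python
from datasets import load_dataset

# "all" is an assumed config name -- see the HuggingFaceTB/smoltalk dataset card for actual subsets.
ds = load_dataset("HuggingFaceTB/smoltalk", "all", split="train")
print(ds[0]["messages"])  # assumes a chat-style "messages" column, as is typical for SFT datasets
```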
Looking for a better INT4 algorithm for Llama 3.1? For the 8B model, AutoRound achieves a higher average across 10 zero-shot tasks, scoring 63.93 versus 63.15 for AWQ. Notably, on the MMLU task it achieved 66.72 compared to 65.25, and on the ARC-C task it scored 52.13 against 50.94. For further details and comparisons, visit the leaderboard at Intel/low_bit_open_llm_leaderboard.
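To produce a W4 checkpoint along these lines yourself, a sketch using the auto-round quantization API; the group size of 128 and symmetric quantization are assumptions, not necessarily the exact leaderboard recipe:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128 symmetric: a common INT4 recipe (assumed, not the exact leaderboard config)
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
autoround.save_quantized("./Llama-3.1-8B-Instruct-int4", format="auto_round")
```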