---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
library_name: transformers.js
license: mit
pipeline_tag: feature-extraction
base_model:
- chandar-lab/NeoBERT
---

# NeoBERT

NeoBERT is a **next-generation encoder** model for English text representation, pre-trained from scratch on the RefinedWeb dataset. It combines state-of-the-art architectural advances, modern data, and optimized pre-training methods. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an **optimal depth-to-width ratio**, and supports an extended context length of **4,096 tokens**. Despite its compact 250M-parameter footprint, it is the most efficient model of its kind and achieves **state-of-the-art results** on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.

- Paper: [arXiv:2502.19587](https://arxiv.org/abs/2502.19587)
- Repository: [chandar-lab/NeoBERT](https://github.com/chandar-lab/NeoBERT)

## Usage

### ONNX Runtime

```py
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Load the tokenizer and download the exported ONNX graph from the Hub.
model_id = "onnx-community/NeoBERT-ONNX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_file = hf_hub_download(model_id, filename="onnx/model.onnx")
session = ort.InferenceSession(model_file)

# Tokenize to NumPy arrays; `.data` exposes the plain dict passed to the session.
text = ["NeoBERT is the most efficient model of its kind!"]
inputs = tokenizer(text, return_tensors="np").data

# The first output holds per-token hidden states; keep the first token of each sequence.
outputs = session.run(None, inputs)[0]
embeddings = outputs[:, 0, :]
print(f"{embeddings.shape=}")  # (1, 768)
```
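
Embeddings extracted this way are typically compared with cosine similarity. The following is a minimal sketch rather than part of the original model card: it assumes the embeddings form a NumPy array of shape `(n, 768)`, uses random placeholder vectors instead of real model outputs, and the helper name `cosine_similarity_matrix` is hypothetical.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Return the pairwise cosine similarities of the rows of `embeddings`."""
    # L2-normalize each row; the small floor guards against division by zero.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    normalized = embeddings / np.maximum(norms, 1e-12)
    # The dot product of unit vectors is their cosine similarity.
    return normalized @ normalized.T


# Placeholder vectors standing in for real NeoBERT embeddings (assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(3, 768)).astype(np.float32)

similarities = cosine_similarity_matrix(embeddings)
print(similarities.shape)  # (3, 3)
```

To use it with real data, pass the `embeddings` array from the ONNX Runtime example after encoding several sentences in one batch (with `padding=True` in the tokenizer call so the inputs share a length).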

## Conversion

The export script can be found at [./export.py](https://huggingface.co/onnx-community/NeoBERT-ONNX/blob/main/export.py).