---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
library_name: transformers.js
license: mit
pipeline_tag: feature-extraction
base_model:
- chandar-lab/NeoBERT
---

# NeoBERT

NeoBERT is a **next-generation encoder** model for English text representation, pre-trained from scratch on the RefinedWeb dataset. It integrates state-of-the-art advances in architecture, modern data, and optimized pre-training methodology. NeoBERT is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, uses an **optimal depth-to-width ratio**, and supports an extended context length of **4,096 tokens**. Despite its compact 250M-parameter footprint, it is the most efficient model of its kind and achieves **state-of-the-art results** on the MTEB benchmark, outperforming BERT-large, RoBERTa-large, NomicBERT, and ModernBERT under identical fine-tuning conditions.

- Paper: [arXiv:2502.19587](https://arxiv.org/abs/2502.19587)
- Repository: [chandar-lab/NeoBERT](https://github.com/chandar-lab/NeoBERT)

## Usage

### Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then compute embeddings using the pipeline API:

```js
import { pipeline } from "@huggingface/transformers";

// Create feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Compute embeddings
const text = "NeoBERT is the most efficient model of its kind!";
const embedding = await extractor(text, { pooling: "cls" });
console.log(embedding.dims); // [1, 768]
```
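
The pipeline also accepts an array of strings, which makes batched embedding and sentence comparison straightforward. Below is a minimal sketch (the example sentences are illustrative): passing `normalize: true` L2-normalizes each pooled embedding, so the dot product of two rows is their cosine similarity.

```js
import { pipeline } from "@huggingface/transformers";

const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Embed a batch of sentences; `normalize: true` makes each embedding unit-length
const sentences = [
  "NeoBERT is a next-generation encoder model.",
  "The weather is nice today.",
];
const embeddings = await extractor(sentences, { pooling: "cls", normalize: true });
console.log(embeddings.dims); // [2, 768]

// With unit-length vectors, the dot product equals cosine similarity
const [a, b] = embeddings.tolist();
const similarity = a.reduce((sum, v, i) => sum + v * b[i], 0);
console.log(similarity);
```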

Or manually with the model and tokenizer classes:
```js
import { AutoModel, AutoTokenizer } from "@huggingface/transformers";

// Load model and tokenizer
const model_id = "onnx-community/NeoBERT-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id);

// Tokenize input text
const text = "NeoBERT is the most efficient model of its kind!";
const inputs = tokenizer(text);

// Run the model and take the [CLS] (first) token embedding as the sentence representation
const outputs = await model(inputs);
const embedding = outputs.last_hidden_state.slice(null, 0); // equivalent to pooling: "cls"
console.log(embedding.dims); // [1, 768]
```

### ONNXRuntime

```py
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import onnxruntime as ort

model_id = "onnx-community/NeoBERT-ONNX"

# Load the tokenizer and download the ONNX model from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_file = hf_hub_download(model_id, filename="onnx/model.onnx")
session = ort.InferenceSession(model_file)

# Tokenize the input and run inference
text = ["NeoBERT is the most efficient model of its kind!"]
inputs = tokenizer(text, return_tensors="np").data
outputs = session.run(None, inputs)[0]

# CLS pooling: take the first token's hidden state as the sentence embedding
embeddings = outputs[:, 0, :]
print(f"{embeddings.shape=}")  # (1, 768)
```
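
The same pattern extends to batched inputs. A minimal sketch (the example sentences are illustrative): pad the batch to a shared length, pool the [CLS] token, and L2-normalize so dot products yield cosine similarities.

```py
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

model_id = "onnx-community/NeoBERT-ONNX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
session = ort.InferenceSession(hf_hub_download(model_id, filename="onnx/model.onnx"))

# Pad the batch so all rows share one sequence length
texts = ["NeoBERT is a next-generation encoder.", "The weather is nice today."]
inputs = tokenizer(texts, padding=True, return_tensors="np").data

# CLS pooling, then L2-normalization so dot products are cosine similarities
hidden = session.run(None, inputs)[0]
embeddings = hidden[:, 0, :]
embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
print(embeddings @ embeddings.T)  # pairwise cosine similarity matrix
```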

## Conversion

The export script can be found at [./export.py](https://huggingface.co/onnx-community/NeoBERT-ONNX/blob/main/export.py).