---
datasets:
- tiiuae/falcon-refinedweb
language:
- en
library_name: transformers.js
license: mit
pipeline_tag: feature-extraction
base_model:
- chandar-lab/NeoBERT
---

# NeoBERT

NeoBERT is a **next-generation encoder** model for English text representation, pre-trained from scratch on the RefinedWeb dataset. NeoBERT integrates state-of-the-art advancements in architecture, modern data, and optimized pre-training methodologies. It is designed for seamless adoption: it serves as a plug-and-play replacement for existing base models, relies on an **optimal depth-to-width ratio**, and leverages an extended context length of **4,096 tokens**. Despite its compact 250M parameter footprint, it is the most efficient model of its kind and achieves **state-of-the-art results** on the massive MTEB benchmark, outperforming BERT large, RoBERTa large, NomicBERT, and ModernBERT under identical fine-tuning conditions.

- Paper: [arXiv:2502.19587](https://arxiv.org/abs/2502.19587)
- Repository: [chandar-lab/NeoBERT](https://github.com/chandar-lab/NeoBERT)

## Usage

### Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:

```bash
npm i @huggingface/transformers
```

You can then compute embeddings using the pipeline API:

```js
import { pipeline } from "@huggingface/transformers";

// Create a feature extraction pipeline
const extractor = await pipeline("feature-extraction", "onnx-community/NeoBERT-ONNX");

// Compute embeddings
const text = "NeoBERT is the most efficient model of its kind!";
const embedding = await extractor(text, { pooling: "cls" });
console.log(embedding.dims); // [1, 768]
```

Or manually with the model and tokenizer classes:

```js
import { AutoModel, AutoTokenizer } from "@huggingface/transformers";

// Load the model and tokenizer
const model_id = "onnx-community/NeoBERT-ONNX";
const tokenizer = await AutoTokenizer.from_pretrained(model_id);
const model = await AutoModel.from_pretrained(model_id);

// Tokenize the input text
const text = "NeoBERT is the most efficient model of its kind!";
const inputs = tokenizer(text);

// Generate embeddings and keep the [CLS] token representation
const outputs = await model(inputs);
const embedding = outputs.last_hidden_state.slice(null, 0);
console.log(embedding.dims); // [1, 768]
```

### ONNXRuntime

```py
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import onnxruntime as ort

# Download the ONNX model and load it into an inference session
model_id = "onnx-community/NeoBERT-ONNX"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model_file = hf_hub_download(model_id, filename="onnx/model.onnx")
session = ort.InferenceSession(model_file)

# Tokenize the input text and run inference
text = ["NeoBERT is the most efficient model of its kind!"]
inputs = tokenizer(text, return_tensors="np").data
outputs = session.run(None, inputs)[0]

# Keep the [CLS] token embedding (first token position)
embeddings = outputs[:, 0, :]
print(f"{embeddings.shape=}")  # (1, 768)
```

## Conversion

The export script can be found at [./export.py](https://huggingface.co/onnx-community/NeoBERT-ONNX/blob/main/export.py).
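
For reference, below is a minimal, hypothetical sketch of how such an export can be performed with `torch.onnx.export`. It is not the actual `export.py` linked above, which remains the authoritative script and may differ (for example, in how it handles NeoBERT's custom remote code and attention implementation); the dummy input, output names, and opset version here are illustrative assumptions.

```py
# Hypothetical export sketch; see the linked export.py for the actual script.
import torch
from transformers import AutoModel, AutoTokenizer

# Original PyTorch checkpoint (requires trusting the custom model code)
model_id = "chandar-lab/NeoBERT"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Dummy inputs used to trace the forward pass
dummy = tokenizer("Hello world!", return_tensors="pt")

# Export with dynamic batch and sequence dimensions
torch.onnx.export(
    model,
    (dummy["input_ids"], {"attention_mask": dummy["attention_mask"]}),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sequence_length"},
        "attention_mask": {0: "batch_size", 1: "sequence_length"},
        "last_hidden_state": {0: "batch_size", 1: "sequence_length"},
    },
    opset_version=14,
)
```

The resulting `model.onnx` can then be loaded with `onnxruntime.InferenceSession` as shown in the ONNXRuntime example above.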