
Nomic Embed Text V1 (ONNX)

Tags: text-embedding onnx nomic-embed-text sentence-transformers


Model Details

  • Model Name: Nomic Embed Text V1 (ONNX export)
  • Original HF Repo: nomic-ai/nomic-embed-text-v1
  • ONNX File: model.onnx
  • Export Date: 2025-05-27

This model outputs:

  1. token_embeddings — per‐token embedding vectors ([batch_size, seq_len, hidden_size])
  2. sentence_embedding — pooled sentence‐level embeddings ([batch_size, hidden_size])
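The pooled output is derived from the per-token vectors. As an illustration, here is a masked mean-pooling sketch in NumPy (the exact pooling baked into this export is an assumption; it may differ, e.g. use a CLS token):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions.

    token_embeddings: [batch_size, seq_len, hidden_size]
    attention_mask:   [batch_size, seq_len] (1 = real token, 0 = padding)
    """
    mask = attention_mask[..., np.newaxis].astype(np.float32)  # [B, S, 1]
    summed = (token_embeddings * mask).sum(axis=1)             # [B, H]
    counts = np.clip(mask.sum(axis=1), 1e-9, None)             # [B, 1]
    return summed / counts

# Toy shapes: batch of 2, seq_len 3, hidden 4
toks = np.ones((2, 3, 4), dtype=np.float32)
mask = np.array([[1, 1, 0], [1, 1, 1]])
print(mean_pool(toks, mask).shape)  # (2, 4)
```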

Model Description

Nomic Embed Text V1 is a BERT‐style encoder trained to generate high-quality dense representations of text. It is suitable for:

  • Semantic search
  • Text clustering
  • Recommendation systems
  • Downstream classification

The ONNX export ensures compatibility with inference engines like ONNX Runtime and NVIDIA Triton Inference Server.


Usage

1. Install Dependencies

pip install onnxruntime transformers numpy

2. Load the Model

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")

3. Tokenize Inputs

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)

4. Run Inference

outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)

token_embeddings, sentence_embeddings = outputs
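A common next step, e.g. for semantic search, is cosine similarity between sentence embeddings. A self-contained sketch (random vectors stand in for real model output):

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between two batches of vectors ([N, H] each)."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Stand-in for the sentence embeddings returned by session.run() above
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 768)).astype(np.float32)

sims = cosine_similarity(emb, emb)
print(sims.shape)   # (2, 2)
print(sims[0, 0])   # ~1.0: a vector is maximally similar to itself
```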

Serving with Triton

Place your model files under:

models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)

Note that Triton expects config.pbtxt at the model directory level, alongside (not inside) the numbered version directory.

Create a config.pbtxt file that looks something like this:

name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8

input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [-1]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [-1]
  }
]

output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [-1, 768]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [768]
  }
]

instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]

Start Triton:

tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
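
Once the server is up, you can query it with the tritonclient package (pip install "tritonclient[http]"). A sketch, assuming the config above and a server on the default HTTP port 8000:

```python
import numpy as np
import tritonclient.http as httpclient
from transformers import AutoTokenizer

client = httpclient.InferenceServerClient(url="localhost:8000")

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
enc = tokenizer(["Hello world"], padding=True, truncation=True, return_tensors="np")

# The config above declares TYPE_INT32, so cast before sending.
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

inputs = [
    httpclient.InferInput("input_ids", input_ids.shape, "INT32"),
    httpclient.InferInput("attention_mask", attention_mask.shape, "INT32"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(attention_mask)

result = client.infer(
    "nomic_embeddings",
    inputs=inputs,
    outputs=[httpclient.InferRequestedOutput("sentence_embedding")],
)
sentence_embedding = result.as_numpy("sentence_embedding")
print(sentence_embedding.shape)
```

This snippet needs a live Triton server to run; it is not standalone.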