# Nomic Embed Text V1 (ONNX)

Tags: `text-embedding`, `onnx`, `nomic-embed-text`, `sentence-transformers`
## Model Details

- Model Name: Nomic Embed Text V1 (ONNX export)
- Original HF Repo: nomic-ai/nomic-embed-text-v1
- ONNX File: `model.onnx`
- Export Date: 2025-05-27

This model outputs:

- `token_embeddings` — per-token embedding vectors (`[batch_size, seq_len, hidden_size]`)
- `sentence_embedding` — pooled sentence-level embeddings (`[batch_size, hidden_size]`)
## Model Description
Nomic Embed Text V1 is a BERT‐style encoder trained to generate high-quality dense representations of text. It is suitable for:
- Semantic search
- Text clustering
- Recommendation systems
- Downstream classification
The ONNX export ensures compatibility with inference engines like ONNX Runtime and NVIDIA Triton Inference Server.
## Usage

### 1. Install Dependencies

```bash
pip install onnxruntime transformers numpy
```

### 2. Load the Model

```python
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
```
### 3. Tokenize Inputs

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-embed-text-v1")
inputs = tokenizer(
    ["Hello world", "Another sentence"],
    padding=True,
    truncation=True,
    return_tensors="np"
)
```
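Note that the upstream nomic-embed-text-v1 model is trained with task instruction prefixes (`search_document:` for corpus text, `search_query:` for queries, among others), and embedding quality degrades without them. A minimal sketch of prefixing documents before tokenization:

```python
# nomic-embed-text-v1 expects a task prefix on every input string:
# "search_document:" marks corpus text, "search_query:" marks queries.
docs = ["Hello world", "Another sentence"]
prefixed = [f"search_document: {d}" for d in docs]
print(prefixed[0])  # search_document: Hello world
```

Pass `prefixed` (rather than the raw strings) to the tokenizer call above.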
### 4. Run Inference

```python
outputs = session.run(
    ["token_embeddings", "sentence_embedding"],
    {
        "input_ids": inputs["input_ids"],
        "attention_mask": inputs["attention_mask"]
    }
)
token_embeddings, sentence_embeddings = outputs
```
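The pooled `sentence_embedding` output is typically an attention-masked mean over `token_embeddings` (the usual sentence-transformers pooling; verify against your export). A self-contained NumPy sketch of that pooling plus a cosine-similarity comparison, using random stand-in arrays in place of real model outputs:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for real model outputs: [batch, seq_len, hidden] plus the mask.
token_embeddings = rng.normal(size=(2, 4, 768)).astype(np.float32)
attention_mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])  # 0 = padding

# Attention-masked mean pooling: average only over non-padding tokens.
mask = attention_mask[:, :, None].astype(np.float32)  # [batch, seq, 1]
sentence_embeddings = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)

# Cosine similarity between the two pooled sentence vectors.
a, b = sentence_embeddings
cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
print(sentence_embeddings.shape)  # (2, 768)
```

The same cosine-similarity step applies unchanged to the real `sentence_embeddings` returned by `session.run`.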
## Serving with Triton

Place your model files under (note that `config.pbtxt` sits at the model level, next to the version directory):

```
models/
└── nomic_embeddings/
    ├── config.pbtxt
    └── 1/
        ├── model.onnx
        └── (tokenizer files…)
```

Create a `config.pbtxt` file that looks something like this (check the input data types against your export; transformers ONNX exports commonly take `int64` inputs, i.e. `TYPE_INT64`):
```
name: "nomic_embeddings"
backend: "onnxruntime"
max_batch_size: 8
# With max_batch_size > 0, the batch dimension is implicit and omitted from dims.
input [
  {
    name: "input_ids"
    data_type: TYPE_INT32
    dims: [ -1 ]
  },
  {
    name: "attention_mask"
    data_type: TYPE_INT32
    dims: [ -1 ]
  }
]
output [
  {
    name: "token_embeddings"
    data_type: TYPE_FP32
    dims: [ -1, 768 ]
  },
  {
    name: "sentence_embedding"
    data_type: TYPE_FP32
    dims: [ 768 ]
  }
]
instance_group [
  {
    kind: KIND_GPU
    count: 1
  }
]
```
Start Triton:

```bash
tritonserver \
  --model-repository=/path/to/models \
  --model-control-mode=explicit \
  --load-model=nomic_embeddings
```
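Once the server is up, requests go to Triton's KServe v2 HTTP endpoint (`POST /v2/models/nomic_embeddings/infer`). A minimal sketch of building the JSON request body; the token IDs are stand-ins (real values come from the tokenizer step above), and the `localhost:8000` URL is the default HTTP port, to be adjusted for your deployment:

```python
import json

# Stand-in tokenized batch (2 sentences, padded to 4 tokens each).
input_ids = [[101, 7592, 2088, 102], [101, 2178, 6251, 102]]
attention_mask = [[1, 1, 1, 1], [1, 1, 1, 0]]

payload = {
    "inputs": [
        {"name": "input_ids", "shape": [2, 4], "datatype": "INT32",
         "data": input_ids},
        {"name": "attention_mask", "shape": [2, 4], "datatype": "INT32",
         "data": attention_mask},
    ],
    "outputs": [{"name": "sentence_embedding"}],
}
body = json.dumps(payload)
# POST body to http://localhost:8000/v2/models/nomic_embeddings/infer
```

The `datatype` fields must match the `data_type` entries in `config.pbtxt` (here `TYPE_INT32` → `INT32`).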