adrien-riaux/distill-modernbert-embed-base

Sentence Similarity

sentence-transformers

feature-extraction

adrien-riaux committed on Feb 14

Commit 514a4d2 · verified · 1 Parent(s): 8970a21

docs: update README

Files changed (1)

README.md +9 -12

@@ -6,23 +6,22 @@ tags:
 base_model: nomic-ai/modernbert-embed-base
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
-# SentenceTransformer based on nomic-ai/modernbert-embed-base
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
 - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
-- **Maximum Sequence Length:** inf tokens
 - **Output Dimensionality:** 256 dimensions
 - **Similarity Function:** Cosine Similarity
-<!-- - **Training Dataset:** Unknown -->
-<!-- - **Language:** Unknown -->
-<!-- - **License:** Unknown -->
 ### Model Sources
@@ -110,19 +109,17 @@ You can finetune this model on your own dataset.
 ## Training Details
 ### Framework Versions
 - Python: 3.11.9
 - Sentence Transformers: 3.4.1
 - Transformers: 4.48.3
 - PyTorch: 2.2.2
-- Accelerate:
-- Datasets:
 - Tokenizers: 0.21.0
-## Citation
-### BibTeX
 <!--
 ## Glossary

 base_model: nomic-ai/modernbert-embed-base
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
+license: mit
 ---
+# ModernBERT Embed Base Distilled
+This is a [sentence-transformers](https://www.SBERT.net) model distilled from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences & paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 ## Model Details
 ### Model Description
 - **Model Type:** Sentence Transformer
 - **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
+- **Maximum Sequence Length:** 8 192 tokens
 - **Output Dimensionality:** 256 dimensions
 - **Similarity Function:** Cosine Similarity
 ### Model Sources
 ## Training Details
+### Distillation Process
+The model is distilled using [Model2Vec](https://huggingface.co/blog/Pringled/model2vec) framework. It is a new technique for creating extremely fast and small static embedding models from any Sentence Transformer.
 ### Framework Versions
 - Python: 3.11.9
 - Sentence Transformers: 3.4.1
 - Transformers: 4.48.3
 - PyTorch: 2.2.2
 - Tokenizers: 0.21.0
 <!--
 ## Glossary
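
The "Distillation Process" paragraph added in this commit describes the result as a static embedding model. As a rough illustration of what that means at inference time, here is a minimal sketch, assuming the usual static-model recipe of one stored vector per token followed by mean pooling. The vocabulary, the vector values, and the helper names `embed` and `cosine_similarity` are hypothetical and are not taken from this repository or from the Model2Vec API; real vectors for this model would have 256 dimensions.

```python
import math

# Hypothetical toy lookup table. A static embedding model keeps one
# precomputed vector per token; 3 dimensions are used here for brevity
# (the model card reports 256).
TOKEN_VECTORS = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "purr": [0.7, 0.0, 0.2],
    "stocks": [0.0, 0.9, 0.4],
    "fell": [0.1, 0.8, 0.5],
}


def embed(text):
    """Embed text by mean-pooling the stored vectors of its known tokens.

    Whitespace tokenization is an assumption made to keep the sketch short;
    a real model would use the base model's tokenizer.
    """
    vectors = [TOKEN_VECTORS[t] for t in text.lower().split() if t in TOKEN_VECTORS]
    if not vectors:
        raise ValueError("no known tokens in input")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity, the similarity function listed in the model card."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    query = embed("cats purr")
    print(cosine_similarity(query, embed("dogs purr")))
    print(cosine_similarity(query, embed("stocks fell")))
```

No transformer forward pass is involved in this sketch: embedding a text reduces to table lookups and an average, which is where the speed and size advantage of static models comes from.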