docs: update README
README.md
CHANGED
@@ -6,23 +6,22 @@ tags:
|
|
6 |
base_model: nomic-ai/modernbert-embed-base
|
7 |
pipeline_tag: sentence-similarity
|
8 |
library_name: sentence-transformers
|
9 |
---
|
10 |
|
11 |
-
#
|
12 |
|
13 |
-
This is a [sentence-transformers](https://www.SBERT.net) model
|
14 |
|
15 |
## Model Details
|
16 |
|
17 |
### Model Description
|
18 |
- **Model Type:** Sentence Transformer
|
19 |
- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
|
20 |
-
- **Maximum Sequence Length:**
|
21 |
- **Output Dimensionality:** 256 dimensions
|
22 |
- **Similarity Function:** Cosine Similarity
|
23 |
-
|
24 |
-
<!-- - **Language:** Unknown -->
|
25 |
-
<!-- - **License:** Unknown -->
|
26 |
|
27 |
### Model Sources
|
28 |
|
@@ -110,19 +109,17 @@ You can finetune this model on your own dataset.
|
|
110 |
|
111 |
## Training Details
|
112 |
|
113 |
### Framework Versions
|
114 |
- Python: 3.11.9
|
115 |
- Sentence Transformers: 3.4.1
|
116 |
- Transformers: 4.48.3
|
117 |
- PyTorch: 2.2.2
|
118 |
-
- Accelerate:
|
119 |
-
- Datasets:
|
120 |
- Tokenizers: 0.21.0
|
121 |
|
122 |
-
## Citation
|
123 |
-
|
124 |
-
### BibTeX
|
125 |
-
|
126 |
<!--
|
127 |
## Glossary
|
128 |
|
|
|
6 |
base_model: nomic-ai/modernbert-embed-base
|
7 |
pipeline_tag: sentence-similarity
|
8 |
library_name: sentence-transformers
|
9 |
+
license: mit
|
10 |
---
|
11 |
|
12 |
+
# ModernBERT Embed Base Distilled
|
13 |
|
14 |
+
This is a [sentence-transformers](https://www.SBERT.net) model distilled from [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base). It maps sentences and paragraphs to a 256-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
15 |
|
16 |
## Model Details
|
17 |
|
18 |
### Model Description
|
19 |
- **Model Type:** Sentence Transformer
|
20 |
- **Base model:** [nomic-ai/modernbert-embed-base](https://huggingface.co/nomic-ai/modernbert-embed-base) <!-- at revision d556a88e332558790b210f7bdbe87da2fa94a8d8 -->
|
21 |
+
- **Maximum Sequence Length:** 8192 tokens
|
22 |
- **Output Dimensionality:** 256 dimensions
|
23 |
- **Similarity Function:** Cosine Similarity
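
As a rough illustration of the cosine similarity listed above, here is a minimal sketch in plain Python. The short toy vectors stand in for the model's 256-dimensional embeddings, and the helper name `cosine_similarity` is hypothetical rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional vectors; real embeddings from this model would have 256 entries.
query = [1.0, 0.0, 2.0, 0.0]
same_direction = [2.0, 0.0, 4.0, 0.0]  # a scaled copy of `query`
orthogonal = [0.0, 3.0, 0.0, 1.0]      # shares no non-zero component with `query`

print(cosine_similarity(query, same_direction))
print(cosine_similarity(query, orthogonal))
```

Because the score depends only on direction, scaling a vector does not change it, which is why the scaled copy scores at (or within rounding error of) the maximum while the orthogonal vector scores zero.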
|
24 |
+
|
25 |
|
26 |
### Model Sources
|
27 |
|
|
|
109 |
|
110 |
## Training Details
|
111 |
|
112 |
+
### Distillation Process
|
113 |
+
|
114 |
+
The model is distilled using the [Model2Vec](https://huggingface.co/blog/Pringled/model2vec) framework, a technique for creating extremely fast and small static embedding models from any Sentence Transformer.
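
To make "static embedding model" more concrete: roughly, such a model stores one fixed vector per token and embeds a text by averaging the vectors of its tokens, with no transformer forward pass at inference time. The sketch below is a toy illustration under that assumption. The vocabulary, the 3-dimensional vectors, the whitespace tokenizer, and the function name `embed` are all hypothetical and are not taken from this model or from the Model2Vec API.

```python
# Hypothetical token table: one fixed vector per token.
# A real distilled model would hold a full subword vocabulary with 256-dim vectors.
TOKEN_VECTORS = {
    "fast": [1.0, 0.0, 0.0],
    "small": [0.0, 1.0, 0.0],
    "model": [0.0, 0.0, 1.0],
}
DIM = 3


def embed(text):
    """Average the stored vectors of known tokens; unknown tokens are skipped."""
    tokens = text.lower().split()  # stand-in for a real subword tokenizer
    vectors = [TOKEN_VECTORS[t] for t in tokens if t in TOKEN_VECTORS]
    if not vectors:
        return [0.0] * DIM  # no known tokens: fall back to the zero vector
    # zip(*vectors) groups the values dimension by dimension.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


print(embed("fast model"))
print(embed("an unknown phrase"))
```

Since embedding is only a table lookup and a mean, the cost per text grows with the number of tokens rather than with model depth, which is where the speed of static models comes from.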
|
115 |
+
|
116 |
### Framework Versions
|
117 |
- Python: 3.11.9
|
118 |
- Sentence Transformers: 3.4.1
|
119 |
- Transformers: 4.48.3
|
120 |
- PyTorch: 2.2.2
|
121 |
- Tokenizers: 0.21.0
|
122 |
|
123 |
<!--
|
124 |
## Glossary
|
125 |
|