Update README.md
README.md
CHANGED
@@ -82,19 +82,17 @@ datasets:
|
|
82 |
- SalmanFaroz/DisEmbed-Symptom-Disease-v1
|
83 |
---
|
84 |
|
85 |
-
#
|
|
|
86 |
|
87 |
-
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
|
88 |
|
89 |
## Model Details
|
90 |
|
91 |
### Model Description
|
92 |
-
- **
|
93 |
-
- **Base model:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) <!-- at revision 5c38ec7c405ec4b44b94cc5a9bb96e735b38267a -->
|
94 |
- **Maximum Sequence Length:** 512 tokens
|
95 |
- **Output Dimensionality:** 384 dimensions
|
96 |
- **Similarity Function:** Cosine Similarity
|
97 |
-
<!-- - **Training Dataset:** Unknown -->
|
98 |
- **Language:** en
|
99 |
- **License:** mit
|
100 |
|
@@ -127,30 +125,4 @@ print(embeddings.shape)
|
|
127 |
similarities = model.similarity(embeddings, embeddings)
|
128 |
print(similarities.shape)
|
129 |
# [3, 3]
|
130 |
-
```
|
131 |
-
|
132 |
-
|
133 |
-
#### Sentence Transformers
|
134 |
-
```bibtex
|
135 |
-
@inproceedings{reimers-2019-sentence-bert,
|
136 |
-
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
|
137 |
-
author = "Reimers, Nils and Gurevych, Iryna",
|
138 |
-
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
|
139 |
-
month = "11",
|
140 |
-
year = "2019",
|
141 |
-
publisher = "Association for Computational Linguistics",
|
142 |
-
url = "https://arxiv.org/abs/1908.10084",
|
143 |
-
}
|
144 |
-
```
|
145 |
-
|
146 |
-
#### MultipleNegativesRankingLoss
|
147 |
-
```bibtex
|
148 |
-
@misc{henderson2017efficient,
|
149 |
-
title={Efficient Natural Language Response Suggestion for Smart Reply},
|
150 |
-
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
|
151 |
-
year={2017},
|
152 |
-
eprint={1705.00652},
|
153 |
-
archivePrefix={arXiv},
|
154 |
-
primaryClass={cs.CL}
|
155 |
-
}
|
156 |
```
|
|
|
82 |
- SalmanFaroz/DisEmbed-Symptom-Disease-v1
|
83 |
---
|
84 |
|
85 |
+
# DisEmbed-v1
|
86 |
+
DisEmbed-v1 is a disease-focused embedding model designed for the medical domain, trained on a synthetic dataset comprising disease descriptions, symptoms, and Q&A pairs. It outperforms general medical models in disease-specific tasks, particularly in distinguishing similar diseases. DisEmbed excels in retrieval tasks and disease-context identification.
|
87 |
|
|
|
88 |
|
89 |
## Model Details
|
90 |
|
91 |
### Model Description
|
92 |
+
- **Dataset:** [DisEmbed-Symptom-Disease-v1](https://huggingface.co/datasets/SalmanFaroz/DisEmbed-Symptom-Disease-v1)
|
|
|
93 |
- **Maximum Sequence Length:** 512 tokens
|
94 |
- **Output Dimensionality:** 384 dimensions
|
95 |
- **Similarity Function:** Cosine Similarity
|
|
|
96 |
- **Language:** en
|
97 |
- **License:** mit
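
The model description above lists cosine similarity as the similarity function. As a minimal sketch of what that means for retrieval, the snippet below scores a query vector against a few candidates and ranks them. The helper names `cosine_similarity` and `rank_by_similarity` and the toy 4-dimensional vectors are assumptions for illustration only; they are not part of the model or its library, and real embeddings from this model would have 384 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

def rank_by_similarity(query, candidates):
    """Return candidate indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)

# Toy vectors (hypothetical stand-ins for real 384-dimensional embeddings).
query = [1.0, 0.0, 1.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0, 1.0],  # orthogonal to the query
    [2.0, 0.0, 2.0, 0.0],  # same direction as the query, different length
    [1.0, 1.0, 1.0, 1.0],  # partially aligned with the query
]
ranking = rank_by_similarity(query, candidates)
print(ranking)
```

Because cosine similarity depends only on direction, the second candidate ranks first even though its length differs from the query's.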
|
98 |
|
|
|
125 |
similarities = model.similarity(embeddings, embeddings)
|
126 |
print(similarities.shape)
|
127 |
# [3, 3]
|
128 |
```
|