Update README.md
README.md
CHANGED
@@ -10,15 +10,19 @@ tags:
 ---
 
 # hku-nlp/instructor-xl
-This is a general embedding model: It maps
-
-
-
+This is a general embedding model: It maps **any** piece of text (e.g., a title, a sentence, a document, etc.) to a fixed-length vector at test time **without further training**. With instructions, the embeddings are **domain-specific** (e.g., specialized for science, finance, etc.) and **task-aware** (e.g., customized for classification, information retrieval, etc.).
+
+The model is easy to use with the `sentence-transformers` library.
+
+## Installation
+```bash
 git clone https://github.com/HKUNLP/instructor-embedding
 cd sentence-transformers
 pip install -e .
 ```
-
+
+## Compute your customized embeddings
+Then you can use the model like this to calculate domain-specific and task-aware embeddings:
 ```python
 from sentence_transformers import SentenceTransformer
 sentence = "3D ActionSLAM: wearable person tracking in multi-floor environments"
@@ -26,4 +30,18 @@ instruction = "Represent the Science title; Input:"
 model = SentenceTransformer('hku-nlp/instructor-xl')
 embeddings = model.encode([[instruction,sentence,0]])
 print(embeddings)
+```
+
+## Calculate sentence similarities
+You can further use the model to compute similarities between two groups of sentences, with **customized embeddings**.
+```python
+from sklearn.metrics.pairwise import cosine_similarity
+sentences_a = [['Represent the Science sentence; Input: ','Parton energy loss in QCD matter',0],
+               ['Represent the Financial statement; Input: ','The Federal Reserve on Wednesday raised its benchmark interest rate.',0]]
+sentences_b = [['Represent the Science sentence; Input: ','The Chiral Phase Transition in Dissipative Dynamics',0],
+               ['Represent the Financial statement; Input: ','The funds rose less than 0.5 per cent on Friday',0]]
+embeddings_a = model.encode(sentences_a)
+embeddings_b = model.encode(sentences_b)
+similarities = cosine_similarity(embeddings_a,embeddings_b)
+print(similarities)
 ```
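The `cosine_similarity` call in the added README example returns a matrix whose entry (i, j) compares the i-th embedding of the first group with the j-th embedding of the second. As a rough illustration of that computation, here is a minimal pure-Python sketch. The function name `cosine_similarity_matrix` and the toy vectors are hypothetical: they stand in for real `model.encode(...)` output and are not part of the README or of any library.

```python
import math


def cosine_similarity_matrix(rows_a, rows_b):
    """Return a len(rows_a) x len(rows_b) list of cosine similarities.

    Hypothetical helper for illustration; it assumes no vector is all zeros.
    """
    def unit(vector):
        # Scale the vector to length 1 so a dot product equals the cosine.
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector]

    units_a = [unit(v) for v in rows_a]
    units_b = [unit(v) for v in rows_b]
    return [
        [sum(x * y for x, y in zip(u, w)) for w in units_b]
        for u in units_a
    ]


# Toy 2-D vectors used in place of real embeddings (assumption).
toy_a = [[1.0, 0.0], [1.0, 1.0]]
toy_b = [[2.0, 0.0], [0.0, 3.0]]
toy_similarities = cosine_similarity_matrix(toy_a, toy_b)
print(toy_similarities)
```

With real embeddings, each row of the result would correspond to one entry of `sentences_a` and each column to one entry of `sentences_b`.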