Zimix committed · Commit a807782 · 1 Parent(s): 521e48c

Update README.md

Files changed (1):
  1. README.md +21 -12
README.md CHANGED
@@ -14,7 +14,7 @@ license: apache-2.0
 
 ## 简介 Brief Introduction
 
-基于simcse无监督版本,用搜集整理的中文nli数据进行simcse有监督任务的训练。在中文句子对任务上有良好的效果。
+基于simcse无监督版本,用搜集整理的中文NLI数据进行simcse有监督任务的训练。在中文句子对任务上有良好的效果。
 
 **Erlangshen-SimCSE-110M-Chinese** is based on the unsupervised version of SimCSE and further trained on the supervised SimCSE task with collected and curated Chinese NLI data. It performs well on Chinese sentence-pair tasks.
 
@@ -22,7 +22,7 @@ license: apache-2.0
 
 | 需求 Demand | 任务 Task | 系列 Series | 模型 Model | 参数 Parameter | 额外 Extra |
 | :----: | :----: | :----: | :----: | :----: | :----: |
-| 通用 General | 自然语言理解 NLU | 闻仲 Erlangshen | Bert | 110M | 中文 Chinese |
+| 通用 General | 自然语言理解 NLU | 二郎神 Erlangshen | Bert | 110M | 中文 Chinese |
 
 ## 模型信息 Model Information
 
@@ -45,21 +45,30 @@ In order to obtain a general sentence-embedding-model, we use a large number of
 
 ### 加载模型 Loading Models
 
 ```python
-from transformers import GPT2Tokenizer, GPT2Model
-tokenizer = GPT2Tokenizer.from_pretrained('IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')
-model = GPT2Model.from_pretrained('IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')
-text = "Replace me by any text you'd like."
-encoded_input = tokenizer(text, return_tensors='pt')
-output = model(**encoded_input)
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+model = AutoModelForMaskedLM.from_pretrained('IDEA-CCNL/Erlangshen-SimCSE-110M-Chinese')
+tokenizer = AutoTokenizer.from_pretrained('IDEA-CCNL/Erlangshen-SimCSE-110M-Chinese')
 ```
 
 ### 使用示例 Usage Examples
 
 ```python
-from transformers import pipeline, set_seed
-set_seed(55)
-generator = pipeline('text-generation', model='IDEA-CCNL/Wenzhong2.0-GPT2-3.5B-chinese')
-generator("北京位于", max_length=30, num_return_sequences=1)
+from sklearn.metrics.pairwise import cosine_similarity
+
+texta = '今天天气真不错,我们去散步吧!'
+textb = '今天天气真糟糕,还是在宅家里写bug吧!'
+inputs_a = tokenizer(texta, return_tensors="pt")
+inputs_b = tokenizer(textb, return_tensors="pt")
+
+outputs_a = model(**inputs_a, output_hidden_states=True)
+texta_embedding = outputs_a.hidden_states[-1][:, 0, :].squeeze().detach()
+
+outputs_b = model(**inputs_b, output_hidden_states=True)
+textb_embedding = outputs_b.hidden_states[-1][:, 0, :].squeeze().detach()
+
+# if you use cuda, move the embeddings back to host first, e.g. textb_embedding.cpu().numpy()
+similarity_score = cosine_similarity(texta_embedding.reshape(1, -1), textb_embedding.reshape(1, -1))[0][0]
 ```
 
 ## 引用 Citation
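The cosine-similarity step in the added usage example can be sketched without downloading the model. This is a minimal sketch assuming random 768-dimensional vectors (matching BERT-base's hidden size) stand in for the [CLS] embeddings; `emb_a`, `emb_b`, and `emb_c` are hypothetical placeholders, not outputs of Erlangshen-SimCSE-110M-Chinese:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-ins for the [CLS] hidden-state vectors (shape (768,)).
rng = np.random.default_rng(0)
emb_a = rng.normal(size=768)
emb_b = emb_a + 0.1 * rng.normal(size=768)  # a nearby vector: high similarity
emb_c = -emb_a                              # an opposed vector: similarity -1

# cosine_similarity expects 2-D inputs of shape (n_samples, n_features),
# hence the reshape(1, -1) in the README example.
sim_ab = cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1))[0][0]
sim_ac = cosine_similarity(emb_a.reshape(1, -1), emb_c.reshape(1, -1))[0][0]
print(sim_ab, sim_ac)
```

With real sentence embeddings, the same call returns a score near 1 for paraphrases and lower values for unrelated or contradictory pairs.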