---
license: llama3.1
---

# Model Summary

llama3.1-8B-Chinese-Chat is an instruction-tuned language model built upon the Meta-Llama-3.1-8B-Instruct model for Chinese & English users, with abilities such as roleplaying and tool use.

Developers: [Shenzhi Wang](https://shenzhi-wang.netlify.app)\*, [Yaowei Zheng](https://github.com/hiyouga)\*, Guoyin Wang (in.ai), Shiji Song, Gao Huang. (\*: Equal Contribution)

- License: [Llama-3.1 License](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/blob/main/LICENSE)
- Base Model: Meta-Llama-3.1-8B-Instruct
- Model Size: 8.03B
- Context length: 8K

# 1. Introduction

This is the first model specifically fine-tuned for Chinese & English users via ORPO [1], based on the [Meta-Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).

**Compared to the original [Meta-Llama-3.1-8B-Instruct model](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), our llama3.1-8B-Chinese-Chat model significantly reduces the issues of "Chinese questions with English answers" and the mixing of Chinese and English in responses.**

[1] Hong, Jiwoo, Noah Lee, and James Thorne. "Reference-free Monolithic Preference Optimization with Odds Ratio." arXiv preprint arXiv:2403.07691 (2024).

Training framework: [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory).

Training details:

- epochs: 3
- learning rate: 3e-6
- learning rate scheduler type: cosine
- warmup ratio: 0.1
- cutoff len (i.e., context length): 8192
- orpo beta (i.e., $\lambda$ in the ORPO paper; see the loss sketch below): 0.05
- global batch size: 128
- fine-tuning type: full parameters
- optimizer: paged_adamw_32bit
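
To make the `orpo beta` setting above concrete, the ORPO objective from [1] adds a weighted odds-ratio term to the usual SFT loss on the chosen response. Below is a minimal PyTorch sketch of that loss, not the LLaMA-Factory implementation used for training; the function and tensor names are hypothetical.

```python
import torch
import torch.nn.functional as F


def orpo_loss(chosen_logps, rejected_logps, chosen_nll, beta=0.05):
    """Sketch of the ORPO objective (Hong et al., 2024).

    chosen_logps / rejected_logps: length-normalized log-likelihoods
        log P(y|x) of the chosen and rejected responses (one scalar per example).
    chosen_nll: the ordinary SFT negative log-likelihood on the chosen response.
    beta: the odds-ratio weight, i.e. lambda in the paper (0.05 in this run).
    """
    # log odds(y|x) = log P(y|x) - log(1 - P(y|x)), computed in log space
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # odds-ratio loss: push the odds of the chosen response above the rejected one
    loss_or = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()

    # full objective: SFT loss plus the lambda-weighted odds-ratio term
    return chosen_nll + beta * loss_or
```
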
# 2. Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "shenzhi-wang/Llama3.1-8B-Chinese-Chat"

# Load the tokenizer and the model (dtype and device placement chosen automatically).
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "user", "content": "写一首诗吧"},  # "Write a poem."
]

# Build the prompt with the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=8192,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

# Decode only the newly generated tokens, skipping the prompt.
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))
```
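
Multi-turn chat works the same way: append the model's reply and the next user message to `messages`, then re-apply the chat template. A minimal sketch reusing the `model`, `tokenizer`, and `response` variables from the block above; the follow-up prompt is just an example.

```python
# Append the assistant's previous reply and a follow-up user turn.
messages.append(
    {"role": "assistant", "content": tokenizer.decode(response, skip_special_tokens=True)}
)
messages.append(
    {"role": "user", "content": "请把这首诗翻译成英文。"}  # "Please translate this poem into English."
)

input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.9
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```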