ShengbinYue committed
Commit abc510f · verified · 1 Parent(s): a87e7d3

Update README.md

Files changed (1): README.md (+118 -1)
README.md CHANGED
@@ -8,4 +8,121 @@ base_model:
  - Qwen/Qwen2.5-7B-Instruct
  tags:
  - legal
- ---
+ ---
+
+ This repository contains DISC-LawLLM-7B, a version that uses Qwen2.5-7B-Instruct as the base model.
+
+ <div align="center">
+
+ [Paper](https://link.springer.com/chapter/10.1007/978-981-97-5569-1_19) | [Technical Report](https://arxiv.org/abs/2309.11325)
+ </div>
+
+ DISC-LawLLM is a large language model specialized in the Chinese legal domain, developed and open-sourced by the [Data Intelligence and Social Computing Lab of Fudan University (Fudan-DISC)](http://fudan-disc.com) to provide comprehensive intelligent legal services.
+
+ Check our [home page](https://github.com/FudanDISC/DISC-LawLLM) for more information.
+
+ # Quickstart
+
+ We recommend installing `transformers>=4.37.0`.
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ device = "cuda"  # the device to move the tokenized inputs onto
+
+ # Load the model and tokenizer from the Hugging Face Hub
+ model = AutoModelForCausalLM.from_pretrained(
+     "ShengbinYue/LawLLM-7B",
+     torch_dtype="auto",
+     device_map="auto"
+ )
+ tokenizer = AutoTokenizer.from_pretrained("ShengbinYue/LawLLM-7B")
+
+ # "How is the crime of producing and selling counterfeit or substandard goods sentenced?"
+ prompt = "生产销售假冒伪劣商品罪如何判刑?"
+ messages = [
+     # System prompt: "You are LawLLM, a legal assistant created by the DISC lab at Fudan University."
+     {"role": "system", "content": "你是LawLLM,一个由复旦大学DISC实验室创造的法律助手。"},
+     {"role": "user", "content": prompt}
+ ]
+ # Build the chat-formatted prompt string expected by the model
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+ model_inputs = tokenizer([text], return_tensors="pt").to(device)
+
+ generated_ids = model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=512
+ )
+ # Keep only the newly generated tokens, dropping the prompt tokens
+ generated_ids = [
+     output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
+ ]
+
+ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
+ print(response)
+ ```
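+
+ If you want to stream tokens as they are generated instead of waiting for the full answer, you can optionally attach a `TextStreamer` from `transformers`. The following is a minimal sketch that reuses `model`, `tokenizer`, and `model_inputs` from the snippet above; adapt it to your own setup.
+
+ ```python
+ from transformers import TextStreamer
+
+ # Print decoded text to stdout as it is generated, hiding the prompt and special tokens
+ streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
+ model.generate(
+     model_inputs.input_ids,
+     max_new_tokens=512,
+     streamer=streamer
+ )
+ ```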
+
+ # vLLM
+
+ - Install vLLM:
+
+   `pip install "vllm>=0.4.3"`
+
+ - Run LawLLM:
+
+ ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_name = "ShengbinYue/LawLLM-7B"
+
+ # Sampling configuration for generation
+ sampling_params = SamplingParams(
+     temperature=0.1,
+     top_p=0.9,
+     top_k=50,
+     max_tokens=4096
+ )
+ llm = LLM(model=model_name)
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ # "How is the crime of producing and selling counterfeit or substandard goods sentenced?"
+ prompt = "生产销售假冒伪劣商品罪如何判刑?"
+ # prompt = "戴罪立功是什么意思"  # alternative question: "What does 'performing meritorious service to atone for a crime' mean?"
+
+ messages = [
+     {"role": "system", "content": "你是LawLLM,一个由复旦大学DISC实验室创造的法律助手。"},
+     {"role": "user", "content": prompt}
+ ]
+ # Build the chat-formatted prompt string expected by the model
+ text = tokenizer.apply_chat_template(
+     messages,
+     tokenize=False,
+     add_generation_prompt=True
+ )
+
+ outputs = llm.generate([text], sampling_params)
+ for output in outputs:
+     prompt = output.prompt
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
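+
+ vLLM can also expose the model behind an OpenAI-compatible HTTP endpoint. The sketch below is optional and assumes you have started the server locally (for example with `python -m vllm.entrypoints.openai.api_server --model ShengbinYue/LawLLM-7B`) and installed the `openai` Python client; the host, port, and placeholder API key are illustrative assumptions, not part of this repository.
+
+ ```python
+ from openai import OpenAI
+
+ # Point the OpenAI client at the locally running vLLM server (default port 8000 assumed)
+ client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
+
+ completion = client.chat.completions.create(
+     model="ShengbinYue/LawLLM-7B",
+     messages=[
+         {"role": "system", "content": "你是LawLLM,一个由复旦大学DISC实验室创造的法律助手。"},
+         {"role": "user", "content": "生产销售假冒伪劣商品罪如何判刑?"},
+     ],
+     temperature=0.1,
+     max_tokens=1024,
+ )
+ print(completion.choices[0].message.content)
+ ```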
+
+ # Citation
+
+ If our work is helpful to you, please cite it as follows:
+
+ ```bibtex
+ @misc{yue2023disclawllm,
+     title={DISC-LawLLM: Fine-tuning Large Language Models for Intelligent Legal Services},
+     author={Shengbin Yue and Wei Chen and Siyuan Wang and Bingxuan Li and Chenchen Shen and Shujun Liu and Yuxuan Zhou and Yao Xiao and Song Yun and Wei Lin and Xuanjing Huang and Zhongyu Wei},
+     year={2023},
+     eprint={2309.11325},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+
+ @inproceedings{yue2024lawllm,
+     title={LawLLM: Intelligent Legal System with Legal Reasoning and Verifiable Retrieval},
+     author={Yue, Shengbin and Liu, Shujun and Zhou, Yuxuan and Shen, Chenchen and Wang, Siyuan and Xiao, Yao and Li, Bingxuan and Song, Yun and Shen, Xiaoyu and Chen, Wei and others},
+     booktitle={International Conference on Database Systems for Advanced Applications},
+     pages={304--321},
+     year={2024},
+     organization={Springer}
+ }
+ ```
+