lbourdois committed
Commit a43a723 · verified · 1 Parent(s): f98d457

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +160 -148
README.md CHANGED
@@ -1,148 +1,160 @@
- ---
- base_model: Qwen/Qwen2.5-7B-Instruct
- language:
- - zh
- license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE
- pipeline_tag: text-generation
- tags:
- - pytorch
- - Qwen
- - Qwen2.5
- - ContaLLM
- - ContaAI
- library_name: transformers
- ---
+ ---
+ base_model: Qwen/Qwen2.5-7B-Instruct
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ license: apache-2.0
+ license_link: https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE
+ pipeline_tag: text-generation
+ tags:
+ - pytorch
+ - Qwen
+ - Qwen2.5
+ - ContaLLM
+ - ContaAI
+ library_name: transformers
+ ---

<img src="https://conta-ai-image.oss-cn-shanghai.aliyuncs.com/contaai/logo2.png" alt="ContaLLM" width="800" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# ContaLLM-Fashion-7B-Instruct

ContaLLM-Fashion-7B-Instruct is a Chinese vertical-domain marketing large language model focused on the fashion industry. It generates customized marketing copy from a user's specific marketing requirements, brand, product selection, content type, article length, topic, selling points, hashtags, scene, and so on. It combines the base LLM's capabilities with training on existing high-quality marketing material to help companies generate diverse, high-quality marketing content and improve marketing conversion rates.

## Model description

- **Model type:** A model trained on a mix of publicly available, synthetic, and human-annotated datasets.
- **Language(s) (NLP):** Primarily Chinese
- **Industry:** Fashion and makeup industry marketing
- **License:** apache-2.0
- **Finetuned from model:** Qwen/Qwen2.5-7B-Instruct

### Model Stage

| **Industry** | **Version** | **Qwen 2.5 7B** |
|--------------|-------------|------------------------------------------------------------------------------------------------------------|
| **Fashion** | **bf16** | [ContaAI/ContaLLM-Fashion-7B-Instruct](https://huggingface.co/ContaAI/ContaLLM-Fashion-7B-Instruct) |
| **Fashion** | **8bit** | [ContaAI/ContaLLM-Fashion-7B-Instruct-8bit](https://huggingface.co/ContaAI/ContaLLM-Fashion-7B-Instruct-8bit) |
| **Fashion** | **4bit** | [ContaAI/ContaLLM-Fashion-7B-Instruct-4bit](https://huggingface.co/ContaAI/ContaLLM-Fashion-7B-Instruct-4bit) |

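The quantized checkpoints in the table can presumably be loaded the same way as the bf16 model. A minimal sketch, assuming the 8-bit and 4-bit repositories ship ready-to-load quantized weights (their exact quantization format is not stated in this card, and loading them may additionally require `bitsandbytes`):

```python
# Minimal sketch: load a quantized variant listed in the table above.
# Assumption: the repo loads like any other checkpoint via from_pretrained.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ContaAI/ContaLLM-Fashion-7B-Instruct-8bit"
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo)
```
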
## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ContaAI/ContaLLM-Fashion-7B-Instruct")
```

### System Prompt

The model is a Chinese fashion marketing model, so we use this system prompt by default (roughly: "Please write a fashion-industry marketing post based on the marketing requirements and other information provided by the user."):
```python
system_prompt = '请根据用户提供的营销需求和其他信息写一篇时尚行业的营销推文。'
```

### User Prompt
Users enter their marketing requirements as needed. Only the marketing requirement itself is required; brand, product selection, content type, topic, selling point, hashtag, scene, and content length are optional. Content length has three settings: shorter (较短), medium (中等), and longer (较长). A helper for assembling this prompt programmatically is sketched after the example below. The details are as follows:

| Parameter name | Required | Meaning and optional range |
|-------------------|-----------------------|------------------------------------------------------------------------------------------------------|
| **营销需求** (marketing requirement) | required | Fill in your marketing requirements; cannot be blank |
| **品牌** (brand) | optional | Fill in your marketing brand, or remove this row from the prompt |
| **选品** (product selection) | optional | Fill in your product selection, or remove this row from the prompt |
| **内容类型** (content type) | optional | Fill in the article type, or remove this row from the prompt |
| **内容长度** (content length) | optional | choices=['较长', '中等', '较短'], choose what you need, or remove this row from the prompt |
| **话题** (topic) | optional | Fill in your marketing topic, or remove this row from the prompt |
| **卖点** (selling point) | optional | Fill in the selling points for your marketing needs, or remove this row from the prompt |
| **标签** (hashtag) | optional | Fill in the hashtags, or remove this row from the prompt |
| **场景** (scene) | optional | Fill in the scenes for your marketing needs, or remove this row from the prompt |

Example:
```python
user_prompt = """营销需求:秋冬大包包推荐
品牌:Celine
选品:CELINE托特包
内容类型:产品种草与测评
内容长度:较短
话题:CELINE托特包、秋冬大包包、托特包用途
卖点:慵懒设计、大容量、新款限定设计
标签:CELINE、托特包、新品
场景:日常通勤、妈咪包使用、秋冬搭配"""
```

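A prompt like the one above can also be assembled from the fields in the table, dropping any optional field that is not set. This is a minimal sketch; the helper name and the call below are illustrative and not part of the model card:

```python
# Hypothetical helper: build the user prompt from the required marketing
# requirement plus any optional fields, keeping the field order from the table.
def build_user_prompt(marketing_need: str, **optional_fields: str) -> str:
    field_order = ["品牌", "选品", "内容类型", "内容长度", "话题", "卖点", "标签", "场景"]
    lines = [f"营销需求:{marketing_need}"]
    for name in field_order:
        value = optional_fields.get(name)
        if value:  # unset fields are omitted, i.e. "remove this row from the prompt"
            lines.append(f"{name}:{value}")
    return "\n".join(lines)

user_prompt = build_user_prompt(
    "秋冬大包包推荐",
    品牌="Celine",
    选品="CELINE托特包",
    内容长度="较短",
)
```
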
### Use example (with template)
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ContaAI/ContaLLM-Fashion-7B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

system_prompt = '请根据用户提供的营销需求和其他信息写一篇时尚行业的营销推文。'

user_prompt = """营销需求:秋冬大包包推荐
品牌:Celine
选品:CELINE托特包
内容类型:产品种草与测评
内容长度:较短
话题:CELINE托特包、秋冬大包包、托特包用途
卖点:慵懒设计、大容量、新款限定设计
标签:CELINE、托特包、新品
场景:日常通勤、妈咪包使用、秋冬搭配"""

# Qwen2.5 chat template (ChatML): system message, user message, then the
# assistant turn the model is asked to complete.
prompt_template = '''<|im_start|>system
{}<|im_end|>
<|im_start|>user
{}<|im_end|>
<|im_start|>assistant
'''

prompt = prompt_template.format(system_prompt, user_prompt)

# Tokenize without adding extra special tokens (the template already contains
# them) and move the tensors to the model's device.
tokenized_message = tokenizer(
    prompt,
    max_length=1024,
    return_tensors="pt",
    add_special_tokens=False
).to(model.device)

response_token_ids = model.generate(
    **tokenized_message,
    max_new_tokens=1024,
    do_sample=True,
    top_p=1.0,
    temperature=0.5,
    min_length=None,
    use_cache=True,
    top_k=50,
    repetition_penalty=1.2,
    length_penalty=1,
)

# Decode only the newly generated tokens, skipping the prompt.
generated_tokens = response_token_ids[0, tokenized_message['input_ids'].shape[-1]:]
generated_text = tokenizer.decode(generated_tokens, skip_special_tokens=True)
print(generated_text)
```

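The hand-written template above matches Qwen2.5's ChatML format. A less error-prone route is to let the tokenizer build the prompt with `apply_chat_template`, which should produce the same structure. A minimal sketch, reusing `model`, `tokenizer`, `system_prompt`, and `user_prompt` from the example above:

```python
# Build the ChatML prompt from the tokenizer's own chat template.
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,   # append the opening assistant turn
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.5, top_p=1.0)
print(tokenizer.decode(output_ids[0, inputs.shape[-1]:], skip_special_tokens=True))
```
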
### Bias, Risks, and Limitations

The ContaLLM models used safety techniques during data generation and training, but they are not deployed with in-the-loop response filtering (as ChatGPT is) at inference time, so the model can produce problematic outputs, especially when prompted to do so.
The size and composition of the corpus used to train the base Qwen2.5 models is also unknown, although it likely included a mix of web data and technical sources such as books and code.
Use of the models is at your own risk. You may need to monitor the model's outputs and take appropriate actions, such as content filtering, if necessary.

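The card does not prescribe a specific moderation pipeline. As one illustrative possibility (a hypothetical blocklist check, not something shipped with the model), generated text such as `generated_text` from the example above could be screened before publishing:

```python
# Hypothetical post-generation check: flag outputs containing blocked terms
# before they are published. The blocklist below is purely illustrative.
BLOCKED_TERMS = {"guaranteed results", "medical cure"}

def needs_review(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)

if needs_review(generated_text):
    print("Output flagged for manual review.")
else:
    print(generated_text)
```
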
## License and use

All Qwen 2.5 ContaAI models are released under the base Qwen 2.5 model's [Apache 2.0 license](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct/blob/main/LICENSE).