Improve model card and add missing information

#1
by nielsr HF staff - opened
Files changed (1)
  1. README.md +68 -152
README.md CHANGED
@@ -1,229 +1,145 @@
  ---
  base_model: google/gemma-2-9b-it
- library_name: peft
- license: cc-by-nc-4.0
  language:
  - uk
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-
- Presented in [Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks (arXiv:2503.13988)](https://arxiv.org/abs/2503.13988)
-
- PEFT 4bit tuning of `google/gemma-2-9b-it` on Ukrainian language and literature tasks of ZNO (EIE) & NMT dataset to generate correct answer letter:
-
- ```
- <bos><start_of_turn>user
- Дайте розгорнуту відповідь на завдання, починаючи з ключового слова "Відповідь:" та використовуючи лише наведені нижче варіанти.
-
- Завдання: З’ясуйте, якими частинами мови є виділені слова в реченні (цифра позначає наступне слово).
- Сучасна людина, щоб бути (1)успішною, має вчитися (2)впродовж (3)усього життя, (4)опановуючи нові галузі знань.
-
- Варіанти відповіді:
- А – займенник
- Б – прикметник
- В – форма дієслова (дієприкметник)
- Г – форма дієслова (дієприслівник)
- Д – прийменник<end_of_turn>
- <start_of_turn>model
- Відповідь:
- 1 - В
- 2 - Д
- 3 - А
- 4 - Г<end_of_turn>
- ```

  ## Model Details

- ### Model Description
-
- <!-- Provide a longer summary of what this model is. -->

- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]

- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]

  ## Uses

- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
  ### Direct Use

- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]

- ### Downstream Use [optional]

- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

- [More Information Needed]

  ### Out-of-Scope Use

- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]

  ## Bias, Risks, and Limitations

- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]

  ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

  ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- ## Training Details

- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-
- [More Information Needed]
-
- #### Metrics
-
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
-
- [More Information Needed]
-
- ### Results
-
- [More Information Needed]
-
- #### Summary

-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
  ## Environmental Impact

- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->

- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]

  ### Model Architecture and Objective

- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]

- #### Hardware

- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**

- [More Information Needed]

- **APA:**

- [More Information Needed]

- ## Glossary [optional]

- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->

- [More Information Needed]

- ## More Information [optional]

- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]
- ### Framework versions
-
- - PEFT 0.14.0
  ---
  base_model: google/gemma-2-9b-it
  language:
  - uk
+ library_name: peft
+ license: cc-by-nc-4.0
+ pipeline_tag: text-generation
  ---

+ # Empowering Smaller Models: Tuning Gemma for Ukrainian Exam Tasks

+ This model, presented in [Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks (arXiv:2503.13988)](https://arxiv.org/abs/2503.13988), is a PEFT adaptation of `google/gemma-2-9b-it`, fine-tuned with 4-bit quantization on Ukrainian language and literature exam tasks from the ZNO (EIE) and NMT datasets. Given a prompt containing an exam question and its answer options, the model generates the letter(s) of the correct answer.
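
+ For reference, the prompt and expected completion format (reproduced from the earlier version of this card) looks as follows:

+ ```
+ <bos><start_of_turn>user
+ Дайте розгорнуту відповідь на завдання, починаючи з ключового слова "Відповідь:" та використовуючи лише наведені нижче варіанти.
+
+ Завдання: З’ясуйте, якими частинами мови є виділені слова в реченні (цифра позначає наступне слово).
+ Сучасна людина, щоб бути (1)успішною, має вчитися (2)впродовж (3)усього життя, (4)опановуючи нові галузі знань.
+
+ Варіанти відповіді:
+ А – займенник
+ Б – прикметник
+ В – форма дієслова (дієприкметник)
+ Г – форма дієслова (дієприслівник)
+ Д – прийменник<end_of_turn>
+ <start_of_turn>model
+ Відповідь:
+ 1 - В
+ 2 - Д
+ 3 - А
+ 4 - Г<end_of_turn>
+ ```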

  ## Model Details

+ This model was developed with the PEFT library for parameter-efficient fine-tuning.

+ - **Model type:** Causal Language Model
+ - **Language(s) (NLP):** Ukrainian (uk)
+ - **License:** CC-BY-NC-4.0
+ - **Finetuned from model:** `google/gemma-2-9b-it`

+ ### Model Sources

+ - **Repository:** https://github.com/AndriyAntypenko/UKR-GEC-LLM
+ - **Paper:** [Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks (arXiv:2503.13988)](https://arxiv.org/abs/2503.13988)

  ## Uses

  ### Direct Use

+ The model can be used directly to generate the letter of the correct answer for Ukrainian language and literature exam questions, using the prompt format shown in the examples on this card.

+ ### Downstream Use

+ This model could be integrated into educational applications or question-answering systems focused on Ukrainian language and literature.

  ### Out-of-Scope Use

+ The model is trained specifically for Ukrainian exam tasks and should not be relied on for other tasks or languages; its performance outside this domain is not guaranteed and may be unreliable.

  ## Bias, Risks, and Limitations

+ The model's performance depends heavily on the quality and characteristics of the training data (the ZNO (EIE) and NMT datasets), and any biases present in that data may be reflected in its output. Its accuracy is limited to the types of questions represented in the training data. Over-reliance on this model for high-stakes decisions without human oversight is strongly discouraged.

  ### Recommendations

+ Users should be aware of the model's potential biases and limitations. Human review of the model's output is crucial, especially in high-stakes applications. Further research is needed to fully understand and mitigate potential biases.

  ## How to Get Started with the Model

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+
+ model_id = "ybelonogov/gemma-zno-eie"  # Replace with the actual Hugging Face model ID
+
+ # Loading an adapter repository this way requires the `peft` package to be installed.
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")
+
+ prompt = """<bos><start_of_turn>user
+ Дайте розгорнуту відповідь на завдання, починаючи з ключового слова "Відповідь:" та використовуючи лише наведені нижче варіанти.
+
+ Завдання: ... [Your Question Here] ...
+
+ Варіанти відповіді:
+ А – ...
+ Б – ...
+ В – ...
+ Г – ...
+ Д – ...<end_of_turn>
+ <start_of_turn>model
+ """
+
+ # The prompt string already contains <bos>, so avoid adding special tokens a second time.
+ inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
+ generation_config = GenerationConfig(
+     max_new_tokens=512,
+     do_sample=False,
+ )
+
+ outputs = model.generate(**inputs, generation_config=generation_config)
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(generated_text)
+ ```
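
+ Because this repository contains a PEFT adapter rather than full model weights, the adapter can also be loaded explicitly on top of the 4-bit-quantized base model. A minimal sketch, assuming the `peft` and `bitsandbytes` packages are installed; the quantization settings below are common defaults, not documented values:

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ from peft import PeftModel
+
+ base_model_id = "google/gemma-2-9b-it"
+ adapter_id = "ybelonogov/gemma-zno-eie"  # Replace with the actual adapter repository ID
+
+ # Load the base model in 4-bit; NF4 with bfloat16 compute is a common default.
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+ base_model = AutoModelForCausalLM.from_pretrained(
+     base_model_id,
+     quantization_config=bnb_config,
+     device_map="auto",
+ )
+
+ # Attach the fine-tuned LoRA adapter on top of the quantized base model.
+ model = PeftModel.from_pretrained(base_model, adapter_id)
+ ```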

+ ## Training Details

+ ### Training Data

+ [More information needed - link to a dataset card or description of the ZNO (EIE) & NMT datasets]

+ ### Training Procedure

+ The model was fine-tuned with the PEFT library using QLoRA (LoRA adapters trained on top of a 4-bit-quantized base model). The exact training hyperparameters have not been published; an illustrative configuration is sketched after the list below.

+ #### Training Hyperparameters

+ - **Training regime:** 4-bit quantization
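
+ For illustration only, a typical QLoRA configuration with the `peft` library would look roughly like this; all values are placeholders, not the authors' settings:

+ ```python
+ from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
+
+ # Illustrative placeholders; the actual rank, alpha, dropout, and target modules are not documented.
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+
+ # With a 4-bit-quantized base model (see "How to Get Started" above), training would proceed as:
+ # base_model = prepare_model_for_kbit_training(base_model)
+ # model = get_peft_model(base_model, lora_config)
+ ```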

  ## Evaluation

+ [More information needed on evaluation metrics, datasets, etc.]

  ## Environmental Impact

+ [More information needed]

+ ## Technical Specifications

  ### Model Architecture and Objective

+ [More information needed]

+ ### Compute Infrastructure

+ [More information needed]

+ ## Citation

+ [More information needed. The model is presented in [arXiv:2503.13988](https://arxiv.org/abs/2503.13988); a BibTeX entry should be added here.]

+ ## Glossary

+ [Add glossary if needed]

+ ## More Information

+ [Add more information if needed]

+ ## Model Card Authors

+ [Add author information]

  ## Model Card Contact

+ [Add contact information]