---
base_model: unsloth/mistral-7b-instruct-v0.2-bnb-4bit
tags:
- transformers
- unsloth
- mistral
- trl
- text-generation
- text-generation-inference
license: apache-2.0
language:
- en
metrics:
- accuracy
- bertscore
- f1
- recall
- precision
library_name: transformers
---
# Mistral 7B Instruct

This model is a fine-tuned version of [mistral-7b-instruct-v0.2-bnb-4bit](https://huggingface.co/unsloth/mistral-7b-instruct-v0.2-bnb-4bit) on the EngSaf dataset for Automatic Essay Grading: given a question, a reference answer, a mark scheme, and a student answer, it produces both a score and a rationale.

It achieves the following results on the evaluation set (one plausible way to compute these metric types is sketched after the lists):

- Loss: 1.3612

Score metrics:

- Precision: 0.333
- Recall: 0.308
- F1: 0.3106
- Accuracy: 0.311

Rationale metrics:

- Precision: 0.546
- Recall: 0.549
- F1: 0.5467

+ ## Model Details
39
+ - Base Model: Mistral 7B: https://arxiv.org/abs/2310.06825
40
+ - Fine-tuning Dataset: EngSaf: https://arxiv.org/abs/2407.12818.
41
+ - Task: Automatic Essay Grading
42
+
43
+ ## Training Data
44
+ The model is fine-tuned in the EngSaf dataset, curated for Automatic Essay Grading.
45
+ EngSaf consists of student responses annotated with
46
+
47
+ - Questions: Typically short-answer or essay-type.
48
+ - Correct Answer: answers provided by teachers.
49
+ - Student Answers: Actual responses written by students.
50
+ - Output Label: The actual student score.
51
+ - Feedback: Explanations justifying the given scores.
52
+
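The exact preprocessing is not published, so the following is only a sketch of how one such record could be mapped into chat-format training messages; the field names (`question`, `correct_answer`, `student_answer`, `mark_scheme`, `score`, `feedback`) are hypothetical and may differ from the real EngSaf columns.

```python
import json

def build_messages(example: dict) -> list[dict]:
    """Map one hypothetical EngSaf record to chat-format training messages."""
    user_content = (
        "Provide both a score and a rationale by evaluating the student's answer "
        "strictly within the mark scheme range.\n"
        f"Question: {example['question']}\n"
        f"Reference Answer: {example['correct_answer']}\n"
        f"Student Answer: {example['student_answer']}\n"
        f"Mark Scheme: {example['mark_scheme']}"
    )
    # The assistant turn is the JSON object the model should learn to emit
    target = json.dumps({"score": example["score"], "rationale": example["feedback"]})
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": target},
    ]
```
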
## Example Usage

Below is an example of how to use the model with Unsloth and the Hugging Face Transformers library:

```python
from unsloth import FastLanguageModel

# Load the fine-tuned model and tokenizer in 4-bit precision
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="amjad-awad/mistral-7b-instruct-v0.2-bnb-4bit-EngSaf-96K-lr-1e5",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Switch Unsloth into inference mode (LoRA adapters only need to be attached
# via FastLanguageModel.get_peft_model when fine-tuning further, not here)
FastLanguageModel.for_inference(model)

user_content = (
    "Provide both a score and a rationale by evaluating the student's answer strictly within the mark scheme range, "
    "grading based on how well it meets the question's requirements by comparing the student answer to the reference answer.\n"
    "Question: What is photosynthesis?\n"
    "Reference Answer: Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize nutrients from carbon dioxide and water. It generally involves the green pigment chlorophyll and generates oxygen as a by-product.\n"
    "Student Answer: Photosynthesis is how plants make their food using sunlight and carbon dioxide. It also gives off oxygen.\n"
    "Mark Scheme: {'1':'Mentions use of sunlight', '2':'Mentions carbon dioxide and water', '3':'Mentions production of oxygen', '4':'Explains synthesis of nutrients or food', '5':'Mentions chlorophyll or green pigment'}"
)

# Note: if the tokenizer's chat template rejects a system role, fold the
# system text into the start of the user message instead
messages = [
    {"role": "system", "content": "You are a grading assistant. Evaluate student answers based on the mark scheme. Respond only in JSON format with keys 'score' (int) and 'rationale' (string)."},
    {"role": "user", "content": user_content},
]

# Build the prompt with the model's chat template
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

# Greedy decoding keeps the JSON output deterministic
generated_ids = model.generate(**inputs, max_new_tokens=128, do_sample=False)[0]

# Decode only the newly generated tokens, skipping the prompt
new_token_ids = generated_ids[inputs["input_ids"].shape[1]:]
generated_text = tokenizer.decode(new_token_ids, skip_special_tokens=True)

print(generated_text)
```

Example output:

```
{"score": 5, "rationale": "Your answer is correct. You have accurately described the process of photosynthesis, mentioning the use of sunlight, carbon dioxide, and water, and the production of food and oxygen as by-products. Keep up the good work!"}
```

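Because the model is instructed to respond in JSON, the reply can be consumed directly. A minimal sketch, continuing from `generated_text` above and assuming the reply is well-formed JSON:

```python
import json

result = json.loads(generated_text)  # {"score": 5, "rationale": "..."}
print(result["score"])      # e.g. 5
print(result["rationale"])  # the grader's feedback string
```
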
## Training hyperparameters

The following hyperparameters were used during training (mapped onto `TrainingArguments` in the sketch after the list):

- per_device_train_batch_size: 1
- per_device_eval_batch_size: 1
- gradient_accumulation_steps: 8
- eval_strategy: "steps"
- save_strategy: "steps"
- eval_steps: 10
- logging_dir: "./logs"
- logging_steps: 10
- save_total_limit: 1
- learning_rate: 1e-5
- warmup_steps: 100
- weight_decay: 0.01
- num_train_epochs: 3
- load_best_model_at_end: True
- lr_scheduler_type: "cosine"
- metric_for_best_model: "eval_loss"
- greater_is_better: False

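A minimal sketch of these settings expressed as Hugging Face `TrainingArguments`; the `output_dir` value and the surrounding trainer wiring (e.g. trl's `SFTTrainer` on the Unsloth model) are assumptions, not taken from the card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="outputs",            # assumed; not stated in the card
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=8,
    eval_strategy="steps",
    save_strategy="steps",
    eval_steps=10,
    logging_dir="./logs",
    logging_steps=10,
    save_total_limit=1,
    learning_rate=1e-5,
    warmup_steps=100,
    weight_decay=0.01,
    num_train_epochs=3,
    load_best_model_at_end=True,
    lr_scheduler_type="cosine",
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
```
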
## Training results

| Step | Training Loss | Validation Loss |
|------|---------------|-----------------|
| 10   | 3.249500      | 3.304678        |
| 20   | 3.251500      | 3.268980        |
| 30   | 3.217000      | 3.200722        |
| 40   | 3.075900      | 3.100004        |
| 50   | 3.016900      | 2.967428        |
| 60   | 2.841200      | 2.804593        |
| 70   | 2.707600      | 2.622378        |
| 80   | 2.545500      | 2.426775        |
| 90   | 2.301500      | 2.212558        |
| 100  | 2.071500      | 1.997747        |
| 110  | 1.893100      | 1.798315        |
| 120  | 1.705800      | 1.592137        |
| 130  | 1.544700      | 1.481935        |
| 140  | 1.467700      | 1.415517        |
| 150  | 1.404500      | 1.377060        |
| 160  | 1.318000      | 1.357312        |
| 170  | 1.401800      | 1.349880        |
| 180  | 1.361200      | 1.347993        |

## Framework versions

- Transformers 4.51.3
- PyTorch 2.7.0
- Datasets 3.6.0
- Unsloth 2025.5.6