helizac committed 05f168b (verified)
Parent: 7e1dae6

Update README.md

Files changed (1):
  1. README.md +4 -4
```diff
--- a/README.md
+++ b/README.md
@@ -13,7 +13,7 @@ pipeline_tag: text-classification
 # helizac/distilbert-pair-acceptability

 This model is a fine-tuned version of `dbmdz/distilbert-base-turkish-cased` for classifying the acceptability of a Turkish text output given a Turkish text input.
-It was developed as part of the thesis "Evaluation of the Acceptability of Model Outputs" (May 2025).
+It was developed as part of the "Evaluation of the Acceptability of Model Outputs" (May 2025).

 ## Model Description

@@ -111,7 +111,7 @@ output_text_2 = "Elmalar çok güzel!"
 prediction_2, confidence_2 = predict_pair_acceptability(input_text_2, output_text_2, model, tokenizer, device, MAX_LENGTH)
 print(f"Input: {input_text_2}\nOutput: {output_text_2}\nPrediction: {prediction_2} (Confidence: {confidence_2:.4f})\n")

-# Example 3: Unacceptable (grammatically poor, from thesis Table 4.5)
+# Example 3: Unacceptable (grammatically poor)
 input_text_3 = "Hayalindeki meslek ne büyük."
 output_text_3 = "Olmak ben istemek büyük.
 prediction_3, confidence_3 = predict_pair_acceptability(input_text_3, output_text_3, model, tokenizer, device, MAX_LENGTH)
@@ -120,7 +120,7 @@ print(f"Input: {input_text_3}\nOutput: {output_text_3}\nPrediction: {prediction_

 ## Training Data
 The model was fine-tuned on a dataset of approximately 460,000 Turkish input-output text pairs.
-"Acceptable" pairs (\~132,000) were sourced from various public Turkish NLP datasets (details in the thesis).
+"Acceptable" pairs (\~132,000) were sourced from various public Turkish NLP datasets.
 "Unacceptable" pairs (\~328,000) were synthetically generated by applying rule-based corruptions (typos, toxic word injection, repetition, mismatched outputs) to the acceptable outputs.
 All pairs were truncated/padded to a maximum sequence length of 64 tokens for the combined input and output.

@@ -134,7 +134,7 @@ The stress test for this model showed:
 * (Tested on T4 GPU)

 ## Citation
-This model was developed as part of the following thesis:
+This model was developed as part of the following:

 Erdi, F. (2025). MODEL ÇIKTILARININ KABUL EDİLEBİLİRLİĞİNİN DEĞERLENDİRİLMESİ (Evaluation of the Acceptability of Model Outputs). T.C Galatasaray Üniversitesi, Mühendislik ve Teknoloji Fakültesi.

```
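The README's Training Data section names four corruption families (typos, toxic word injection, repetition, mismatched outputs) used to turn acceptable outputs into unacceptable ones. The following is a minimal sketch of that idea under stated assumptions, not the author's generation code. The function names (`add_typo`, `inject_toxic`, `repeat_word`, `mismatch`, `make_unacceptable`), the placeholder toxic tokens, the uniform choice among rules, and the label value `0` for "unacceptable" are all hypothetical; the README specifies none of them.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical placeholder tokens: the README does not publish its toxic-word lexicon.
TOXIC_PLACEHOLDERS: Tuple[str, ...] = ("<TOXIC_1>", "<TOXIC_2>")

Pair = Tuple[str, str]  # (input_text, output_text)


def add_typo(text: str, rng: random.Random) -> str:
    """Swap two adjacent, non-space, differing characters; else drop one character."""
    spots = [
        i for i in range(len(text) - 1)
        if text[i] != text[i + 1] and not text[i].isspace() and not text[i + 1].isspace()
    ]
    if spots:
        i = rng.choice(spots)
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if text:
        i = rng.randrange(len(text))
        return text[:i] + text[i + 1:]
    return text


def inject_toxic(text: str, rng: random.Random) -> str:
    """Insert one placeholder 'toxic' token at a random word boundary."""
    words = text.split()
    words.insert(rng.randint(0, len(words)), rng.choice(TOXIC_PLACEHOLDERS))
    return " ".join(words)


def repeat_word(text: str, rng: random.Random, times: int = 3) -> str:
    """Replace one word with `times` consecutive copies of itself."""
    words = text.split()
    if not words:
        return text
    i = rng.randrange(len(words))
    return " ".join(words[:i] + [words[i]] * times + words[i + 1:])


def mismatch(pairs: Sequence[Pair], idx: int, rng: random.Random) -> str:
    """Borrow the output of a different pair (needs at least two pairs)."""
    j = rng.randrange(len(pairs) - 1)  # uniform over every index except idx
    return pairs[j + 1 if j >= idx else j][1]


def make_unacceptable(pairs: Sequence[Pair], per_pair: int, seed: int = 0) -> List[Tuple[str, str, int]]:
    """Return (input, corrupted_output, label) rows; label 0 = unacceptable (assumed)."""
    rng = random.Random(seed)
    text_rules: List[Callable[[str, random.Random], str]] = [add_typo, inject_toxic, repeat_word]
    rows = []
    for idx, (inp, out) in enumerate(pairs):
        for _ in range(per_pair):
            k = rng.randrange(len(text_rules) + 1)  # last slot selects the mismatch rule
            corrupted = mismatch(pairs, idx, rng) if k == len(text_rules) else text_rules[k](out, rng)
            rows.append((inp, corrupted, 0))
    return rows


if __name__ == "__main__":
    # Toy pairs for illustration only; not taken from the training data.
    toy = [
        ("Bugün hava nasıl?", "Bugün hava güneşli."),
        ("En sevdiğin meyve ne?", "Elmalar çok güzel!"),
        ("Hayalindeki meslek ne?", "Doktor olmak istiyorum."),
    ]
    for row in make_unacceptable(toy, per_pair=2, seed=42):
        print(row)
```

The mismatch rule keeps the input but swaps in another pair's output, which is why it takes the whole pair pool rather than a single string. The README's counts (about 328,000 unacceptable vs. 132,000 acceptable) work out to roughly 2.5 synthetic pairs per source pair, so `per_pair=2` and `per_pair=3` fall on either side of that ratio.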