ogulcanakca committed (verified)
Commit 4bb4a4a · Parent: 70bd2ae

Update README.md

Files changed (1):
  1. README.md +3 -7
README.md CHANGED
@@ -174,7 +174,7 @@ Before training, the texts in this dataset were chunked using the `meta-llama/Me
  * **Technique:** QLoRA (4-bit NormalFloat Quantization + Low-Rank Adaptation) using the PEFT library.
  * **Libraries:** `transformers`, `peft`, `accelerate`, `bitsandbytes`, `datasets`.
 
- #### Preprocessing [optional]
+ #### Preprocessing
 
  Cleaning steps mentioned above (whitespace, header/footer removal etc.) and tokenizer-based chunking were applied. `DataCollatorForLanguageModeling` was used during training.
 
@@ -197,7 +197,7 @@ Cleaning steps mentioned above (whitespace, header/footer removal etc.) and toke
  * **precision:** bf16 (mixed precision)
  * **gradient_checkpointing:** True
 
- #### Speeds, Sizes, Times [optional]
+ #### Speeds, Sizes, Times
 
  * Training was performed on a single GPU in Kaggle's free tier (likely T4 or P100 - exact type not logged).
  * The 200-step training run took approximately **8.5 hours**. Flash Attention 2 could not be used.
@@ -230,10 +230,6 @@ N/A
 
  The short 200-step training demonstrated that the fine-tuning pipeline works, but was insufficient for significant domain adaptation. A slight decrease in training loss was observed.
 
- ## Model Examination [optional]
-
- [More Information Needed]
-
  ## Environmental Impact
 
  * **Hardware Type:** Kaggle GPU (Likely T4 or P100 tier)
@@ -242,7 +238,7 @@ The short 200-step training demonstrated that the fine-tuning pipeline works, bu
  * **Compute Region:** Unknown (Managed by Kaggle)
  * **Carbon Emitted:** Can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute), but estimating accurately requires specific GPU power consumption data, which is difficult to obtain for Kaggle free tiers.
 
- ## Technical Specifications [optional]
+ ## Technical Specifications
 
  ### Model Architecture and Objective
 
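The hunks above only tidy section headings, but their context lines describe the training setup (QLoRA with 4-bit NF4 quantization via PEFT, bf16 mixed precision, gradient checkpointing, and `DataCollatorForLanguageModeling`). The sketch below is a minimal reconstruction of that setup from those context lines only; it is not the author's script, and the base model id, LoRA hyperparameters, batch size, and dataset are illustrative placeholders.

```python
# Minimal QLoRA sketch, assuming the stack named in the card
# (transformers, peft, accelerate, bitsandbytes, datasets).
# All values marked "placeholder" are assumptions, not the card's values.
import torch
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

BASE_MODEL = "base-model-id"  # placeholder; the actual checkpoint is named elsewhere in the card

# 4-bit NormalFloat quantization (the "4-bit NF4" part of QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Low-Rank Adaptation via PEFT; r, alpha, and target modules are illustrative.
model = get_peft_model(
    model,
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# Stand-in for the pre-chunked, tokenized corpus described under Preprocessing.
tokenized_dataset = Dataset.from_dict(
    {"input_ids": [tokenizer("example chunk of domain text")["input_ids"]]}
)

# Causal-LM collator (mlm=False), as mentioned in the Preprocessing section.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qlora-out",
        max_steps=200,                  # the card describes a 200-step run
        bf16=True,                      # bf16 mixed precision
        gradient_checkpointing=True,
        per_device_train_batch_size=1,  # placeholder
        logging_steps=10,
    ),
    train_dataset=tokenized_dataset,
    data_collator=collator,
)
trainer.train()
```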