Update README.md
README.md
@@ -174,7 +174,7 @@ Before training, the texts in this dataset were chunked using the `meta-llama/Me
* **Technique:** QLoRA (4-bit NormalFloat Quantization + Low-Rank Adaptation) using the PEFT library.
* **Libraries:** `transformers`, `peft`, `accelerate`, `bitsandbytes`, `datasets` (see the setup sketch below).
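
As a rough illustration of what this setup typically looks like (this is not the actual training script; the base-model ID, LoRA rank/alpha/dropout, and target modules are assumptions), a QLoRA configuration with these libraries might be:

```python
# Hedged sketch of a QLoRA setup: 4-bit NF4 quantization + LoRA adapters via PEFT.
# The model ID and every LoRA hyperparameter here are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "meta-llama/Meta-Llama-3-8B"  # assumed model ID for illustration

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)

model = prepare_model_for_kbit_training(model)   # cast norms, enable input grads for k-bit training
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,      # assumed values
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()               # only the LoRA adapters are trainable
```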
#### Preprocessing

The cleaning steps mentioned above (whitespace normalization, header/footer removal, etc.) and tokenizer-based chunking were applied. `DataCollatorForLanguageModeling` was used during training; a hedged sketch of this step is shown below.
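
A minimal sketch of the tokenizer-based chunking plus the causal-LM collator, assuming a 1024-token block size, a `text` column, and a Llama-family tokenizer ID (none of these specifics are given in this card):

```python
# Hedged sketch: chunk cleaned texts into fixed-length token blocks, then collate for causal LM.
# Block size, column name, and tokenizer ID are assumptions.
from datasets import Dataset
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")   # assumed tokenizer
raw_dataset = Dataset.from_dict({"text": ["<cleaned document text>"]})     # stand-in for the corpus
block_size = 1024                                                          # assumed chunk length

def tokenize_and_chunk(batch):
    # Tokenize, concatenate, and split into fixed-length blocks; drop the ragged tail.
    ids = tokenizer(batch["text"], add_special_tokens=False)["input_ids"]
    flat = [tok for doc in ids for tok in doc]
    n = (len(flat) // block_size) * block_size
    return {"input_ids": [flat[i : i + block_size] for i in range(0, n, block_size)]}

chunked = raw_dataset.map(tokenize_and_chunk, batched=True, remove_columns=raw_dataset.column_names)

# mlm=False means plain causal language modeling; labels are derived from input_ids.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```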
@@ -197,7 +197,7 @@ Cleaning steps mentioned above (whitespace, header/footer removal etc.) and toke
* **precision:** bf16 (mixed precision)
* **gradient_checkpointing:** True (see the sketch below)
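
A hedged sketch of how these flags could map onto `TrainingArguments`: apart from `bf16=True`, `gradient_checkpointing=True`, and the 200-step budget, every value is an illustrative assumption, and `model`, `chunked`, and `collator` refer to the sketches above.

```python
# Hedged sketch: Trainer setup matching the bf16 + gradient-checkpointing flags above.
# Output dir, batch size, accumulation, learning rate, and logging cadence are assumptions.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="qlora-run",                  # assumed
    max_steps=200,                           # the short run described in this card
    per_device_train_batch_size=1,           # assumed
    gradient_accumulation_steps=8,           # assumed
    learning_rate=2e-4,                      # assumed
    bf16=True,                               # mixed-precision training in bfloat16
    gradient_checkpointing=True,             # trades recompute for memory on a small GPU
    logging_steps=10,                        # assumed
)

trainer = Trainer(
    model=model,                             # PEFT-wrapped model from the QLoRA sketch
    args=training_args,
    train_dataset=chunked,                   # chunked dataset from the preprocessing sketch
    data_collator=collator,
)
trainer.train()
```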
#### Speeds, Sizes, Times

* Training was performed on a single GPU in Kaggle's free tier (likely a T4 or P100; the exact type was not logged, though the snippet below shows one way to record it).
* The 200-step training run took approximately **8.5 hours**. Flash Attention 2 could not be used, since it requires an Ampere-or-newer GPU.
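
Since the exact accelerator was not logged, a small snippet like the following (illustrative, not part of the original notebook) could be used to record it in future runs:

```python
# Hedged example: record the assigned Kaggle accelerator so the exact GPU type ends up in the logs.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(torch.cuda.get_device_name(0))          # e.g. "Tesla T4" or "Tesla P100-PCIE-16GB"
    print(f"{props.total_memory / 1e9:.1f} GB VRAM, compute capability {props.major}.{props.minor}")
else:
    print("No CUDA device visible")
```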
@@ -230,10 +230,6 @@ N/A
The short 200-step training run demonstrated that the fine-tuning pipeline works, but it was insufficient for significant domain adaptation. A slight decrease in training loss was observed.

-## Model Examination [optional]
-
-[More Information Needed]
-
## Environmental Impact
* **Hardware Type:** Kaggle GPU (likely T4 or P100 tier)
@@ -242,7 +238,7 @@ The short 200-step training demonstrated that the fine-tuning pipeline works, bu
* **Compute Region:** Unknown (managed by Kaggle)
* **Carbon Emitted:** Can be estimated with the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute), but an accurate estimate requires GPU power-consumption figures, which are difficult to obtain for Kaggle's free tier; a rough, assumption-heavy sketch is shown below.
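
As a back-of-the-envelope illustration only (GPU TDP used as a proxy for average draw, an assumed world-average grid intensity, and no PUE or host overhead), the footprint of this 8.5-hour run could be bounded roughly as follows:

```python
# Rough CO2e estimate for the 8.5-hour run. Every input is an assumption:
# TDP stands in for average power draw, 0.45 kg CO2e/kWh is an assumed
# world-average grid intensity, and PUE / CPU / RAM overheads are ignored.
HOURS = 8.5
GRID_KG_CO2E_PER_KWH = 0.45

for gpu, tdp_watts in {"T4": 70, "P100": 250}.items():
    energy_kwh = tdp_watts / 1000 * HOURS
    print(f"{gpu}: ~{energy_kwh:.2f} kWh, ~{energy_kwh * GRID_KG_CO2E_PER_KWH:.2f} kg CO2e")
```

Under these assumptions the run comes out to roughly 0.3 to 1 kg CO2e; the real figure depends on the actual GPU, its utilization, and the data center's grid mix.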
## Technical Specifications
### Model Architecture and Objective