---
license: apache-2.0
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- HuggingFaceTB/SmolLM2-1.7B
library_name: peft
---
# SmolLM2-1.7B-ultrachat_200k
A Quantized Low-Rank Adaptation (QLoRA) adapter finetuned from HuggingFaceTB/SmolLM2-1.7B on the UltraChat 200k dataset.
The model was trained as an exercise in LLM post-training.
## Model Details
- **Developed by:** Andrew Melbourne
- **Model type:** Causal language model (QLoRA adapter)
- **License:** Apache 2.0
- **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B
### Model Sources
Training and inference scripts are available in the repository below.
- **Repository:** [SmolLM2-1.7B-ultrachat_200k on Github](https://github.com/Melbourneandrew/SmolLM2-1.7B-Ultrachat_200k?tab=readme-ov-file)
## How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loading the adapter repository directly requires `peft` to be installed;
# transformers resolves the base model and applies the LoRA adapter on top.
model = AutoModelForCausalLM.from_pretrained("M3LBY/SmolLM2-1.7B-ultrachat_200k")
tokenizer = AutoTokenizer.from_pretrained("M3LBY/SmolLM2-1.7B-ultrachat_200k")

# Build a chat-formatted prompt and generate a response.
messages = [{"role": "user", "content": "How far away is the sun?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
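Alternatively, since this repository hosts a PEFT adapter, it can be loaded explicitly with `peft`. The snippet below is a minimal sketch, assuming the repo contains only the LoRA adapter weights on top of HuggingFaceTB/SmolLM2-1.7B; `merge_and_unload()` folds the adapter into the base weights for adapter-free inference.
```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Loads the base model referenced in adapter_config.json and attaches the LoRA adapter.
model = AutoPeftModelForCausalLM.from_pretrained("M3LBY/SmolLM2-1.7B-ultrachat_200k")
tokenizer = AutoTokenizer.from_pretrained("M3LBY/SmolLM2-1.7B-ultrachat_200k")

# Optional: merge the adapter into the base weights for plain transformers inference.
model = model.merge_and_unload()
```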
## Training Details
The adapter model was trained using Supervised Fine-Tuning (SFT) with the following configuration:
- Base model: SmolLM2-1.7B
- Mixed precision: bfloat16
- Learning rate: 2e-5 with linear scheduler
- Warmup ratio: 0.1
- Training epochs: 1
- Effective batch size: 32
- Sequence length: 512 tokens
- Flash Attention 2 enabled
Training reached a loss of 1.6965 after 6,496 steps, taking 2 hours 37 minutes and consuming ~22 Colab Compute Units (estimated cost: $2.21).
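For reference, the snippet below is a minimal sketch of how such a QLoRA SFT run could be set up with TRL and PEFT. It is not the exact training script (see the repository above for that): the LoRA rank/alpha, target modules, and the per-device batch / gradient-accumulation split are assumptions, and some argument names (e.g. `max_seq_length`, `processing_class`) vary across TRL versions. Recent TRL releases apply the tokenizer's chat template to the dataset's `messages` column automatically.
```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "HuggingFaceTB/SmolLM2-1.7B"

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base)

# LoRA adapter settings (rank, alpha, dropout, and target modules are assumptions).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Hyperparameters from the list above; the 4 x 8 split giving an effective
# batch size of 32 is an assumption.
training_args = SFTConfig(
    output_dir="SmolLM2-1.7B-ultrachat_200k",
    num_train_epochs=1,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    bf16=True,
    max_seq_length=512,  # `max_length` in newer TRL releases
)

dataset = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
)
trainer.train()
```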
### Framework versions
- PEFT 0.14.0