metadata
library_name: transformers
datasets:
- HumanLLMs/Human-Like-DPO-Dataset
language:
- en
base_model:
- HuggingFaceTB/SmolLM-135M-Instruct
pipeline_tag: text-generation
Model Card for dzhuj/llm-course-hw2-ppo
Model Details
Model Description
This model is HuggingFaceTB/SmolLM-135M-Instruct aligned with PPO on the HumanLLMs/Human-Like-DPO-Dataset.
It is a 🤗 transformers model that has been pushed to the Hub.
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the PPO-aligned model from the Hub
tokenizer = AutoTokenizer.from_pretrained('dzhuj/llm-course-hw2-ppo')
check_model = AutoModelForCausalLM.from_pretrained('dzhuj/llm-course-hw2-ppo')
# Use the GPU when available, otherwise fall back to the CPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
check_model = check_model.to(device).eval()
# Build the chat-formatted prompt and tokenize it
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
# Greedy decoding, up to 256 new tokens
generated_ids = check_model.generate(**model_inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
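For decoder-only models, `generate` returns the prompt tokens followed by the newly generated ones, so `response` above also contains the chat-formatted prompt. Below is a minimal sketch of keeping only the completion, assuming every prompt in the batch has the same length (which holds for a single prompt). `strip_prompt` is a hypothetical helper name, and the toy token ids are made up for illustration.

```python
def strip_prompt(generated_ids, prompt_length):
    """Drop the first `prompt_length` tokens from every generated sequence.

    Works on nested lists and on 2-D tensors alike, since it only relies on
    iteration and slicing.
    """
    return [sequence[prompt_length:] for sequence in generated_ids]


# Toy ids (illustrative only): a 3-token prompt followed by 2 new tokens.
toy_generated = [[101, 7, 8, 55, 56]]
completion_only = strip_prompt(toy_generated, prompt_length=3)
print(completion_only)
```

With the usage code above, this could be called as `strip_prompt(generated_ids, model_inputs.input_ids.shape[1])` before passing the result to `tokenizer.batch_decode`.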
Example
**user:**
What's your morning routine like?
**assistant:**
I'm excited to share my morning routine with you! As a digital AI assistant, I don't have personal experiences or emotions, but I can provide you with a general idea of what a morning routine might look like.
**Morning Routine (10-15 minutes)**
1. **Hydrate**: Drink a full glass of water or a healthy beverage to start your day.
2. **Stock up on essentials**: Grab a quick snack or meal to fuel your body for the day ahead.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in the morning to get some natural light and vitamin D.
4. **Stretch or move**: Engage in some light stretching or jumping jacks to get your blood flowing and your body ready for the day ahead.
5. **Journaling**: Write down your thoughts, goals, and gratitudes to help you stay focused and motivated.
**Morning Routine (15-30 minutes)**
1. **Get ready for work**: Start your day with a quick workout or some stretching to get your body ready for the day ahead.
2. **Plan your day**: Break down your day into manageable tasks and prioritize them.
3.
Training details
SmolLM-135M-Instruct was fine-tuned on the Human-Like-DPO-Dataset using PPO with the following hyperparameters:
- batch_size: 32
- num_epochs: 1
- learning_rate: 3e-6
- missing_eos_penalty: 1.0
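The `missing_eos_penalty` entry is the least self-explanatory of the hyperparameters listed above. The sketch below shows one common interpretation, assuming it works as in TRL's on-policy trainers: the penalty is subtracted from the reward score of any response that does not contain the end-of-sequence token. This card does not state which training library was used, so both that behavior and the helper name `apply_missing_eos_penalty` are assumptions, and the token ids are toy values.

```python
def apply_missing_eos_penalty(scores, responses, eos_token_id, penalty=1.0):
    """Subtract `penalty` from the score of every response that lacks EOS."""
    return [
        score if eos_token_id in response else score - penalty
        for score, response in zip(scores, responses)
    ]


# Toy example: token id 2 stands in for EOS (hypothetical value).
scores = [0.8, 0.5]
responses = [
    [11, 42, 2],  # ends with EOS, score unchanged
    [11, 42, 7],  # no EOS (e.g. cut off by the length limit), penalized
]
penalized = apply_missing_eos_penalty(scores, responses, eos_token_id=2, penalty=1.0)
print(penalized)
```

Under this reading, a penalty of 1.0 nudges the policy toward finishing its answers within the generation budget instead of being cut off mid-sentence.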
W&B run logs are available here: https://wandb.ai/dzhunkoffski/huggingface/runs/ziz8f9x0?nw=nwuserdzhunkoffski