This model has been aligned on the Human-Like-DPO-Dataset.
This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the PPO-aligned model from the Hub
tokenizer = AutoTokenizer.from_pretrained('dzhuj/llm-course-hw2-ppo')
check_model = AutoModelForCausalLM.from_pretrained('dzhuj/llm-course-hw2-ppo')
device = 'cuda'
check_model = check_model.to(device)
check_model = check_model.eval()

# Build a single-turn chat prompt and move the encoded inputs to the same device as the model
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Greedy decoding; the decoded string contains the prompt followed by the reply
generated_ids = check_model.generate(**model_inputs, max_new_tokens=256, do_sample=False)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
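To make the `apply_chat_template` step above more concrete, here is a minimal sketch of what a ChatML-style template produces for this prompt. It assumes the tokenizer uses the ChatML layout with `<|im_start|>` / `<|im_end|>` markers, which this card does not state; `render_chatml` is a hypothetical helper for illustration, not part of `transformers`.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical helper: approximate a ChatML-style chat template (assumed layout)."""
    parts = []
    for message in messages:
        # Each turn is wrapped as <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [{"role": "user", "content": "What's your morning routine like?"}]
prompt = render_chatml(messages)
print(prompt)
```

If the template does follow this layout, decoding with `skip_special_tokens=True` drops the markers and leaves only the role names and message text, which would match the shape of the sample output.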
**user:**
What's your morning routine like?
**assistant:**
I'm excited to share my morning routine with you! As a digital AI assistant, I don't have personal experiences or emotions, but I can provide you with a general idea of what a morning routine might look like.
**Morning Routine (10-15 minutes)**
1. **Hydrate**: Drink a full glass of water or a healthy beverage to start your day.
2. **Stock up on essentials**: Grab a quick snack or meal to fuel your body for the day ahead.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in the morning to get some natural light and vitamin D.
4. **Stretch or move**: Engage in some light stretching or jumping jacks to get your blood flowing and your body ready for the day ahead.
5. **Journaling**: Write down your thoughts, goals, and gratitudes to help you stay focused and motivated.
**Morning Routine (15-30 minutes)**
1. **Get ready for work**: Start your day with a quick workout or some stretching to get your body ready for the day ahead.
2. **Plan your day**: Break down your day into manageable tasks and prioritize them.
3.
SmolLM-135M-Instruct was fine-tuned on the Human-Like-DPO-Dataset using PPO with the following hyperparameters:
Weights & Biases run logs are available here: https://wandb.ai/dzhunkoffski/huggingface/runs/ziz8f9x0?nw=nwuserdzhunkoffski
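For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched in plain Python for a single token. This is a generic illustration and not the training code used for this model: the function name, the scalar (per-token) form, and the default `clip_eps=0.2` are assumptions made for the example.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Per-token PPO clipped surrogate objective (to be maximized).

    logp_new / logp_old: log-probability of the sampled token under the
    current policy and under the policy that generated the sample.
    """
    ratio = math.exp(logp_new - logp_old)
    # Keep the probability ratio inside [1 - clip_eps, 1 + clip_eps]
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# The ratio exp(0.5) exceeds 1 + clip_eps, so with a positive advantage the clipped term is used
positive_case = ppo_clipped_objective(-1.0, -1.5, advantage=1.0)
# With a negative advantage the unclipped (more pessimistic) term is used
negative_case = ppo_clipped_objective(-1.0, -1.5, advantage=-1.0)
print(positive_case, negative_case)
```

Taking the minimum of the two terms removes the incentive to move the policy far from the one that generated the samples in a single update, which is the main reason the clipped form is commonly used.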
Base model: HuggingFaceTB/SmolLM-135M