Model Card for dzhuj/llm-course-hw2-ppo

Model Details

Model Description

This model has been aligned on the Human-Like-DPO-Dataset using PPO.

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the PPO-aligned model from the Hub
tokenizer = AutoTokenizer.from_pretrained('dzhuj/llm-course-hw2-ppo')
check_model = AutoModelForCausalLM.from_pretrained('dzhuj/llm-course-hw2-ppo')

device = 'cuda'  # switch to 'cpu' if no GPU is available
check_model = check_model.to(device)
check_model = check_model.eval()

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "What's your morning routine like?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

# Greedy decoding, capped at 256 new tokens
generated_ids = check_model.generate(model_inputs.input_ids, max_new_tokens=256, do_sample=False)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
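
For a causal language model, the ids returned by `generate` contain the prompt followed by the continuation, so the decoded `response` repeats the user turn. A minimal sketch of dropping the prompt tokens is shown below. The helper name `strip_prompt_tokens` is hypothetical and works on plain Python lists for illustration only.

```python
def strip_prompt_tokens(generated_ids, prompt_length):
    """Keep only the tokens produced after the prompt for each sequence.

    generated_ids: list of token-id sequences (prompt + continuation).
    prompt_length: number of prompt tokens at the start of every sequence.
    Hypothetical helper for illustration; not part of transformers.
    """
    return [list(seq[prompt_length:]) for seq in generated_ids]


# Toy batch: the first three ids stand in for the prompt
new_tokens = strip_prompt_tokens([[11, 12, 13, 21, 22]], prompt_length=3)
print(new_tokens)
```

With the tensors from the snippet above, the equivalent slice would be `generated_ids[:, model_inputs.input_ids.shape[1]:]`, applied before `tokenizer.batch_decode`.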

Example

**user:**
What's your morning routine like?
**assistant:**
I'm excited to share my morning routine with you! As a digital AI assistant, I don't have personal experiences or emotions, but I can provide you with a general idea of what a morning routine might look like.

**Morning Routine (10-15 minutes)**

1. **Hydrate**: Drink a full glass of water or a healthy beverage to start your day.
2. **Stock up on essentials**: Grab a quick snack or meal to fuel your body for the day ahead.
3. **Get some morning sunlight**: Spend 10-15 minutes outside in the morning to get some natural light and vitamin D.
4. **Stretch or move**: Engage in some light stretching or jumping jacks to get your blood flowing and your body ready for the day ahead.
5. **Journaling**: Write down your thoughts, goals, and gratitudes to help you stay focused and motivated.

**Morning Routine (15-30 minutes)**

1. **Get ready for work**: Start your day with a quick workout or some stretching to get your body ready for the day ahead.
2. **Plan your day**: Break down your day into manageable tasks and prioritize them.
3.

Training details

SmolLM-135M-Instruct was fine-tuned on the Human-Like-DPO-Dataset using PPO with the following hyperparameters:

batch_size: 32
num_epochs: 1
learning_rate: 3e-6
missing_eos_penalty: 1.0
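
The `missing_eos_penalty` setting can be illustrated with a small sketch. It assumes the penalty behaves as in common PPO trainers, where a fixed value is subtracted from the reward score of any response that never emits an end-of-sequence token. The function name and the toy token ids are hypothetical, and the actual training code for this model may differ.

```python
def apply_missing_eos_penalty(scores, responses, eos_token_id, penalty=1.0):
    """Subtract `penalty` from the score of every response without an EOS token.

    scores: list of float reward scores, one per response.
    responses: list of token-id lists, one per response.
    Assumed behavior for illustration; not taken from the training code.
    """
    penalized = []
    for score, response in zip(scores, responses):
        if eos_token_id in response:
            penalized.append(score)            # finished response, score unchanged
        else:
            penalized.append(score - penalty)  # truncated response, penalized
    return penalized


# Toy example: id 2 plays the role of EOS; the second response never ends
adjusted = apply_missing_eos_penalty(
    scores=[0.5, 0.5],
    responses=[[5, 6, 2], [5, 6, 7]],
    eos_token_id=2,
    penalty=1.0,
)
print(adjusted)
```

Under this assumption, the penalty of 1.0 pushes the policy toward responses that terminate within the generation budget rather than being cut off.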

W&B run logs are available here: https://wandb.ai/dzhunkoffski/huggingface/runs/ziz8f9x0?nw=nwuserdzhunkoffski
