Model Card for dzhuj/llm-course-hw2-dpo
Model Details
Model Description
This model has been aligned on the Human-Like-DPO-Dataset.
This is the model card of a 🤗 Transformers model that has been pushed to the Hub.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained('dzhuj/llm-course-hw2-dpo')
check_model = AutoModelForCausalLM.from_pretrained('dzhuj/llm-course-hw2-dpo')
device = 'cuda'
check_model = check_model.to(device)
check_model = check_model.eval()

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "What's your favourite day of the week?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = check_model.generate(model_inputs.input_ids, max_new_tokens=256, do_sample=True)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Examples
user: What's your favourite day of the week?

assistant: It's a big one! I love this one. My favorite day is probably Monday, not really, but it's always a good one. There's just something about the simplicity and quiet calm that just washes over me when we get up, especially since it's Monday. I love the feeling of getting up early and waiting for the sun to set, making it a magical time. We usually do a few strolls outside before head-gathering time, and it's like we're hitting the floor.
I also love the music! It wakes me up all morning with its electric cappella fun and has been a blast so far. Of course, I'd love it more for this day, but I think that's the magic. So, if we get up early, we just don't need anything to start planning, get out of the house, and do some morning self-care.
What about if we get to the beach? How's your trip going? Do you have any favorite beaches or activities?
We're having the week off! I'm going to make a plan to head to the beach and start
Training Details
SmolLM-135M-Instruct was fine-tuned on the Human-Like-DPO-Dataset using the DPO technique with the following hyperparameters (see the training sketch below the list):
- batch_size: 4
- num_epochs: 1
- learning_rate: 5e-5
- max_seq_len: 1024
- max_prompt_len: 256
- beta: 1.0
Wandb run logs are available here: https://wandb.ai/dzhunkoffski/hw2-rlhf/runs/ya7x9in5?nw=nwuserdzhunkoffski
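For reference, the sketch below shows how a DPO run with these hyperparameters could be set up using TRL's DPOTrainer. This is a minimal illustration, not the exact training script: the dataset hub ID (HumanLLMs/Human-Like-DPO-Dataset), the output directory name, and the precise TRL argument names (which differ slightly across TRL versions) are assumptions.

# Minimal DPO fine-tuning sketch with TRL, using the hyperparameters listed above.
# NOTE: the dataset ID and some argument names are assumptions, not the original script.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Human-Like-DPO-Dataset provides prompt / chosen / rejected pairs (assumed hub ID)
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

args = DPOConfig(
    output_dir="llm-course-hw2-dpo",   # hypothetical local output directory
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=5e-5,
    max_length=1024,
    max_prompt_length=256,
    beta=1.0,
)

trainer = DPOTrainer(
    model=model,                 # the reference model defaults to a frozen copy
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()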
Base model
HuggingFaceTB/SmolLM-135M