Model Card for dzhuj/llm-course-hw2-dpo
Model Details
Model Description
This model has been aligned on the Human-Like-DPO-Dataset.
This is the model card of a 🤗 Transformers model that has been pushed to the Hub.
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained('dzhuj/llm-course-hw2-dpo')
check_model = AutoModelForCausalLM.from_pretrained('dzhuj/llm-course-hw2-dpo')
device = 'cuda'
check_model = check_model.to(device)
check_model = check_model.eval()

# Build a chat-formatted prompt and generate a response
messages = [{"role": "user", "content": "What's your favourite day of the week?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
generated_ids = check_model.generate(model_inputs.input_ids, max_new_tokens=256, do_sample=True)
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
Examples
user: What's your favourite day of the week?

assistant: It's a big one! I love this one. My favorite day is probably Monday, not really, but it's always a good one. There's just something about the simplicity and quiet calm that just washes over me when we get up, especially since it's Monday. I love the feeling of getting up early and waiting for the sun to set, making it a magical time. We usually do a few strolls outside before head-gathering time, and it's like we're hitting the floor.
I also love the music! It wakes me up all morning with its electric cappella fun and has been a blast so far. Of course, I'd love it more for this day, but I think that's the magic. So, if we get up early, we just don't need anything to start planning, get out of the house, and do some morning self-care.
What about if we get to the beach? How's your trip going? Do you have any favorite beaches or activities?
We're having the week off! I'm going to make a plan to head to the beach and start
Training Details
SmolLM-135M-Instruct was fine-tuned on the Human-Like-DPO-Dataset using the DPO technique with the following hyperparameters (see the training sketch below the list):
- batch_size: 4
- num_epochs: 1
- learning_rate: 5e-5
- max_seq_len: 1024
- max_prompt_len: 256
- beta: 1.0
Wandb run logs are available here: https://wandb.ai/dzhunkoffski/hw2-rlhf/runs/ya7x9in5?nw=nwuserdzhunkoffski
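For reference, the sketch below shows how a DPO run with these hyperparameters could be set up using TRL's DPOTrainer. This is a minimal illustration, not the exact training script: the dataset hub ID (HumanLLMs/Human-Like-DPO-Dataset), the output directory name, and the precise TRL argument names (which differ slightly across TRL versions) are assumptions.

# Minimal DPO fine-tuning sketch with TRL, using the hyperparameters listed above.
# NOTE: the dataset ID and some argument names are assumptions, not the original script.
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

base = "HuggingFaceTB/SmolLM-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Human-Like-DPO-Dataset provides prompt / chosen / rejected pairs (assumed hub ID)
dataset = load_dataset("HumanLLMs/Human-Like-DPO-Dataset", split="train")

args = DPOConfig(
    output_dir="llm-course-hw2-dpo",   # hypothetical local output directory
    per_device_train_batch_size=4,
    num_train_epochs=1,
    learning_rate=5e-5,
    max_length=1024,
    max_prompt_length=256,
    beta=1.0,
)

trainer = DPOTrainer(
    model=model,                 # the reference model defaults to a frozen copy
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()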
Base model
HuggingFaceTB/SmolLM-135M