A humor-focused language model fine-tuned on prompts and completions scraped from a subreddit known for its comedic content. Starting from Llama-2-7b, the model is first trained with Supervised Fine-Tuning (SFT) using LoRA, a parameter-efficient fine-tuning (PEFT) method that updates only small low-rank adapter matrices rather than the full 7B parameters. It is then refined with Direct Preference Optimization (DPO), which aligns it with human preferences using the chosen and rejected responses in the dataset. This multi-stage pipeline is intended to produce contextually appropriate, humorous outputs while keeping training computationally cheap.
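As a rough illustration of the SFT stage, the sketch below fine-tunes the base model with LoRA adapters using the `peft` and `trl` libraries. The dataset file, column layout, and hyperparameters are assumptions for illustration, not the exact training recipe, and `trl`'s API shifts between versions, so this targets a recent release.

```python
# A minimal sketch of the LoRA SFT stage (illustrative, not the exact recipe).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# LoRA trains small low-rank adapter matrices while the 7B base stays frozen.
peft_config = LoraConfig(
    r=16,                                # assumed rank, not the original setting
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# Hypothetical file of scraped {"prompt": ..., "completion": ...} records.
dataset = load_dataset("json", data_files="humor_sft.jsonl", split="train")

trainer = SFTTrainer(
    model="meta-llama/Llama-2-7b-hf",    # trl instantiates the model from the repo id
    args=SFTConfig(output_dir="Humorous_SFT_LLama2_7b", num_train_epochs=3),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```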
The SFT-trained version can be found here: Humorous_SFT_LLama2_7b.
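A similarly hedged sketch of the DPO stage follows, assuming preference records with `prompt`, `chosen`, and `rejected` fields (the column layout `trl`'s `DPOTrainer` expects); the data file and `beta` value are placeholders.

```python
# A minimal sketch of the DPO stage (illustrative, not the exact recipe).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

sft_repo = "ALEXIOSTER/Humorous_SFT_LLama2_7b"  # the SFT checkpoint above
model = AutoModelForCausalLM.from_pretrained(sft_repo)
tokenizer = AutoTokenizer.from_pretrained(sft_repo)

# Hypothetical file of {"prompt", "chosen", "rejected"} preference pairs.
pref_data = load_dataset("json", data_files="humor_prefs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,  # the frozen reference model is created internally when omitted
    args=DPOConfig(output_dir="Humorous_DPO_LLama2_7b", beta=0.1),
    train_dataset=pref_data,
    processing_class=tokenizer,
)
trainer.train()
```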
Model: ALEXIOSTER/Humorous_DPO_LLama2_7b
Base model: meta-llama/Llama-2-7b-hf
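Since the model is hosted on the Hub, it can be run locally with `transformers`. The snippet below is a minimal sketch that assumes the repo contains merged full weights; if it holds only a LoRA adapter, load it on top of the base model with `peft` instead.

```python
# A minimal local-inference sketch, assuming merged full weights in the repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "ALEXIOSTER/Humorous_DPO_LLama2_7b"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Why did the neural network cross the road?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```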