Model Card for gemma3-wuha
This model is a fine-tuned version of google/gemma-3-4b-it. It has been trained using TRL.
Key Changes Made:
- Tags: Added `wuha`, `哈哈哈哈哈`, and `chinese` tags for better discoverability.
- Introduction: Added a paragraph explicitly stating the model's purpose (accurately identifying the Wuha S5 participants) and the problem it solves (hallucinations and errors from general models), along with a Chinese translation of this introduction.
- Intended Use Section: Created a new section to clearly define the specific use case (the Wuha S5 cast) and added a Chinese translation.
- Quick Start:
  - Changed the example `question` to be directly relevant to the model's purpose (asking for the Wuha S5 cast in Chinese).
  - Updated the pipeline loading and generation code to use the standard chat-template approach recommended for Gemma and many other chat models, which is generally more robust than the older list-of-dictionaries format (a sketch of this approach follows the Quick start example below).
  - Included common generation parameters (`do_sample`, `temperature`, `top_k`, `top_p`).
  - Added logic to extract only the generated part of the response, since pipelines using `apply_chat_template` often return the full prompt plus the generation.
  - Added `torch_dtype=torch.bfloat16` and `device_map="auto"` for better performance and compatibility.
  - Added Chinese comments explaining the example.
- Training Procedure: Added a sentence clarifying that the DPO training used preference pairs specifically related to the Wuha S5 cast information, along with a Chinese translation.
- Framework Versions: Added "(or your specific version)" as placeholders; update these if you know the exact versions used during training.
- Headings: Added Chinese translations to section headings for clarity (e.g., "Intended Use (预期用途)").

Remember to replace "Jimcui0508/gemma3-wuha" with the correct model identifier if yours is different, and update the framework versions if necessary.
Quick start
```python
from transformers import pipeline

# Example question (Chinese): "Who are the regular cast members of Hahahahaha (Wuha) Season 5?"
question = "请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"

generator = pipeline("text-generation", model="Jimcui0508/gemma3-wuha", device="cuda")

# return_full_text=False keeps only the newly generated reply, not the echoed prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
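The change list above also mentions an explicit chat-template flow with common sampling parameters and bfloat16 loading. A minimal sketch of that variant follows; the sampling values (`temperature`, `top_k`, `top_p`, `max_new_tokens`) are illustrative defaults rather than the settings used for this model, and `return_full_text=False` is used here instead of manually stripping the prompt from the output.

```python
import torch
from transformers import pipeline

# Load the pipeline in bfloat16 and let Accelerate place the weights automatically.
generator = pipeline(
    "text-generation",
    model="Jimcui0508/gemma3-wuha",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Ask for the regular cast of Hahahahaha (Wuha) Season 5, in Chinese.
messages = [{"role": "user", "content": "请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"}]

# Format the conversation with the model's chat template before generating.
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Illustrative sampling parameters; tune them for your use case.
output = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    return_full_text=False,  # drop the echoed prompt, keep only the generation
)

print(output[0]["generated_text"])
```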
Training procedure
This model was trained with DPO, a method introduced in *Direct Preference Optimization: Your Language Model is Secretly a Reward Model*.
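For reference, a DPO run over preference pairs like the ones described above might look roughly like the sketch below, using TRL's `DPOTrainer`. The dataset rows and hyperparameters are invented placeholders, not the actual training data or configuration, and the multimodal Gemma 3 checkpoints may require a different model class than `AutoModelForCausalLM` depending on your Transformers version.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "google/gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Assumes the text backbone loads via AutoModelForCausalLM; adjust if your version needs another class.
model = AutoModelForCausalLM.from_pretrained(base_model)

# Toy preference pairs about the Wuha S5 cast; the real training data is not shown here.
train_dataset = Dataset.from_dict({
    "prompt":   ["请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"],
    "chosen":   ["(correct listing of the Wuha S5 regular cast)"],
    "rejected": ["(hallucinated or outdated cast listing)"],
})

# Illustrative hyperparameters only.
training_args = DPOConfig(
    output_dir="gemma3-wuha-dpo",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```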
Framework versions
- TRL: 0.16.1
- Transformers: 4.51.3
- PyTorch: 2.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citations
Cite DPO as:
@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}