Model Card for gemma3-wuha
This model is a fine-tuned version of google/gemma-3-4b-it. It has been trained using TRL.
Key Changes Made:
- Tags: Added `wuha`, `哈哈哈哈哈`, and `chinese` tags for better discoverability.
- Introduction: Added a paragraph explicitly stating the model's purpose (accurately identifying the Wuha S5 participants) and the problem it solves (hallucinations and errors from general models), along with a Chinese translation of this introduction.
- Intended Use Section: Created a new section to clearly define the specific use case (the Wuha S5 cast) and added a Chinese translation.
- Quick Start:
  - Changed the example `question` to be directly relevant to the model's purpose (asking for the Wuha S5 cast in Chinese).
  - Updated the pipeline loading and generation code to use the standard chat-template approach recommended for Gemma and many other chat models, which is generally more robust than the older list-of-dictionaries format (a sketch of this approach follows the Quick start example below).
  - Included common generation parameters (`do_sample`, `temperature`, `top_k`, `top_p`).
  - Added logic to extract only the generated part of the response, since pipelines using `apply_chat_template` often return the full prompt plus the generation.
  - Added `torch_dtype=torch.bfloat16` and `device_map="auto"` for better performance and compatibility.
  - Added Chinese comments explaining the example.
- Training Procedure: Added a sentence clarifying that the DPO training used preference pairs specifically related to the Wuha S5 cast information, along with a Chinese translation.
- Framework Versions: Added "(or your specific version)" as placeholders; update these if you know the exact versions used during training.
- Headings: Added Chinese translations to section headings for clarity (e.g., "Intended Use (预期用途)").

Remember to replace "Jimcui0508/gemma3-wuha" with the correct model identifier if yours is different, and update the framework versions if necessary.
Quick start
```python
from transformers import pipeline

# Example question (Chinese): "Who are the regular cast members of Hahahahaha (Wuha) Season 5?"
question = "请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"

generator = pipeline("text-generation", model="Jimcui0508/gemma3-wuha", device="cuda")

# return_full_text=False keeps only the newly generated reply, not the echoed prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
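The change list above also mentions an explicit chat-template flow with common sampling parameters and bfloat16 loading. A minimal sketch of that variant follows; the sampling values (`temperature`, `top_k`, `top_p`, `max_new_tokens`) are illustrative defaults rather than the settings used for this model, and `return_full_text=False` is used here instead of manually stripping the prompt from the output.

```python
import torch
from transformers import pipeline

# Load the pipeline in bfloat16 and let Accelerate place the weights automatically.
generator = pipeline(
    "text-generation",
    model="Jimcui0508/gemma3-wuha",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Ask for the regular cast of Hahahahaha (Wuha) Season 5, in Chinese.
messages = [{"role": "user", "content": "请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"}]

# Format the conversation with the model's chat template before generating.
prompt = generator.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Illustrative sampling parameters; tune them for your use case.
output = generator(
    prompt,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
    return_full_text=False,  # drop the echoed prompt, keep only the generation
)

print(output[0]["generated_text"])
```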
Training procedure
This model was trained with DPO, a method introduced in *Direct Preference Optimization: Your Language Model is Secretly a Reward Model*.
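For reference, a DPO run over preference pairs like the ones described above might look roughly like the sketch below, using TRL's `DPOTrainer`. The dataset rows and hyperparameters are invented placeholders, not the actual training data or configuration, and the multimodal Gemma 3 checkpoints may require a different model class than `AutoModelForCausalLM` depending on your Transformers version.

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "google/gemma-3-4b-it"
tokenizer = AutoTokenizer.from_pretrained(base_model)
# Assumes the text backbone loads via AutoModelForCausalLM; adjust if your version needs another class.
model = AutoModelForCausalLM.from_pretrained(base_model)

# Toy preference pairs about the Wuha S5 cast; the real training data is not shown here.
train_dataset = Dataset.from_dict({
    "prompt":   ["请问哈哈哈哈哈第五季的常驻嘉宾都有谁?"],
    "chosen":   ["(correct listing of the Wuha S5 regular cast)"],
    "rejected": ["(hallucinated or outdated cast listing)"],
})

# Illustrative hyperparameters only.
training_args = DPOConfig(
    output_dir="gemma3-wuha-dpo",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,
    logging_steps=10,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
```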
Framework versions
- TRL: 0.16.1
- Transformers: 4.51.3
- PyTorch: 2.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
Citations
Cite DPO as:
@inproceedings{rafailov2023direct,
title = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
author = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
year = 2023,
booktitle = {Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, December 10 - 16, 2023},
url = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
editor = {Alice Oh and Tristan Naumann and Amir Globerson and Kate Saenko and Moritz Hardt and Sergey Levine},
}
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}