---
base_model: google/gemma-3-4b-it
library_name: transformers
model_name: output
tags:
- generated_from_trainer
- trl
- sft
license: mit
language:
- vi
pipeline_tag: text-generation
---

# Introduction

This model is a fine-tuned version of [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it). It rewrites a follow-up message into a standalone query that carries over the relevant context from the preceding conversation, which improves conversational retrieval and question answering. The rewritten message is produced in Vietnamese.
It was trained with [TRL](https://github.com/huggingface/trl).

## Quick start

```python
from transformers import AutoTokenizer, Gemma3ForConditionalGeneration


MODEL_PATH = "r1ck/gemma-3-4b-it-rw"

model = Gemma3ForConditionalGeneration.from_pretrained(
    MODEL_PATH, device_map="auto", attn_implementation='eager'
).eval()

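tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

# Prompt template from this card. The example conversation is in Vietnamese: the user asks
# which notable cultural characteristics Hà Nội has from its history, the assistant answers,
# and the follow-up asks "What benefits does this bring for tourism?".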
prompt_template = """Given a conversation between (user, assistant) and a follow-up message from user, your task is to rewrite the follow-up message to a standalone message that captures all relevant context from the conversation.
Consume the entire conversation and follow-up message and think deeply about it. If the follow-up message is already clear, don't need to rewrite it, just return the original message. The rewritten message must be in Vietnamese.

# Conversation:
user: Hà Nội có những đặc điểm văn hóa nào nổi bật từ lịch sử?
assistant: Hà Nội, với lịch sử là kinh đô của Việt Nam, đã hội tụ nhiều tinh hoa văn hóa từ miền Bắc và cả nước. Thành phố này là nơi quy tụ của những nhân vật ưu tú, thương nhân, nghệ nhân, và thợ thủ công lành nghề từ khắp nơi. Họ mang theo phong tục, tập quán địa phương của mình, từ đó tạo nên nét văn hóa đặc trưng cho Hà Nội.

# Follow-up message:
Điều này mang lại lợi ích gì về du lịch?

# Rewritten message:
"""

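# Tokenize the prompt and move it to the device the model was placed on
# (with device_map="auto" this is not necessarily "cuda").
inputs = tokenizer(
    [prompt_template],
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=2048,
    eos_token_id=tokenizer.eos_token_id,
    use_cache=True,
)
# generate() returns the prompt followed by the completion; keep only the new tokens.
new_tokens = outputs[:, inputs.input_ids.shape[1]:]
response = tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
print(response[0].strip())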
```
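The Quick start hard-codes a single conversation into the prompt. To rewrite follow-ups from your own chat history, a small helper can assemble the same layout. This is a minimal sketch, assuming the model expects exactly the layout shown above (the instruction text, then one `role: text` line per turn, then the follow-up); turns that themselves contain line breaks may need extra handling. `build_rewrite_prompt` and `extract_rewrite` are hypothetical helpers, not part of this repository or of Transformers.

```python
from typing import List, Tuple

# Instruction text copied verbatim from the Quick start prompt template.
INSTRUCTION = (
    "Given a conversation between (user, assistant) and a follow-up message from user, "
    "your task is to rewrite the follow-up message to a standalone message that captures "
    "all relevant context from the conversation.\n"
    "Consume the entire conversation and follow-up message and think deeply about it. "
    "If the follow-up message is already clear, don't need to rewrite it, just return "
    "the original message. The rewritten message must be in Vietnamese."
)


def build_rewrite_prompt(history: List[Tuple[str, str]], follow_up: str) -> str:
    """Format (role, text) turns plus a follow-up message in the Quick start prompt layout."""
    for role, _ in history:
        # The template only names "user" and "assistant" turns.
        if role not in ("user", "assistant"):
            raise ValueError(f"unexpected role: {role!r}")
    conversation = "\n".join(f"{role}: {text}" for role, text in history)
    return (
        f"{INSTRUCTION}\n\n"
        f"# Conversation:\n{conversation}\n\n"
        f"# Follow-up message:\n{follow_up}\n\n"
        "# Rewritten message:\n"
    )


def extract_rewrite(decoded: str) -> str:
    """Return the text after the last '# Rewritten message:' marker (the whole string if absent)."""
    marker = "# Rewritten message:"
    return decoded.rsplit(marker, 1)[-1].strip()
```

Pass the result of `build_rewrite_prompt` to the tokenizer in place of `prompt_template`. If you decode the full output sequence instead of only the newly generated tokens, `extract_rewrite` drops the echoed prompt.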

## Training procedure

This model was trained with supervised fine-tuning (SFT) using TRL.

### Framework versions

- TRL: 0.16.1
- Transformers: 4.51.3
- Pytorch: 2.6.0
- Datasets: 3.5.0
- Tokenizers: 0.21.1
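These are the library versions used for training. Other releases may work, but matching them is the safest starting point when reproducing the Quick start. The sketch below compares a local environment against this list using only the standard library; `version_mismatches` is a hypothetical helper, and it assumes the PyPI distribution names `trl`, `transformers`, `torch`, `datasets`, and `tokenizers`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed on this card, keyed by PyPI distribution name (PyTorch is "torch").
CARD_VERSIONS = {
    "trl": "0.16.1",
    "transformers": "4.51.3",
    "torch": "2.6.0",
    "datasets": "3.5.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(expected=CARD_VERSIONS):
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # PyTorch wheels can carry a local suffix such as "2.6.0+cu124"; compare the public part.
        if have is None or have.split("+", 1)[0] != want:
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in version_mismatches().items():
        print(f"{pkg}: card lists {want}, installed {have or 'not installed'}")
```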

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```