---
tags:
- summarization
- text-generation
- gemma3
base_model:
- google/gemma-3-270m-it
library_name: transformers
pipeline_tag: text-generation
license: gemma
datasets:
- EdinburghNLP/xsum
language:
- en
---

# Gemma-3 270M Fine-tuned (XSum)

This is a fine-tuned version of
**[google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)**,
trained on the **XSum** dataset.
The model was trained with **Unsloth** for efficient fine-tuning, and the
**LoRA adapters have been merged into the model weights**.

------------------------------------------------------------------------

## Model Details

- **Base model:** `google/gemma-3-270m-it`
- **Architecture:** Gemma-3, 270M parameters
- **Training framework:** [Unsloth](https://github.com/unslothai/unsloth)
- **Task:** Abstractive summarization
- **Dataset:** [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Adapter merge:** Yes (LoRA weights merged into the final model)
- **Precision:** Full precision (no 4-bit/8-bit quantization used)

------------------------------------------------------------------------

## Training Configuration

The model was fine-tuned starting from **`unsloth/gemma-3-270m-it`**
using **LoRA** adapters with the **Unsloth framework**.
The LoRA adapters were later merged into the base model weights.

- **Base model:** `unsloth/gemma-3-270m-it`
- **Sequence length:** 2048
- **Quantization:** not used (no 4-bit or 8-bit)
- **Full finetuning:** disabled (LoRA fine-tuning only)
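
The snippet below is a minimal sketch of how a setup like this is typically loaded with Unsloth; it is not the original training script, and argument names may differ slightly between Unsloth releases.

``` python
from unsloth import FastLanguageModel

# Load the instruction-tuned base checkpoint in full precision,
# matching the configuration listed above.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,
    load_in_4bit=False,  # no 4-bit/8-bit quantization was used
)
```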

### LoRA Setup

- **Rank (r):** 128
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`,
  `gate_proj`, `up_proj`, `down_proj`
- **LoRA alpha:** 128
- **LoRA dropout:** 0
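
Continuing from the loading sketch above, this configuration corresponds roughly to the following call. Treat it as an illustrative sketch rather than the exact training code; the `bias` and gradient-checkpointing settings are assumptions not stated in this card.

``` python
# Attach LoRA adapters with the settings listed above.
# `model` is the checkpoint loaded in the previous sketch.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    lora_alpha=128,
    lora_dropout=0,
    bias="none",                           # assumption: common default
    use_gradient_checkpointing="unsloth",  # assumption: not stated above
)
```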

### Training Details

- **Dataset:** [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Batch size per device:** 128
- **Gradient accumulation steps:** 1
- **Warmup steps:** 5
- **Training epochs:** 1
- **Learning rate:** 5e-5 (linear schedule)
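
Putting these hyperparameters together, a typical Unsloth/TRL training loop for this recipe would look roughly like the sketch below. The prompt template in `format_example` is hypothetical (the exact prompt used for fine-tuning is not documented here), and the placement of `SFTTrainer` arguments varies slightly across `trl` versions.

``` python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# XSum provides "document" (article) and "summary" (one-sentence summary) fields.
dataset = load_dataset("EdinburghNLP/xsum", split="train")

def format_example(example):
    # Hypothetical prompt format, shown only for illustration.
    return {
        "text": (
            "Summarize the following article in one sentence:\n\n"
            f"{example['document']}\n\nSummary: {example['summary']}"
        )
    }

dataset = dataset.map(format_example)

trainer = SFTTrainer(
    model=model,                # LoRA-wrapped model from the sketches above
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=128,
        gradient_accumulation_steps=1,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=5e-5,
        lr_scheduler_type="linear",
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA adapters into the base weights before publishing,
# as described above (Unsloth helper for merged 16-bit export).
model.save_pretrained_merged("gemma3-270m-xsum", tokenizer, save_method="merged_16bit")
```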

------------------------------------------------------------------------

## Intended Use

- **Primary use case:** Abstractive summarization of news-style text
  (XSum-style, single-sentence summaries)
- **Not suitable for:** Factual Q&A, reasoning, coding, or tasks
  requiring large-context models
- **Limitations:** The small model size (270M parameters) means limited
  reasoning ability compared to larger Gemma models

------------------------------------------------------------------------

## Example Usage

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ShahzebKhoso/Gemma3_270M_FineTuned_XSUM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

text = "The UK government announced new measures to support renewable energy."
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
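
Because the base checkpoint is instruction-tuned, prompting through the chat template may match the fine-tuning format more closely. The instruction text below is an assumption; adjust it to whatever prompt the model was actually trained with.

``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ShahzebKhoso/Gemma3_270M_FineTuned_XSUM"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

article = "The UK government announced new measures to support renewable energy."

# Hypothetical instruction; the exact training prompt is not documented in this card.
messages = [
    {"role": "user",
     "content": f"Summarize the following article in one sentence:\n\n{article}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens (the summary).
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```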

------------------------------------------------------------------------

## License

- Base model license: [Gemma License](https://ai.google.dev/gemma/terms)
- Dataset license: [XSum (CC BY-NC-SA 4.0)](https://huggingface.co/datasets/EdinburghNLP/xsum)

------------------------------------------------------------------------

## Acknowledgements

- [Unsloth](https://github.com/unslothai/unsloth) for efficient fine-tuning
- [Google DeepMind](https://deepmind.google/) for Gemma-3
- [EdinburghNLP](https://huggingface.co/EdinburghNLP) for the XSum dataset