---
tags:
- summarization
- text-generation
- gemma3
base_model:
- google/gemma-3-270m-it
library_name: transformers
pipeline_tag: text-generation
license: gemma
datasets:
- EdinburghNLP/xsum
language:
- en
---
# Gemma-3 270M Fine-tuned (XSum)
This is a fine-tuned version of
**[google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)**
on the **XSum** dataset.
The model was trained with **Unsloth** for efficient fine-tuning, and the
**LoRA adapters have been merged into the model weights**.
------------------------------------------------------------------------
## Model Details
- **Base model:** `google/gemma-3-270m-it`
- **Architecture:** Gemma-3, 270M parameters
- **Training framework:**
[Unsloth](https://github.com/unslothai/unsloth)
- **Task:** Abstractive summarization
- **Dataset:**
[XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Adapter merge:** Yes (LoRA weights merged into final model)
- **Precision:** Full precision (no 4-bit/8-bit quantization)
------------------------------------------------------------------------
## Training Configuration
The model was fine-tuned starting from **`unsloth/gemma-3-270m-it`**
using **LoRA** adapters with the **Unsloth framework**.
The LoRA adapters were later merged into the base model weights.
A hedged loading sketch follows the list below.
- **Base model:** `unsloth/gemma-3-270m-it`
- **Sequence length:** 2048
- **Quantization:** none (no 4-bit or 8-bit)
- **Full finetuning:** disabled (LoRA fine-tuning only)
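The snippet below is a minimal sketch of how a model could be loaded with
Unsloth under these settings. It is an illustration, not the actual training
script; the `FastLanguageModel` call and keyword arguments are assumptions
based on Unsloth's public API.

``` python
# Sketch only: load the instruction-tuned base model with Unsloth at the
# sequence length used for this fine-tune, without quantization.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,   # sequence length used during fine-tuning
    load_in_4bit=False,    # no 4-bit or 8-bit quantization
)
```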
### LoRA Setup
The adapters were configured as follows (a hedged sketch follows the list):
- **Rank (r):** 128
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`,
  `gate_proj`, `up_proj`, `down_proj`
- **LoRA alpha:** 128
- **LoRA dropout:** 0
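In Unsloth, that configuration corresponds roughly to the call below. This is
a sketch, not the original code; argument names follow
`FastLanguageModel.get_peft_model`.

``` python
# Sketch only: attach LoRA adapters matching the setup listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=128,
    lora_dropout=0,
)
```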
### Training Details
Training used the following hyperparameters (a hedged trainer sketch follows the list):
- **Dataset:**
  [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Batch size per device:** 128
- **Gradient accumulation steps:** 1
- **Warmup steps:** 5
- **Training epochs:** 1
- **Learning rate:** 5e-5 (linear schedule)
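A sketch of a matching training loop with TRL's `SFTTrainer` is shown below.
The prompt template, column preprocessing, output paths, and the older-style
`tokenizer`/`dataset_text_field` arguments are assumptions; only the listed
hyperparameters come from this card.

``` python
# Sketch only: train with TRL's SFTTrainer using the hyperparameters above.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

def to_text(example):
    # Assumed prompt template; the card does not document the exact format.
    return {"text": f"Summarize: {example['document']}\nSummary: {example['summary']}"}

dataset = load_dataset("EdinburghNLP/xsum", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=128,
        gradient_accumulation_steps=1,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=5e-5,
        lr_scheduler_type="linear",
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA adapters into the base weights (Unsloth helper; assumed usage).
model.save_pretrained_merged("gemma3-270m-xsum-merged", tokenizer,
                             save_method="merged_16bit")
```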
------------------------------------------------------------------------
## Intended Use
- **Primary use case:** Abstractive summarization of long-form text
(news-style)
- **Not suitable for:** Factual Q&A, reasoning, coding, or tasks
requiring large-context models
- **Limitations:** Small model size (270M) means limited reasoning
ability compared to larger Gemma models
------------------------------------------------------------------------
## Example Usage
``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ShahzebKhoso/Gemma3_270M_FineTuned_XSUM"

# The LoRA adapters are already merged, so the model loads like any
# standard causal LM; no PEFT or adapter files are required.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Summarize a short news-style snippet.
text = "The UK government announced new measures to support renewable energy."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
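Since the base checkpoint is instruction-tuned, the Gemma chat template may
also be used. The prompt wording below is an assumption, not the documented
training format, so results may vary.

``` python
# Optional: prompt the model through the chat template (assumed prompt wording).
messages = [
    {"role": "user", "content": f"Summarize in one sentence:\n\n{text}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```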
------------------------------------------------------------------------
## License
- Base model license: [Gemma
License](https://ai.google.dev/gemma/terms)
- Dataset license: [XSum (CC BY-NC-SA
4.0)](https://huggingface.co/datasets/EdinburghNLP/xsum)
------------------------------------------------------------------------
## Acknowledgements
- [Unsloth](https://github.com/unslothai/unsloth) for efficient
  fine-tuning
- [Google DeepMind](https://deepmind.google/) for Gemma-3
- [EdinburghNLP](https://huggingface.co/EdinburghNLP) for the XSum dataset