---
tags:
- summarization
- text-generation
- gemma3
base_model:
- google/gemma-3-270m-it
library_name: transformers
pipeline_tag: text-generation
license: gemma
datasets:
- EdinburghNLP/xsum
language:
- en
---
# Gemma-3 270M Fine-tuned (XSum)
This is a fine-tuned version of
**[google/gemma-3-270m-it](https://huggingface.co/google/gemma-3-270m-it)**
on the **XSum** dataset.
The model was trained with **Unsloth** for efficient fine-tuning, and the
**LoRA adapters have been merged into the model weights**.
------------------------------------------------------------------------
## Model Details
- **Base model:** `google/gemma-3-270m-it`
- **Architecture:** Gemma-3, 270M parameters
- **Training framework:**
[Unsloth](https://github.com/unslothai/unsloth)
- **Task:** Abstractive summarization
- **Dataset:**
[XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Adapter merge:** Yes (LoRA weights merged into final model)
- **Precision:** Full precision (no 4-bit/8-bit quantization)
------------------------------------------------------------------------
## Training Configuration
The model was fine-tuned starting from **`unsloth/gemma-3-270m-it`**
using **LoRA** adapters with the **Unsloth framework**.
The LoRA adapters were later merged into the base model weights.
A hedged loading sketch follows the list below.
- **Base model:** `unsloth/gemma-3-270m-it`
- **Sequence length:** 2048
- **Quantization:** none (no 4-bit or 8-bit)
- **Full finetuning:** disabled (LoRA fine-tuning only)
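The snippet below is a minimal sketch of how a model could be loaded with
Unsloth under these settings. It is an illustration, not the actual training
script; the `FastLanguageModel` call and keyword arguments are assumptions
based on Unsloth's public API.

``` python
# Sketch only: load the instruction-tuned base model with Unsloth at the
# sequence length used for this fine-tune, without quantization.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-270m-it",
    max_seq_length=2048,   # sequence length used during fine-tuning
    load_in_4bit=False,    # no 4-bit or 8-bit quantization
)
```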
### LoRA Setup
The adapters were configured as follows (a hedged sketch follows the list):
- **Rank (r):** 128
- **Target modules:** `q_proj`, `k_proj`, `v_proj`, `o_proj`,
  `gate_proj`, `up_proj`, `down_proj`
- **LoRA alpha:** 128
- **LoRA dropout:** 0
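In Unsloth, that configuration corresponds roughly to the call below. This is
a sketch, not the original code; argument names follow
`FastLanguageModel.get_peft_model`.

``` python
# Sketch only: attach LoRA adapters matching the setup listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=128,
    lora_dropout=0,
)
```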
### Training Details
Training used the following hyperparameters (a hedged trainer sketch follows the list):
- **Dataset:**
  [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum)
- **Batch size per device:** 128
- **Gradient accumulation steps:** 1
- **Warmup steps:** 5
- **Training epochs:** 1
- **Learning rate:** 5e-5 (linear schedule)
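A sketch of a matching training loop with TRL's `SFTTrainer` is shown below.
The prompt template, column preprocessing, output paths, and the older-style
`tokenizer`/`dataset_text_field` arguments are assumptions; only the listed
hyperparameters come from this card.

``` python
# Sketch only: train with TRL's SFTTrainer using the hyperparameters above.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

def to_text(example):
    # Assumed prompt template; the card does not document the exact format.
    return {"text": f"Summarize: {example['document']}\nSummary: {example['summary']}"}

dataset = load_dataset("EdinburghNLP/xsum", split="train").map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=128,
        gradient_accumulation_steps=1,
        warmup_steps=5,
        num_train_epochs=1,
        learning_rate=5e-5,
        lr_scheduler_type="linear",
        output_dir="outputs",
    ),
)
trainer.train()

# Merge the LoRA adapters into the base weights (Unsloth helper; assumed usage).
model.save_pretrained_merged("gemma3-270m-xsum-merged", tokenizer,
                             save_method="merged_16bit")
```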
------------------------------------------------------------------------
## Intended Use
- **Primary use case:** Abstractive summarization of long-form text
(news-style)
- **Not suitable for:** Factual Q&A, reasoning, coding, or tasks
requiring large-context models
- **Limitations:** Small model size (270M) means limited reasoning
ability compared to larger Gemma models
------------------------------------------------------------------------
## Example Usage
``` python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ShahzebKhoso/Gemma3_270M_FineTuned_XSUM"

# The LoRA adapters are already merged, so the model loads like any
# standard causal LM; no PEFT or adapter files are required.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Summarize a short news-style snippet.
text = "The UK government announced new measures to support renewable energy."
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
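Since the base checkpoint is instruction-tuned, the Gemma chat template may
also be used. The prompt wording below is an assumption, not the documented
training format, so results may vary.

``` python
# Optional: prompt the model through the chat template (assumed prompt wording).
messages = [
    {"role": "user", "content": f"Summarize in one sentence:\n\n{text}"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```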
------------------------------------------------------------------------
## License
- Base model license: [Gemma
License](https://ai.google.dev/gemma/terms)
- Dataset license: [XSum (CC BY-NC-SA
4.0)](https://huggingface.co/datasets/EdinburghNLP/xsum)
------------------------------------------------------------------------
## Acknowledgements
- [Unsloth](https://github.com/unslothai/unsloth) for efficient
  fine-tuning
- [Google DeepMind](https://deepmind.google/) for Gemma-3
- [EdinburghNLP](https://huggingface.co/EdinburghNLP) for the XSum dataset