mt5-small-Context-Based-Chat-Summary-Plus

This model is a fine-tuned version of google/mt5-small on the prithivMLmods/Context-Based-Chat-Summary-Plus dataset. It performs well on context-based summarization tasks, leveraging mT5's multilingual capabilities.

Model description

This model summarizes context-based chat data. It was trained to generate summaries of conversations and other text inputs, using mT5's encoder-decoder (seq2seq) architecture fine-tuned to produce accurate, coherent summaries.

Intended uses & limitations

Intended Uses:

  • Contextual text summarization
  • Summarizing chat logs, meeting transcripts, or conversational exchanges
  • Extracting key points or highlights from a larger body of text

Limitations:

  • May struggle with highly specialized or domain-specific language
  • May produce summaries that require further refinement for nuanced or highly technical content

Training and evaluation data

The model was trained on the prithivMLmods/Context-Based-Chat-Summary-Plus dataset, which consists of conversational and text data, with summaries representing the key elements of the content.

Data preprocessing:

  • Filters were applied to exclude entries with headlines of fewer than 3 words or texts with fewer than 30 words.
  • The dataset was split into 90% training and 10% testing (a sketch of this preprocessing follows below).
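
The card does not include the preprocessing script itself; the sketch below reconstructs these filters with the datasets library. The column names ("headline", "text") and the split seed are assumptions, since the card does not state the dataset schema.

from datasets import load_dataset

# Column names and the seed below are assumptions, not taken from the card.
dataset = load_dataset("prithivMLmods/Context-Based-Chat-Summary-Plus", split="train")

def keep(example):
    # Exclude entries with headlines shorter than 3 words or texts shorter than 30 words
    return len(example["headline"].split()) >= 3 and len(example["text"].split()) >= 30

filtered = dataset.filter(keep)
splits = filtered.train_test_split(test_size=0.1, seed=42)  # 90% train / 10% test
train_ds, test_ds = splits["train"], splits["test"]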

Training procedure

Training hyperparameters

  • Learning Rate: 5.6e-5
  • Train Batch Size: 64
  • Eval Batch Size: 64
  • Epochs: 6 (initially 4 epochs, followed by an additional 2 epochs)
  • Optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
  • Scheduler: Linear learning rate scheduler
  • Logging: Metrics were logged once per epoch (see the sketch after this list).
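
A minimal sketch of how these hyperparameters map onto Seq2SeqTrainingArguments; the output directory is a placeholder, and tokenization of the splits is omitted, so this is an illustration of the settings rather than the author's actual training script.

from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small-chat-summary",  # placeholder output directory
    learning_rate=5.6e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=6,
    lr_scheduler_type="linear",           # linear learning rate scheduler
    adam_beta1=0.9,                       # AdamW betas and epsilon as listed above
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    logging_strategy="epoch",             # log once per epoch
    eval_strategy="epoch",
    predict_with_generate=True,
)

# NOTE: the splits from the preprocessing sketch must be tokenized into
# input_ids/labels before training; that step is omitted here for brevity.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=test_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()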

Training results

Training Loss   Epoch   Step   Validation Loss   ROUGE-1   ROUGE-2   ROUGE-L   ROUGE-Lsum
3.9223          1.0     1384   2.0230            48.3053   25.5      44.5689   44.5717
2.4615          2.0     2768   1.8415            50.6518   27.4135   46.7611   46.7466
2.2896          3.0     4152   1.7868            51.4143   27.9301   47.4151   47.4095
2.1912          5.0     6920   1.7372            51.912    28.3549   47.8763   47.8849
2.1537          6.0     8304   1.7287            52.033    28.5069   47.9951   47.994

Framework versions:

  • Transformers: 4.47.1
  • PyTorch: 2.5.1+cu121
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Evaluation

The model was evaluated using the ROUGE metric, achieving the following scores on the validation set:

  • ROUGE-1: 52.033
  • ROUGE-2: 28.5069
  • ROUGE-L: 47.9951
  • ROUGE-Lsum: 47.994
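
The card does not include the exact evaluation script; below is a minimal sketch using the evaluate library. Its ROUGE scores are F-measures in [0, 1], so they are scaled by 100 to match the values reported above. The example strings are toy inputs, not data from the test split.

import evaluate

rouge = evaluate.load("rouge")

# Toy inputs; in practice these would be the model's generated summaries
# and the reference summaries from the 10% test split.
predictions = ["she is the first woman appointed secretary general of the lok sabha"]
references = ["Snehlata Shrivastava is the first woman Secretary General of the Lok Sabha."]

scores = rouge.compute(predictions=predictions, references=references)
print({k: round(v * 100, 4) for k, v in scores.items()})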

Final Results

After 6 epochs of training, the model was pushed to the Hugging Face Hub as ParitKansal/mt5-small-Context-Based-Chat-Summary-Plus. It can be used directly for summarization, as shown below.

Example Usage:

from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
hub_model_id = "ParitKansal/mt5-small-Context-Based-Chat-Summary-Plus"
summarizer = pipeline("summarization", model=hub_model_id)

# Example input: a short news-style passage
text = "Snehlata Shrivastava has been appointed as the Secretary General of the Lok Sabha, a notification issued by the Secretariat of the lower house said. She is the first woman to be elected for the post and will assume charge from December 1. She was earlier the Joint Secretary in the Law Ministry and has also worked in the Finance Ministry."

summary = summarizer(text)[0]["summary_text"]
print("Predicted Summary:", summary)
