---
language: en
datasets:
- efra
license: apache-2.0
tags:
- summarization
- flan-t5
- legal
- food
model_type: t5
pipeline_tag: text2text-generation
---

# Flan-T5 Large Fine-Tuned on the EFRA Dataset

This is a fine-tuned version of [Flan-T5 Large](https://huggingface.co/google/flan-t5-large) trained on the **EFRA dataset** to summarize legal documents related to food regulations and policies.

## Model Description

Flan-T5 is a sequence-to-sequence model trained on text-to-text tasks. This fine-tuned version is optimized for summarizing legal text in the domain of food legislation, regulatory requirements, and compliance documents.

### Fine-Tuning Details

- **Base Model**: [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
- **Dataset**: EFRA (a curated dataset of legal documents in the food domain)
- **Objective**: Summarization of legal documents
- **Framework**: Hugging Face Transformers

## Applications

This model is suitable for:

- Summarizing legal texts in the food domain
- Extracting key information from lengthy regulatory documents
- Helping legal professionals and food companies understand compliance requirements

## Example Usage

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("giuid/flan_t5_large_summarization_v2")
tokenizer = AutoTokenizer.from_pretrained("giuid/flan_t5_large_summarization_v2")

# Input text
input_text = "Your lengthy legal document text here..."

# Tokenize (truncating to the 512-token input limit) and generate a summary with beam search
inputs = tokenizer(input_text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs.input_ids, max_length=150, num_beams=5, early_stopping=True)

# Decode the generated summary
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
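
Note that the example above truncates inputs to 512 tokens. Longer documents should be split into smaller chunks that are summarized separately, after which the chunk summaries can be concatenated or summarized again.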
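
Alternatively, the same checkpoint can be used through the `pipeline` API. The sketch below is a minimal example using the standard summarization pipeline; the generation settings mirror the beam-search values above and are illustrative rather than tuned.

```python
from transformers import pipeline

# Minimal sketch: wrap the same checkpoint in a summarization pipeline.
# The generation settings mirror the example above and are illustrative
# defaults, not tuned values.
summarizer = pipeline("summarization", model="giuid/flan_t5_large_summarization_v2")

document = "Your lengthy legal document text here..."
result = summarizer(document, max_length=150, num_beams=5, truncation=True)

print(result[0]["summary_text"])
```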