FLAN-UL2
Overview
Flan-UL2 is an encoder-decoder model based on the T5 architecture. It uses the same configuration as the UL2 model released a year earlier.
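Because the checkpoint reuses the T5 model code, loading its configuration resolves to a `T5Config`. A quick sanity check (a minimal sketch, assuming network access to the Hub):

>>> from transformers import AutoConfig

>>> # FLAN-UL2 ships with model_type "t5", so AutoConfig returns a T5Config
>>> config = AutoConfig.from_pretrained("google/flan-ul2")
>>> type(config).__name__
'T5Config'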
It was fine-tuned using the "Flan" prompt tuning and dataset collection. Similar to Flan-T5, one can use the FLAN-UL2 weights directly, without fine-tuning the model.
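For example, an instruction can be run zero-shot. This is a minimal sketch: the prompt is an arbitrary Flan-style instruction, loading in half precision needs roughly 40GB of GPU memory, and device_map="auto" requires the accelerate package:

>>> import torch
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", torch_dtype=torch.bfloat16, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

>>> # any Flan-style instruction works; no task-specific fine-tuning is needed
>>> inputs = tokenizer("Translate English to German: How old are you?", return_tensors="pt").to(model.device)
>>> outputs = model.generate(**inputs, max_new_tokens=20)
>>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True))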
According to the original blog, these are the notable improvements:
- The original UL2 model was only trained with a receptive field of 512, which made it non-ideal for N-shot prompting where N is large.
- The Flan-UL2 checkpoint uses a receptive field of 2048, which makes it more usable for few-shot in-context learning.
- The original UL2 model also had mode-switch tokens that were more or less mandatory for good performance. They were somewhat cumbersome, however, as they often required changes during inference or fine-tuning. In this update, UL2 20B was trained for an additional 100k steps (with a small batch size) to forget the "mode tokens" before the Flan instruction tuning was applied. As a result, the Flan-UL2 checkpoint no longer requires mode tokens (see the short illustration below).

Google has released the following variants:
The original checkpoints can be found here.
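To illustrate the last point above: with the original UL2 checkpoint a mode-switch token had to be prepended to the input, whereas Flan-UL2 takes the instruction as-is. The exact tokens ([NLU], [NLG], [S2S]) come from the UL2 paper and are shown here purely for illustration:

>>> # original UL2: a mode-switch token selects the pretraining paradigm
>>> ul2_style_prompt = "[S2S] Translate English to German: How old are you?"

>>> # Flan-UL2: the plain instruction suffices, no mode token needed
>>> flan_ul2_prompt = "Translate English to German: How old are you?"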
Running on low-resource devices
The model is quite heavy (~40GB in half precision), so if you just want to run it, load the model in 8-bit and pass device_map="auto" to avoid out-of-memory (OOM) errors:
>>> from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

>>> # load_in_8bit requires the bitsandbytes package; device_map="auto" requires accelerate
>>> model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-ul2", load_in_8bit=True, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

>>> inputs = tokenizer("A step by step recipe to make bolognese pasta:", return_tensors="pt")
>>> # generation stops at the default maximum length; pass max_new_tokens for longer outputs
>>> outputs = model.generate(**inputs)
>>> print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
['In a large skillet, brown the ground beef and onion over medium heat. Add the garlic']

Refer to T5's documentation page for API reference, tips, code examples and notebooks.