LiteRT

LiteRT (previously known as TensorFlow Lite) is a high-performance runtime designed for on-device machine learning.

The Optimum library can export many Transformers architectures to the LiteRT format.

The benefits of exporting to LiteRT include the following.

  • Low-latency, privacy-focused, no internet connectivity required, and reduced model size and power consumption for on-device machine learning.
  • Broad platform, model framework, and language support.
  • Hardware acceleration for GPUs and Apple Silicon.

Export a Transformers model to LiteRT with the Optimum CLI.

Run the command below to install Optimum and the exporters module for LiteRT.

pip install optimum[exporters-tf]

Refer to the Export a model to TFLite with optimum.exporters.tflite guide for all available arguments, or list them with the command below.

optimum-cli export tflite --help

Set the --model argument to export a model from the Hub.

optimum-cli export tflite --model google-bert/bert-base-uncased --sequence_length 128 bert_tflite/

You should see logs indicating the progress and showing where the resulting model.tflite is saved.

Validating TFLite model...
	-[✓] TFLite model output names match reference model (logits)
	- Validating TFLite Model output "logits":
		-[✓] (1, 128, 30522) matches (1, 128, 30522)
		-[x] values not close enough, max diff: 5.817413330078125e-05 (atol: 1e-05)
The TensorFlow Lite export succeeded with the warning: The maximum absolute difference between the output of the reference model and the TFLite exported model is not within the set tolerance 1e-05:
- logits: max diff = 5.817413330078125e-05.
 The exported model was saved at: bert_tflite
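
You can then run the exported model.tflite with the TensorFlow Lite interpreter. The snippet below is a minimal sketch, assuming TensorFlow is installed and that the exported model's input names contain the standard BERT input names (the exact names depend on the export).

from transformers import AutoTokenizer
import tensorflow as tf

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

# Tokenize and pad to the sequence length used at export time (128).
encoded = tokenizer(
    "LiteRT runs on-device.",
    padding="max_length",
    max_length=128,
    return_tensors="np",
)

interpreter = tf.lite.Interpreter(model_path="bert_tflite/model.tflite")
interpreter.allocate_tensors()

# Match each interpreter input to the corresponding tokenizer output by name.
for detail in interpreter.get_input_details():
    for name in ("attention_mask", "token_type_ids", "input_ids"):
        if name in detail["name"]:
            interpreter.set_tensor(detail["index"], encoded[name].astype(detail["dtype"]))
            break

interpreter.invoke()
logits = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
print(logits.shape)  # (1, 128, 30522) for bert-base-uncased, as in the validation log above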

For local models, make sure the model weights and tokenizer files are saved in the same directory, for example local_path. Pass the directory to the --model argument and use --task to indicate the task a model can perform. If --task isn’t provided, the model architecture without a task-specific head is used.

optimum-cli export tflite --model local_path --task question-answering --sequence_length 128 bert_tflite/
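
If you don't already have a local checkpoint, you can create one by saving the model weights and tokenizer files to the same directory with save_pretrained. This is a minimal sketch; local_path and the question-answering head mirror the example above.

from transformers import AutoTokenizer, TFAutoModelForQuestionAnswering

# Save the TensorFlow weights and tokenizer files side by side in local_path,
# then point optimum-cli at that directory with --model local_path.
model = TFAutoModelForQuestionAnswering.from_pretrained("google-bert/bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")

model.save_pretrained("local_path")
tokenizer.save_pretrained("local_path")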