Image Classification
Core ML

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization

Please observe original license.

Model Details

Evaluation - Variants

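Variant | Parameters | Size (MB) | Weight precision | Act. precision | Δ PyTorch acc.
--------|------------|-----------|------------------|----------------|---------------
T8      | 3.6M       | 7.8       | Float16          | Float16        | -0.9%
MA36    | 42.7M      | 84        | Float16          | Float16        | -0.06%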
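
As a rough sanity check on the sizes above, Float16 weights take 2 bytes per parameter, so the weight payload can be estimated from the parameter count. The sketch below is only an approximation: the parameter counts are rounded, the .mlpackage also holds metadata, and it assumes "MB" means 10^6 bytes, which the table does not state. The function and variable names are illustrative.

```python
# Estimate the Float16 weight payload from the rounded parameter counts.
# Assumption: 1 MB = 10**6 bytes (the table does not say which unit is meant).
BYTES_PER_FLOAT16 = 2


def estimated_weight_size_mb(num_params: float) -> float:
    """Return the approximate size in MB of num_params Float16 weights."""
    return num_params * BYTES_PER_FLOAT16 / 1e6


# (parameter count, size reported in the table in MB)
variants = {
    "T8": (3.6e6, 7.8),
    "MA36": (42.7e6, 84.0),
}

for name, (params, reported_mb) in variants.items():
    estimate = estimated_weight_size_mb(params)
    print(f"{name}: estimated {estimate:.1f} MB, reported {reported_mb} MB")
```

The estimates land within about 10% of the reported sizes, which is consistent with Float16 storage given the rounding involved.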

Evaluation - Inference time

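Variant | Device            | OS   | Inference time (ms) | Dominant compute unit
--------|-------------------|------|---------------------|----------------------
T8      | iPhone 12 Pro Max | 17.5 | 0.79                | Neural Engine
T8      | M3 Max            | 14.4 | 0.62                | Neural Engine
MA36    | iPhone 12 Pro Max | 18.0 | 4.50                | Neural Engine
MA36    | M3 Max            | 15.0 | 2.99                | Neural Engine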
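
To put the latencies above in perspective, they can be converted to an approximate upper bound on throughput. This is a minimal sketch that assumes each figure is the time for a single image and that predictions run back to back with no preprocessing or I/O overhead; neither assumption is confirmed by the table.

```python
# Convert the reported per-inference latencies into rough throughput figures.
# Assumption: one image per inference, no preprocessing or I/O overhead.
latencies_ms = {
    ("T8", "iPhone 12 Pro Max"): 0.79,
    ("T8", "M3 Max"): 0.62,
    ("MA36", "iPhone 12 Pro Max"): 4.50,
    ("MA36", "M3 Max"): 2.99,
}


def images_per_second(latency_ms: float) -> float:
    """Return the throughput implied by a per-image latency in milliseconds."""
    return 1000.0 / latency_ms


for (variant, device), latency in latencies_ms.items():
    rate = images_per_second(latency)
    print(f"{variant} on {device}: about {rate:.0f} images/s")

# Relative cost of MA36 versus T8 on the same device.
slowdown_iphone = (
    latencies_ms[("MA36", "iPhone 12 Pro Max")]
    / latencies_ms[("T8", "iPhone 12 Pro Max")]
)
print(f"MA36 takes about {slowdown_iphone:.1f}x as long as T8 on iPhone 12 Pro Max")
```

Real application throughput will be lower once image decoding, resizing, and result handling are included.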

Download

Install huggingface-cli with Homebrew:

brew install huggingface-cli

To download one of the .mlpackage folders to the models directory:

huggingface-cli download \
  --local-dir models --local-dir-use-symlinks False \
  apple/coreml-FastViT-T8 
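
The same download can also be scripted from Python. The sketch below assumes the huggingface_hub package is installed in your Python environment (for example via pip; the Homebrew formula may not expose it to your interpreter). download_fastvit is a hypothetical helper name, not part of any library; it wraps the library's snapshot_download function.

```python
from pathlib import Path


def download_fastvit(
    repo_id: str = "apple/coreml-FastViT-T8",
    local_dir: str = "models",
) -> Path:
    """Download all files of repo_id into local_dir and return that directory."""
    # Imported lazily so this module still loads when huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    downloaded_path = snapshot_download(repo_id=repo_id, local_dir=local_dir)
    return Path(downloaded_path)


# Example call (requires network access):
#     model_dir = download_fastvit()
#     print(sorted(p.name for p in model_dir.iterdir()))
```

Passing a different repo_id, such as apple/coreml-FastViT-MA36, fetches the other variant into the same directory layout.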

Citation

@inproceedings{vasufastvit2023,
  author = {Pavan Kumar Anasosalu Vasu and James Gabriel and Jeff Zhu and Oncel Tuzel and Anurag Ranjan},
  title = {FastViT:  A Fast Hybrid Vision Transformer using Structural Reparameterization},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year = {2023}
}