
My Awesome Food Model

Overview

My Awesome Food Model is a fine-tuned version of google/vit-base-patch16-224-in21k trained on the Food101 dataset. It is designed for food classification tasks, reaching 79.67% accuracy across the dataset's 101 food categories on the held-out evaluation set.

Model Details

  • Library: Transformers
  • License: Apache-2.0
  • Base Model: google/vit-base-patch16-224-in21k
  • Tags: generated_from_trainer
  • Evaluation Metric: Accuracy

Performance

The model achieves the following results on the evaluation set (see the metric sketch after this list):

  • Loss: 1.0698
  • Accuracy: 79.67%
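
The evaluation script itself is not part of this card; as a hedged sketch, accuracy figures like this are typically produced by a compute_metrics callback passed to the Trainer, here using the evaluate library (an assumption, not documented in the card):

import numpy as np
import evaluate

# Hypothetical compute_metrics callback: the Trainer passes
# (logits, labels) for the evaluation set after each epoch
accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)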

Intended Uses & Limitations

Intended Uses

  • Food classification tasks
  • Culinary image recognition applications
  • Educational and research purposes in computer vision

Limitations

  • May not generalize well to food categories outside the Food101 dataset
  • Performance may degrade on images with poor lighting or unusual angles

Training & Evaluation Data

The model was trained using the Food101 dataset, which consists of 101 different food categories. Each category contains 1,000 images, with a predefined split of 75% training data and 25% test data.
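
For reference, Food101 is available on the Hugging Face Hub, where the 25% test portion is exposed as a validation split; a minimal loading sketch with the datasets library:

from datasets import load_dataset

# Food101: 101 classes, 750 training and 250 test images per class
food = load_dataset("food101")
print(food)                                        # DatasetDict with train/validation splits
print(food["train"].features["label"].names[:5])   # first few class names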

Training Procedure

Hyperparameters

The model was trained using the following hyperparameters (mirrored in the TrainingArguments sketch after this list):

  • Learning Rate: 5e-05
  • Train Batch Size: 16
  • Eval Batch Size: 16
  • Seed: 42
  • Gradient Accumulation Steps: 4
  • Total Train Batch Size: 64
  • Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
  • LR Scheduler: Linear Warmup (10% warmup ratio)
  • Number of Epochs: 3
  • Mixed Precision Training: Native AMP
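
For concreteness, a minimal sketch of a matching TrainingArguments configuration; output_dir and the strategy settings are assumptions, and the AdamW betas and epsilon listed above are the library defaults:

from transformers import TrainingArguments

# Sketch of TrainingArguments matching the listed hyperparameters;
# output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="my_awesome_food_model",
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,   # effective train batch size: 16 * 4 = 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                       # native AMP mixed precision
    eval_strategy="epoch",
    logging_strategy="epoch",
    remove_unused_columns=False,     # keep the image column for on-the-fly transforms
)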

Training Progress

Training Loss   Epoch   Step   Validation Loss   Accuracy
1.973           1.0     947    1.9487            73.37%
1.1152          2.0     1894   1.2247            78.20%
0.9421          3.0     2841   1.0698            79.67%
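
These per-epoch rows match what the Trainer logs with epoch-level evaluation and logging. A sketch of wiring everything together, reusing food, training_args, and compute_metrics from the sketches above; the on-the-fly transform is an assumption, since the card does not document the preprocessing:

from transformers import AutoImageProcessor, ViTForImageClassification, Trainer

labels = food["train"].features["label"].names
processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

def transform(batch):
    # Turn PIL images into pixel_values tensors; drop the raw images
    # so the default collator only sees tensors and labels
    batch["pixel_values"] = processor(
        [img.convert("RGB") for img in batch["image"]], return_tensors="pt"
    )["pixel_values"]
    del batch["image"]
    return batch

food_t = food.with_transform(transform)

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={name: i for i, name in enumerate(labels)},
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=food_t["train"],
    eval_dataset=food_t["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()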

Framework Versions

  • Transformers: 4.49.0
  • PyTorch: 2.6.0
  • Datasets: 3.4.1
  • Tokenizers: 0.21.1

Usage

To use the model, you can load it with the Hugging Face transformers library:

from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load model and processor
# (ViTImageProcessor supersedes the deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained("path_to_model")
processor = ViTImageProcessor.from_pretrained("path_to_model")
model.eval()

# Load and preprocess an image
image = Image.open("example_food.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = predictions.argmax(-1).item()

print(f"Predicted class: {model.config.id2label[predicted_class]}")

Citation

If you use this model in your research or project, please cite it as follows:

@misc{my_awesome_food_model,
  author = {Your Name},
  title = {My Awesome Food Model},
  year = {2025},
  url = {https://huggingface.co/your_model_link}
}

Acknowledgments

This model was built using the Hugging Face Transformers library and trained on the Food101 dataset. Thanks to the Hugging Face community for providing excellent tools and resources for training and fine-tuning vision models.
