# My Awesome Food Model

## Overview
My Awesome Food Model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) trained on the Food101 dataset. It is designed for food classification and reaches 79.67% accuracy across the dataset's 101 food categories on the evaluation set.
## Model Details
- Library: Transformers
- License: Apache-2.0
- Base Model: google/vit-base-patch16-224-in21k
- Tags: generated_from_trainer
- Evaluation Metric: Accuracy
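For context, fine-tuning starts from the pretrained backbone with a freshly initialized 101-way classification head. A minimal sketch of that setup (an assumption about how the training script was configured, not the verbatim code):

```python
from transformers import ViTForImageClassification

# Pretrained ViT backbone plus a new 101-way classification head;
# the head weights are randomly initialized and learned during fine-tuning.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=101,
)
```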
## Performance
The model achieves the following results on the evaluation set:
- Loss: 1.0698
- Accuracy: 79.67%
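Accuracy here is top-1: the fraction of evaluation images whose highest-scoring class matches the ground-truth label. A minimal sketch of a `compute_metrics` hook using the `evaluate` library (an assumption; the card does not state how the metric was computed):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```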
## Intended Uses & Limitations

### Intended Uses
- Food classification tasks
- Culinary image recognition applications
- Educational and research purposes in computer vision
### Limitations
- May not generalize well to food categories outside the Food101 dataset
- Performance may degrade on images with poor lighting or unusual angles
## Training & Evaluation Data
The model was trained using the Food101 dataset, which consists of 101 different food categories. Each category contains 1,000 images, with a predefined split of 75% training data and 25% test data.
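For reference, the dataset is available on the Hugging Face Hub, where the predefined split is exposed as `train` and `validation`. A minimal loading sketch with the `datasets` library:

```python
from datasets import load_dataset

# Food101 from the Hub: 75,750 training and 25,250 validation images
food = load_dataset("food101")
labels = food["train"].features["label"].names
print(len(labels))   # 101 categories
print(labels[:3])    # first few category names
```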
## Training Procedure

### Hyperparameters
The model was trained using the following hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Seed: 42
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 64
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler: Linear Warmup (10% warmup ratio)
- Number of Epochs: 3
- Mixed Precision Training: Native AMP
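These settings correspond one-to-one to a `TrainingArguments` configuration. A hedged sketch (the output directory is a placeholder, and the original training script may differ in detail):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_awesome_food_model",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,       # effective train batch size: 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                           # native AMP mixed precision
)
```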
### Training Progress
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.973 | 1.0 | 947 | 1.9487 | 73.37% |
| 1.1152 | 2.0 | 1894 | 1.2247 | 78.20% |
| 0.9421 | 3.0 | 2841 | 1.0698 | 79.67% |
## Framework Versions
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Usage
To use the model, load it with the Hugging Face `transformers` library:
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load model and image processor (ViTImageProcessor supersedes the
# deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained("path_to_model")
processor = ViTImageProcessor.from_pretrained("path_to_model")

# Load and preprocess an image
image = Image.open("example_food.jpg")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = predictions.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class]}")
```
## Citation
If you use this model in your research or project, please cite it as follows:
```bibtex
@misc{my_awesome_food_model,
  author = {Your Name},
  title  = {My Awesome Food Model},
  year   = {2025},
  url    = {https://huggingface.co/your_model_link}
}
```
## Acknowledgments
This model was built using the Hugging Face Transformers library and trained on the Food101 dataset. Thanks to the Hugging Face community for providing excellent tools and resources for training and fine-tuning vision models.