# My Awesome Food Model

## Overview
My Awesome Food Model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) trained on the Food101 dataset. It is designed for food classification and reaches 79.67% accuracy across the dataset's 101 food categories on the evaluation set.
## Model Details
- Library: Transformers
- License: Apache-2.0
- Base Model: google/vit-base-patch16-224-in21k
- Tags: generated_from_trainer
- Evaluation Metric: Accuracy
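For context, fine-tuning starts from the pretrained backbone with a freshly initialized 101-way classification head. A minimal sketch of that setup (an assumption about how the training script was configured, not the verbatim code):

```python
from transformers import ViTForImageClassification

# Pretrained ViT backbone plus a new 101-way classification head;
# the head weights are randomly initialized and learned during fine-tuning.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k",
    num_labels=101,
)
```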
## Performance
The model achieves the following results on the evaluation set:
- Loss: 1.0698
- Accuracy: 79.67%
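Accuracy here is top-1: the fraction of evaluation images whose highest-scoring class matches the ground-truth label. A minimal sketch of a `compute_metrics` hook using the `evaluate` library (an assumption; the card does not state how the metric was computed):

```python
import numpy as np
import evaluate

accuracy = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy.compute(predictions=predictions, references=labels)
```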
## Intended Uses & Limitations

### Intended Uses
- Food classification tasks
- Culinary image recognition applications
- Educational and research purposes in computer vision
### Limitations
- May not generalize well to food categories outside the Food101 dataset
- Performance may degrade on images with poor lighting or unusual angles
## Training & Evaluation Data
The model was trained using the Food101 dataset, which consists of 101 different food categories. Each category contains 1,000 images, with a predefined split of 75% training data and 25% test data.
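For reference, the dataset is available on the Hugging Face Hub, where the predefined split is exposed as `train` and `validation`. A minimal loading sketch with the `datasets` library:

```python
from datasets import load_dataset

# Food101 from the Hub: 75,750 training and 25,250 validation images
food = load_dataset("food101")
labels = food["train"].features["label"].names
print(len(labels))   # 101 categories
print(labels[:3])    # first few category names
```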
## Training Procedure

### Hyperparameters
The model was trained using the following hyperparameters:
- Learning Rate: 5e-05
- Train Batch Size: 16
- Eval Batch Size: 16
- Seed: 42
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 64
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler: Linear Warmup (10% warmup ratio)
- Number of Epochs: 3
- Mixed Precision Training: Native AMP
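These settings correspond one-to-one to a `TrainingArguments` configuration. A hedged sketch (the output directory is a placeholder, and the original training script may differ in detail):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="my_awesome_food_model",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,       # effective train batch size: 64
    num_train_epochs=3,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    fp16=True,                           # native AMP mixed precision
)
```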
### Training Progress
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 1.973 | 1.0 | 947 | 1.9487 | 73.37% |
| 1.1152 | 2.0 | 1894 | 1.2247 | 78.20% |
| 0.9421 | 3.0 | 2841 | 1.0698 | 79.67% |
## Framework Versions
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Usage
To use the model, load it with the Hugging Face `transformers` library:
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from PIL import Image
import torch

# Load model and image processor (ViTImageProcessor supersedes the
# deprecated ViTFeatureExtractor)
model = ViTForImageClassification.from_pretrained("path_to_model")
processor = ViTImageProcessor.from_pretrained("path_to_model")

# Load and preprocess an image
image = Image.open("example_food.jpg")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = predictions.argmax(-1).item()
print(f"Predicted class: {model.config.id2label[predicted_class]}")
```
## Citation
If you use this model in your research or project, please cite it as follows:
```bibtex
@misc{my_awesome_food_model,
  author = {Your Name},
  title  = {My Awesome Food Model},
  year   = {2025},
  url    = {https://huggingface.co/your_model_link}
}
```
## Acknowledgments
This model was built using the Hugging Face Transformers library and trained on the Food101 dataset. Thanks to the Hugging Face community for providing excellent tools and resources for training and fine-tuning vision models.