# My Awesome Food Model - Fine-tuned LoRA (Food101)
## Overview
My Awesome Food Model - Fine-tuned LoRA (Food101) is a parameter-efficient fine-tune of google/vit-base-patch16-224-in21k trained with Low-Rank Adaptation (LoRA). It adapts a Vision Transformer to food classification by training only a small set of adapter weights on the Food101 dataset.
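For readers unfamiliar with the setup, LoRA freezes the base model and injects small low-rank adapter matrices so that only a tiny fraction of the weights is trained. Below is a minimal sketch using the `peft` API; the specific `LoraConfig` values (`r`, `lora_alpha`, `target_modules`, `lora_dropout`) are illustrative assumptions, not this model's recorded settings.

```python
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

# Load the base Vision Transformer with a fresh 101-class head for Food101
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=101
)

# Hypothetical LoRA settings; r, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in ViT
    lora_dropout=0.1,
    modules_to_save=["classifier"],  # train the new classification head fully
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable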
## Model Details
- Library: PEFT
- License: Apache-2.0
- Base Model: google/vit-base-patch16-224-in21k
- Tags: generated_from_trainer
- Evaluation Metric: Accuracy
## Performance
The model achieves the following results on the evaluation set:
- Loss: 0.6790
- Accuracy: 82.13%
## Intended Uses & Limitations
### Intended Uses
- Food classification tasks
- Culinary image recognition applications
- Research and development in efficient transformer fine-tuning using LoRA
### Limitations
- May not generalize well to food categories outside the Food101 dataset
- Performance may degrade on images with poor lighting or unusual angles
## Training & Evaluation Data
The model was fine-tuned using the Food101 dataset, which includes 101 food categories with 1,000 images per category. The dataset is split into 75% training data and 25% test data.
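As distributed on the Hugging Face Hub, `food101` exposes this split as `train` and `validation` splits. A minimal loading sketch with the `datasets` library follows; the image counts in the comments are derived from 101 classes x 1,000 images at the 75/25 split described above.

```python
from datasets import load_dataset

# Food101: 101 classes x 1,000 images = 101,000 images total,
# split 75% train / 25% test (75,750 / 25,250)
dataset = load_dataset("food101")
print(dataset)  # DatasetDict with "train" and "validation" splits

labels = dataset["train"].features["label"].names
print(len(labels))  # 101
```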
## Training Procedure
### Hyperparameters
The model was fine-tuned with the following hyperparameters (a hedged `TrainingArguments` sketch follows the list):
- Learning Rate: 0.005
- Train Batch Size: 128
- Eval Batch Size: 128
- Seed: 42
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 512
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler: Linear Decay
- Number of Epochs: 5
- Mixed Precision Training: Native AMP
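As a rough reconstruction, the settings above map onto `transformers.TrainingArguments` as sketched below. The `output_dir` value is a hypothetical placeholder, and AdamW with the listed betas and epsilon is the library default, so it is not set explicitly.

```python
from transformers import TrainingArguments

# Sketch of the training configuration described above; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="my_awesome_food_model",  # hypothetical
    learning_rate=5e-3,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=4,  # effective train batch size: 128 * 4 = 512
    num_train_epochs=5,
    seed=42,
    lr_scheduler_type="linear",
    fp16=True,  # native AMP mixed precision
)
```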
### Training Progress
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.8211 | 1.0 | 119 | 0.8232 | 78.20% |
| 0.7385 | 2.0 | 238 | 0.7586 | 80.13% |
| 0.6528 | 3.0 | 357 | 0.7408 | 80.57% |
| 0.5283 | 4.0 | 476 | 0.6797 | 82.18% |
| 0.5294 | 4.962 | 590 | 0.6790 | 82.13% |
## Framework Versions
- PEFT: 0.15.0
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Usage
To use the model, load the base model with the `transformers` library and attach the LoRA adapter with `peft`:
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from peft import PeftModel
from PIL import Image
import torch

# Load the base model and attach the LoRA adapter
base_model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=101  # Food101 has 101 classes
)
model = PeftModel.from_pretrained(base_model, "path_to_lora_adapter")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

# Load and preprocess an image
image = Image.open("example_food.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = predictions.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
```
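Note that the script prints a raw class index. If the adapter was exported with Food101 label mappings, `model.config.id2label[predicted_class]` yields the class name; otherwise the label names from the `datasets` loading sketch above can serve as the lookup table (whether those mappings were saved with this adapter is an assumption, not a recorded detail of the model).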
## Citation
If you use this model in your research or project, please cite it as follows:
```bibtex
@misc{my_awesome_food_model_lora,
  author = {Your Name},
  title  = {My Awesome Food Model - Fine-tuned LoRA (Food101)},
  year   = {2025},
  url    = {https://huggingface.co/your_model_link}
}
```
## Acknowledgments
This model was developed using the PEFT library for parameter-efficient fine-tuning and trained on the Food101 dataset. Special thanks to the Hugging Face community for providing invaluable tools and resources for optimizing transformer models.