# My Awesome Food Model - Fine-tuned LoRA (Food101)
## Overview
My Awesome Food Model - Fine-tuned LoRA (Food101) is a parameter-efficient fine-tune of google/vit-base-patch16-224-in21k trained with Low-Rank Adaptation (LoRA). It adapts a Vision Transformer to food classification by training only a small set of adapter weights on the Food101 dataset.
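For readers unfamiliar with the setup, LoRA freezes the base model and injects small low-rank adapter matrices so that only a tiny fraction of the weights is trained. Below is a minimal sketch using the `peft` API; the specific `LoraConfig` values (`r`, `lora_alpha`, `target_modules`, `lora_dropout`) are illustrative assumptions, not this model's recorded settings.

```python
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

# Load the base Vision Transformer with a fresh 101-class head for Food101
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=101
)

# Hypothetical LoRA settings; r, alpha, dropout, and target modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in ViT
    lora_dropout=0.1,
    modules_to_save=["classifier"],  # train the new classification head fully
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights is trainable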
## Model Details
- Library: PEFT
- License: Apache-2.0
- Base Model: google/vit-base-patch16-224-in21k
- Tags: generated_from_trainer
- Evaluation Metric: Accuracy
## Performance
The model achieves the following results on the evaluation set:
- Loss: 0.6790
- Accuracy: 82.13%
## Intended Uses & Limitations
### Intended Uses
- Food classification tasks
- Culinary image recognition applications
- Research and development in efficient transformer fine-tuning using LoRA
### Limitations
- May not generalize well to food categories outside the Food101 dataset
- Performance may degrade on images with poor lighting or unusual angles
## Training & Evaluation Data
The model was fine-tuned using the Food101 dataset, which includes 101 food categories with 1,000 images per category. The dataset is split into 75% training data and 25% test data.
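As distributed on the Hugging Face Hub, `food101` exposes this split as `train` and `validation` splits. A minimal loading sketch with the `datasets` library follows; the image counts in the comments are derived from 101 classes x 1,000 images at the 75/25 split described above.

```python
from datasets import load_dataset

# Food101: 101 classes x 1,000 images = 101,000 images total,
# split 75% train / 25% test (75,750 / 25,250)
dataset = load_dataset("food101")
print(dataset)  # DatasetDict with "train" and "validation" splits

labels = dataset["train"].features["label"].names
print(len(labels))  # 101
```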
## Training Procedure
### Hyperparameters
The model was fine-tuned with the following hyperparameters (a hedged `TrainingArguments` sketch follows the list):
- Learning Rate: 0.005
- Train Batch Size: 128
- Eval Batch Size: 128
- Seed: 42
- Gradient Accumulation Steps: 4
- Total Train Batch Size: 512
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR Scheduler: Linear Decay
- Number of Epochs: 5
- Mixed Precision Training: Native AMP
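As a rough reconstruction, the settings above map onto `transformers.TrainingArguments` as sketched below. The `output_dir` value is a hypothetical placeholder, and AdamW with the listed betas and epsilon is the library default, so it is not set explicitly.

```python
from transformers import TrainingArguments

# Sketch of the training configuration described above; output_dir is a placeholder
training_args = TrainingArguments(
    output_dir="my_awesome_food_model",  # hypothetical
    learning_rate=5e-3,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    gradient_accumulation_steps=4,  # effective train batch size: 128 * 4 = 512
    num_train_epochs=5,
    seed=42,
    lr_scheduler_type="linear",
    fp16=True,  # native AMP mixed precision
)
```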
### Training Progress
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 0.8211 | 1.0 | 119 | 0.8232 | 78.20% |
| 0.7385 | 2.0 | 238 | 0.7586 | 80.13% |
| 0.6528 | 3.0 | 357 | 0.7408 | 80.57% |
| 0.5283 | 4.0 | 476 | 0.6797 | 82.18% |
| 0.5294 | 4.962 | 590 | 0.6790 | 82.13% |
## Framework Versions
- PEFT: 0.15.0
- Transformers: 4.49.0
- PyTorch: 2.6.0
- Datasets: 3.4.1
- Tokenizers: 0.21.1
## Usage
To use the model, load the base model with the `transformers` library and attach the LoRA adapter with `peft`:
```python
from transformers import ViTForImageClassification, ViTImageProcessor
from peft import PeftModel
from PIL import Image
import torch

# Load the base model and attach the LoRA adapter
base_model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224-in21k", num_labels=101  # Food101 has 101 classes
)
model = PeftModel.from_pretrained(base_model, "path_to_lora_adapter")
processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")

# Load and preprocess an image
image = Image.open("example_food.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Perform inference
model.eval()
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
predicted_class = predictions.argmax(dim=-1).item()
print(f"Predicted class: {predicted_class}")
```
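Note that the script prints a raw class index. If the adapter was exported with Food101 label mappings, `model.config.id2label[predicted_class]` yields the class name; otherwise the label names from the `datasets` loading sketch above can serve as the lookup table (whether those mappings were saved with this adapter is an assumption, not a recorded detail of the model).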
## Citation
If you use this model in your research or project, please cite it as follows:
```bibtex
@misc{my_awesome_food_model_lora,
  author = {Your Name},
  title  = {My Awesome Food Model - Fine-tuned LoRA (Food101)},
  year   = {2025},
  url    = {https://huggingface.co/your_model_link}
}
```
## Acknowledgments
This model was developed using the PEFT library for parameter-efficient fine-tuning and trained on the Food101 dataset. Special thanks to the Hugging Face community for providing invaluable tools and resources for optimizing transformer models.