Model Card for yukieos/grocery_classification
This model card describes yukieos/grocery_classification, a grocery-product labeling pipeline that runs EasyOCR on product photos first and falls back to a fine-tuned MobileNetV3-Small image classifier when no text match is found.
Model Details
Model Description
- Developed by: Yukieos
- Model type: Grocery-product labeling pipeline (OCR first, with an image-classification fallback)
- Language(s) (NLP): en
- License: apache-2.0
- Finetuned from model: torchvision/mobilenet_v3_small
Uses
Direct Use
Call the infer_category function to get a grocery label from an image; the pipeline tries OCR first and falls back to a MobileNetV3-based classifier:

```python
from infer import infer_category

# image_path can be replaced with the path to any product image.
category, ocr_text, method = infer_category(image_path)

query = ocr_text if method == "ocr" else category
if method == "manual":
    # Neither OCR nor the classifier produced a confident answer.
    query = input("Please type the product you want to search: ")
print(f"Result: {category} (via {method})")
```
Out-of-Scope Use
- Non-grocery products (e.g. electronics, clothing)
- Packaging text in languages other than English
- Highly distorted or tiny product labels
Bias, Risks, and Limitations
- Class bias: predicts only the ~20 grocery categories seen in training
- OCR bias: EasyOCR may misread stylized fonts, low-contrast text, or cluttered backgrounds
- Failure mode: when both OCR and the classifier are low-confidence, the pipeline returns None
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
How to Get Started with the Model
Use the code below to get started with the model:
```bash
pip install torch torchvision easyocr opencv-python pillow
```

```python
from infer import infer_category

label, raw_text, method = infer_category("image.jpg")
print(f"Label: {label}, OCR saw: {raw_text}, Method: {method}")
```
Training Details
Training Data
~2,000 images collected in local grocery stores, evenly split across 20 product categories (e.g. “apple,” “banana,” “milk,” …). Manually labeled and organized into train/val/test splits.
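The card does not specify the on-disk layout; assuming a standard torchvision ImageFolder structure (one subdirectory per category), the splits might look like:

```
data/
├── train/
│   ├── apple/
│   ├── banana/
│   └── …        # 20 category folders in total
├── val/
└── test/        # 400 held-out images, 20 per class
```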
Training Procedure
Preprocessing
- Resize to 224×224
- Random horizontal flips, random rotation ±15°
- Normalize with ImageNet means/stds
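A minimal torchvision sketch of this pipeline; the standard ImageNet statistics are assumed, since the card only says "ImageNet means/stds":

```python
from torchvision import transforms

# Training-time preprocessing as described above; the mean/std values are the
# usual ImageNet statistics (an assumption, not stated numerically in the card).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```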
Training Hyperparameters
- Training regime: 20 epochs on train split
- Optimizer & LR: Adam, lr = 1e-4
- Batch size: 32
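A sketch of the training loop under these hyperparameters; the ImageNet-pretrained starting weights, the data path, and train_transform (from the preprocessing sketch above) are assumptions:

```python
import torch
from torch import nn, optim
from torchvision import datasets, models

# MobileNetV3-Small backbone with its final layer swapped for a 20-class head;
# starting from ImageNet weights is assumed from "Finetuned from model".
model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 20)

# "data/train" is a hypothetical path; train_transform is defined above.
train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("data/train", transform=train_transform),
    batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()           # cross-entropy, per the card
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(20):                     # 20 epochs on the train split
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```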
Evaluation
Testing Data, Factors & Metrics
Testing Data
400 held-out images (20 per class) from the same data distribution.
Factors
- Varying lighting
- Printed vs. handwritten text
- Partial occlusion
Metrics
- Top-1 classification accuracy: 91.2%
- OCR detection precision: 84.5%
- Fallback rate: 12% (share of cases where OCR matched a label directly, bypassing the classifier)
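For illustration, the top-1 accuracy above could be computed as follows; model is the fine-tuned classifier and test_loader is an assumed DataLoader over the 400 test images:

```python
import torch

# Hypothetical evaluation sketch; model and test_loader are assumed to be
# set up as in the training sketch (test split, no augmentation).
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Top-1 accuracy: {correct / total:.1%}")
```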
Results
Overall pipeline accuracy on test set: 89.0%
Summary
Combining OCR text matching with the MobileNetV3 classifier yields 89.0% end-to-end accuracy on the held-out test set, slightly below the classifier's standalone top-1 accuracy of 91.2%.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA Tesla T4
- Hours used: ~1.5
- Cloud Provider: Google Colab
- Compute Region: us-central1
- Carbon Emitted: ~0.5 g CO₂eq
Model Architecture and Objective
- Backbone: MobileNetV3-Small, with the final classifier layer replaced by a 20-class head
- OCR: EasyOCR reader (en) for text detection and recognition
- Loss: cross-entropy for classification
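A simplified sketch of how these components compose at inference time; the matching rule here is a stand-in for the repository's actual logic, and model, categories, and preprocess are assumed to be set up as in the sketches above:

```python
import easyocr
import torch
from PIL import Image

# Illustrative OCR-first pipeline, not the repository's actual infer_category.
reader = easyocr.Reader(["en"])

def infer_category_sketch(image_path, model, categories, preprocess):
    # 1) OCR pass: accept any recognized word that names a known category.
    words = reader.readtext(image_path, detail=0)
    ocr_text = " ".join(words)
    for word in words:
        if word.lower() in categories:
            return word.lower(), ocr_text, "ocr"
    # 2) Fallback: classify the image with the fine-tuned MobileNetV3-Small.
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))
    return categories[logits.argmax(1).item()], ocr_text, "classifier"
```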
Compute Infrastructure
Training environment: Colab notebook with a single GPU
Inference environment: any machine with PyTorch + EasyOCR
Hardware
NVIDIA T4 (training & eval)
Software
Python 3.10
PyTorch 1.13, torchvision 0.14
EasyOCR 1.4, OpenCV 4.x, Pillow 9.x