Model Card for yukieos/grocery_classification
This model card describes yukieos/grocery_classification, a grocery-product labeling pipeline that runs EasyOCR on product photos first and falls back to a fine-tuned MobileNetV3-Small image classifier when no text match is found.
Model Details
Model Description
- Developed by: Yukieos
- Model type: Grocery-product labeling pipeline (OCR first, with an image-classification fallback)
- Language(s) (NLP): en
- License: apache-2.0
- Finetuned from model: torchvision/mobilenet_v3_small
Uses
Direct Use
Call the infer_category function to get a grocery label from an image; the pipeline tries OCR first and falls back to a MobileNetV3-based classifier:

```python
from infer import infer_category

# image_path can be replaced with the path to any product image.
category, ocr_text, method = infer_category(image_path)

query = ocr_text if method == "ocr" else category
if method == "manual":
    # Neither OCR nor the classifier produced a confident answer.
    query = input("Please type the product you want to search: ")
print(f"Result: {category} (via {method})")
```
Out-of-Scope Use
- Non-grocery products (e.g. electronics, clothing)
- Packaging text in languages other than English
- Highly distorted or tiny product labels
Bias, Risks, and Limitations
- Class bias: predicts only the ~20 grocery categories seen in training
- OCR bias: EasyOCR may misread stylized fonts, low-contrast text, or cluttered backgrounds
- Failure mode: when both OCR and the classifier are low-confidence, the pipeline returns None
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.
How to Get Started with the Model
Use the code below to get started with the model:
```bash
pip install torch torchvision easyocr opencv-python pillow
```

```python
from infer import infer_category

label, raw_text, method = infer_category("image.jpg")
print(f"Label: {label}, OCR saw: {raw_text}, Method: {method}")
```
Training Details
Training Data
~2,000 images collected in local grocery stores, evenly split across 20 product categories (e.g. “apple,” “banana,” “milk,” …). Manually labeled and organized into train/val/test splits.
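The card does not specify the on-disk layout; assuming a standard torchvision ImageFolder structure (one subdirectory per category), the splits might look like:

```
data/
├── train/
│   ├── apple/
│   ├── banana/
│   └── …        # 20 category folders in total
├── val/
└── test/        # 400 held-out images, 20 per class
```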
Training Procedure
Preprocessing
- Resize to 224×224
- Random horizontal flips, random rotation ±15°
- Normalize with ImageNet means/stds
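A minimal torchvision sketch of this pipeline; the standard ImageNet statistics are assumed, since the card only says "ImageNet means/stds":

```python
from torchvision import transforms

# Training-time preprocessing as described above; the mean/std values are the
# usual ImageNet statistics (an assumption, not stated numerically in the card).
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(15),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```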
Training Hyperparameters
- Training regime: 20 epochs on train split
- Optimizer & LR: Adam, lr = 1e-4
- Batch size: 32
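A sketch of the training loop under these hyperparameters; the ImageNet-pretrained starting weights, the data path, and train_transform (from the preprocessing sketch above) are assumptions:

```python
import torch
from torch import nn, optim
from torchvision import datasets, models

# MobileNetV3-Small backbone with its final layer swapped for a 20-class head;
# starting from ImageNet weights is assumed from "Finetuned from model".
model = models.mobilenet_v3_small(weights="IMAGENET1K_V1")
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 20)

# "data/train" is a hypothetical path; train_transform is defined above.
train_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("data/train", transform=train_transform),
    batch_size=32, shuffle=True)

criterion = nn.CrossEntropyLoss()           # cross-entropy, per the card
optimizer = optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(20):                     # 20 epochs on the train split
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```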
Evaluation
Testing Data, Factors & Metrics
Testing Data
400 held-out images (20 per class) from the same data distribution.
Factors
- Varying lighting
- Printed vs. handwritten text
- Partial occlusion
Metrics
- Top-1 classification accuracy: 91.2%
- OCR detection precision: 84.5%
- Fallback rate: 12% (share of cases where OCR matched a label directly, bypassing the classifier)
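For illustration, the top-1 accuracy above could be computed as follows; model is the fine-tuned classifier and test_loader is an assumed DataLoader over the 400 test images:

```python
import torch

# Hypothetical evaluation sketch; model and test_loader are assumed to be
# set up as in the training sketch (test split, no augmentation).
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f"Top-1 accuracy: {correct / total:.1%}")
```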
Results
Overall pipeline accuracy on test set: 89.0%
Summary
Combining OCR text matching with the MobileNetV3 classifier yields 89.0% end-to-end accuracy on the held-out test set, slightly below the classifier's standalone top-1 accuracy of 91.2%.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA Tesla T4
- Hours used: ~1.5
- Cloud Provider: Google Colab
- Compute Region: us-central1
- Carbon Emitted: ~0.5 g CO₂eq
Model Architecture and Objective
- Backbone: MobileNetV3-Small, with the final classifier layer replaced by a 20-class head
- OCR: EasyOCR reader (en) for text detection and recognition
- Loss: cross-entropy for classification
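A simplified sketch of how these components compose at inference time; the matching rule here is a stand-in for the repository's actual logic, and model, categories, and preprocess are assumed to be set up as in the sketches above:

```python
import easyocr
import torch
from PIL import Image

# Illustrative OCR-first pipeline, not the repository's actual infer_category.
reader = easyocr.Reader(["en"])

def infer_category_sketch(image_path, model, categories, preprocess):
    # 1) OCR pass: accept any recognized word that names a known category.
    words = reader.readtext(image_path, detail=0)
    ocr_text = " ".join(words)
    for word in words:
        if word.lower() in categories:
            return word.lower(), ocr_text, "ocr"
    # 2) Fallback: classify the image with the fine-tuned MobileNetV3-Small.
    image = Image.open(image_path).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))
    return categories[logits.argmax(1).item()], ocr_text, "classifier"
```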
Compute Infrastructure
Training environment: Colab notebook with a single GPU
Inference environment: any machine with PyTorch + EasyOCR
Hardware
NVIDIA T4 (training & eval)
Software
Python 3.10
PyTorch 1.13, torchvision 0.14
EasyOCR 1.4, OpenCV 4.x, Pillow 9.x