metadata
license: apache-2.0
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- fashion
- product
- usage
- Casual
- Ethnic
- Formal
Fashion-Product-Usage
Fashion-Product-Usage is a vision-language model fine-tuned from google/siglip2-base-patch16-224 using the SiglipForImageClassification architecture. It classifies fashion product images based on their intended usage context.
Classification Report:
Class          precision  recall  f1-score  support
Casual            0.8529  0.9716    0.9084    34392
Ethnic            0.8365  0.7528    0.7925     3208
Formal            0.7246  0.3006    0.4250     2345
Home              0.0000  0.0000    0.0000        1
Party             0.0000  0.0000    0.0000       29
Smart Casual      0.0000  0.0000    0.0000       67
Sports            0.7157  0.1848    0.2938     4004
Travel            0.0000  0.0000    0.0000       26

accuracy                            0.8458    44072
macro avg         0.3912  0.2762    0.3024    44072
weighted avg      0.8300  0.8458    0.8159    44072
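The gap between the macro and weighted averages comes from how each one treats class size: the macro average counts every class equally, so the four classes with zero scores pull it down, while the weighted average is dominated by Casual. The following is a minimal sketch that recomputes both from the per-class rows of the report; the helper names `macro_average` and `weighted_average` are illustrative, not part of any library, and the results should match the reported averages only up to rounding.

```python
# Per-class (precision, recall, f1-score, support), copied from the report.
report = {
    "Casual": (0.8529, 0.9716, 0.9084, 34392),
    "Ethnic": (0.8365, 0.7528, 0.7925, 3208),
    "Formal": (0.7246, 0.3006, 0.4250, 2345),
    "Home": (0.0, 0.0, 0.0, 1),
    "Party": (0.0, 0.0, 0.0, 29),
    "Smart Casual": (0.0, 0.0, 0.0, 67),
    "Sports": (0.7157, 0.1848, 0.2938, 4004),
    "Travel": (0.0, 0.0, 0.0, 26),
}


def macro_average(rows, column):
    """Unweighted mean of one metric column across all classes."""
    return sum(row[column] for row in rows) / len(rows)


def weighted_average(rows, column):
    """Mean of one metric column, weighted by each class's support."""
    total_support = sum(row[3] for row in rows)
    return sum(row[column] * row[3] for row in rows) / total_support


rows = list(report.values())
for name, column in (("precision", 0), ("recall", 1), ("f1-score", 2)):
    macro = macro_average(rows, column)
    weighted = weighted_average(rows, column)
    print(f"{name:<10} macro={macro:.4f}  weighted={weighted:.4f}")
```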
The model predicts one of the following usage categories:
- 0: Casual
- 1: Ethnic
- 2: Formal
- 3: Home
- 4: Party
- 5: Smart Casual
- 6: Sports
- 7: Travel
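To illustrate how a class index maps back to a label, here is a small dependency-free sketch that applies a softmax to eight raw scores and picks the most probable category. The function `decode_logits` and the example logits are hypothetical and exist only for illustration; they are not real model output.

```python
import math

# Label mapping copied from the list above.
ID2LABEL = {
    0: "Casual", 1: "Ethnic", 2: "Formal", 3: "Home",
    4: "Party", 5: "Smart Casual", 6: "Sports", 7: "Travel",
}


def decode_logits(logits):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError(f"expected {len(ID2LABEL)} logits, got {len(logits)}")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = [value / total for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Hypothetical logits, for illustration only.
example_logits = [2.0, 0.5, 0.1, -3.0, -2.0, -1.5, 0.3, -2.5]
label, prob = decode_logits(example_logits)
print(label, round(prob, 3))
```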
Run with Transformers 🤗
!pip install -q transformers torch pillow gradio
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Fashion-Product-Usage"  # Replace with your actual model path
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

# Label mapping
id2label = {
    0: "Casual",
    1: "Ethnic",
    2: "Formal",
    3: "Home",
    4: "Party",
    5: "Smart Casual",
    6: "Sports",
    7: "Travel"
}

def classify_usage(image):
    """Predicts the usage type of a fashion product."""
    image = Image.fromarray(image).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    predictions = {id2label[i]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Gradio interface
iface = gr.Interface(
    fn=classify_usage,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Usage Prediction Scores"),
    title="Fashion-Product-Usage",
    description="Upload a fashion product image to predict its intended usage (Casual, Formal, Party, etc.)."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
Intended Use
This model can be used for:
- Product tagging in e-commerce catalogs
- Context-aware product recommendations
- Fashion search optimization
- Data annotation for training recommendation engines
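For catalog tagging, one possible pattern is to accept a predicted label only when its score clears a confidence threshold and to leave the rest for manual review. The sketch below assumes a score dictionary shaped like the one returned by `classify_usage`; the function `tag_product`, the default threshold of 0.6, and the sample scores are all hypothetical and would need tuning on real data.

```python
def tag_product(scores, threshold=0.6):
    """Return the top usage label, or None when no score reaches the threshold."""
    if not scores:
        return None
    label, score = max(scores.items(), key=lambda item: item[1])
    return label if score >= threshold else None


# Hypothetical score dictionaries, for illustration only.
confident = {"Casual": 0.91, "Sports": 0.05, "Formal": 0.04}
uncertain = {"Casual": 0.45, "Sports": 0.40, "Formal": 0.15}

print(tag_product(confident))  # label accepted as a catalog tag
print(tag_product(uncertain))  # None: route to manual review
```

A higher threshold trades coverage for precision, which may matter for the classes with low recall in the classification report.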