omoured committed
Commit 6307e13 · 1 Parent(s): 63db507

Initial commit with LFS tracking

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.index filter=lfs diff=lfs merge=lfs -text
+ *.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,14 +1,159 @@
  ---
- title: Fashion Search Engine
- emoji: 🏒
- colorFrom: purple
- colorTo: blue
- sdk: gradio
- sdk_version: 5.39.0
- app_file: app.py
- pinned: false
- license: cc-by-nc-nd-4.0
- short_description: About AI-powered fashion product search engine
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # 🛍️ Fashion Search Engine (Image + Text)
+
+ This project provides an efficient way to search fashion products using either an image or a textual description: upload a product image or type a descriptive query, and the system returns visually or semantically similar fashion items.
+
+ Powered by **OpenAI’s CLIP ViT-B/32** model and accelerated using ONNX and FAISS for real-time retrieval.
+
+ ---
+
+ <div align="center">
+ <img src="misc/image-query.png" alt="Image Query Example" width="45%" style="margin-right: 2%;">
+ <img src="misc/text-query.png" alt="Text Query Example" width="45%">
+ </div>
+
+ <p align="center"><em>Example UI: Left - Image-based Search, Right - Text-based Search</em></p>
+
+ ---
+
+ ## 🧠 Model Details
+
+ To accelerate inference, we export both the **visual** and **text** encoders to **ONNX** format. Our benchmark results (`test_onnx.py`) demonstrate a **~32× speedup** using ONNX Runtime compared to the original PyTorch models. A minimal export sketch is included after the list below.
+
+ - **Model:** `ViT-B/32` (OpenAI CLIP)
+ - **Backends:**
+   - Image encoder → ONNX
+   - Text encoder → ONNX
+ - **Inference engine:** `onnxruntime`
+ - **Indexing:** `FAISS` with L2-normalized vectors
+ - **Benchmark:** ~32× speedup (measured on CPU using `test_onnx.py`)
+
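The export scripts themselves are not included in this Space, so the following is only a sketch of how the two encoders could be exported. It assumes the Hugging Face `transformers` CLIP implementation (which the app's tokenizer already relies on); the author's actual export code may differ.

```python
# Hypothetical export sketch: not the author's actual export code.
import torch
from transformers import CLIPModel


class ImageEncoder(torch.nn.Module):
    """Exposes only CLIP's image tower so it can be exported on its own."""
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, pixel_values):
        return self.clip.get_image_features(pixel_values=pixel_values)


class TextEncoder(torch.nn.Module):
    """Exposes only CLIP's text tower (77-token padded input, as used in app.py)."""
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, input_ids):
        return self.clip.get_text_features(input_ids=input_ids)


clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

# Image encoder -> models/clip_vitb32.onnx
torch.onnx.export(ImageEncoder(clip), torch.randn(1, 3, 224, 224),
                  "models/clip_vitb32.onnx",
                  input_names=["pixel_values"], output_names=["image_embeds"],
                  dynamic_axes={"pixel_values": {0: "batch"}}, opset_version=14)

# Text encoder -> models/clip_text_encoder.onnx
torch.onnx.export(TextEncoder(clip), torch.ones(1, 77, dtype=torch.int64),
                  "models/clip_text_encoder.onnx",
                  input_names=["input_ids"], output_names=["text_embeds"],
                  dynamic_axes={"input_ids": {0: "batch"}}, opset_version=14)
```

The exported graphs are then consumed through `onnxruntime.InferenceSession`, exactly as `app.py` does further down in this commit.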
+ ---
+
+ ## 🛠️ Installation & Setup
+
+ ### 1. Environment Setup
+
+ ```bash
+ conda create -n product-match python=3.10
+ conda activate product-match
+ pip install -r requirements.txt
+ ```
+
+ Make sure MongoDB is running locally at `mongodb://localhost:27017` before continuing.
+
+ ---
+
+ ### 2. 🗂️ Dataset Preparation
+
+ To experiment with this system we used the [E-commerce Product Images](https://www.kaggle.com/datasets/vikashrajluhaniwal/fashion-images) dataset from Kaggle. Run the following scripts to prepare the fashion dataset:
+
+ ```bash
+ # Download and structure the dataset
+ python get_dataset.py
+ # Add each product's image path to fashion.csv
+ python update_csv.py
+ ```
+
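For context, the second script simply augments `fashion.csv` with a resolvable image path per product. A rough sketch of that step, with hypothetical column names and folder layout (the real `update_csv.py` may differ):

```python
# Hypothetical sketch of the update_csv.py step; column names and paths are assumptions.
import os
import pandas as pd

df = pd.read_csv("data/fashion.csv")

# Assume each row has a ProductId matching an image file under data/images/
df["image_path"] = df["ProductId"].astype(str).map(
    lambda pid: os.path.join("data", "images", f"{pid}.jpg")
)

df.to_csv("data/fashion.csv", index=False)
```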
+ <div align="center">
+ <img src="misc/dataset-cover.png" alt="Dataset Cover" width="70%">
+ </div>
+
+ <p align="center"><em>Example samples from the Kaggle E-commerce Product Images dataset</em></p>
+
  ---
+
+ ### 3. 🧾 Generate Embeddings
+
+ From the `app/faiss/` directory:
+
+ ```bash
+ # Generate CLIP text embeddings from product descriptions
+ python generate_text_embeddings.py
+
+ # Generate CLIP image embeddings from product images
+ python generate_visual_embeddings.py
+ ```
+
+ These scripts will output `.csv` embedding files under `data/`.
+
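The embedding scripts are not part of this Space, but conceptually the visual side boils down to running every catalogue image through the ONNX image encoder, L2-normalizing, and writing the vectors out. A minimal sketch under those assumptions (the CSV file names here are illustrative, not the author's):

```python
# Illustrative sketch of generate_visual_embeddings.py; the real script may differ.
import numpy as np
import onnxruntime as ort
import pandas as pd
from PIL import Image
from torchvision import transforms

session = ort.InferenceSession("models/clip_vitb32.onnx")
input_name = session.get_inputs()[0].name

# Same preprocessing as app.py, so query and index embeddings stay comparable
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])

df = pd.read_csv("data/fashion.csv")  # expects the image_path column from step 2
rows = []
for _, item in df.iterrows():
    image = Image.open(item["image_path"]).convert("RGB")
    tensor = transform(image).unsqueeze(0).numpy()
    emb = session.run(None, {input_name: tensor})[0][0]
    emb = emb / np.linalg.norm(emb)  # L2-normalize before indexing
    rows.append([item["image_path"], *emb.tolist()])

pd.DataFrame(rows).to_csv("data/image_embeddings.csv", index=False, header=False)
```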
  ---

+ ### 4. 🧠 Build FAISS Index
+
+ Navigate to the `app/faiss/` directory and run the following script to build indexes for fast similarity search:
+
+ ```bash
+ python build_faiss_index.py
+ ```
+
+ This script will generate:
+
+ * `faiss_image.index` – FAISS index for image embeddings
+ * `faiss_text.index` – FAISS index for text embeddings
+ * `image_id_to_meta.pkl` – metadata mapping for image results
+ * `text_id_to_meta.pkl` – metadata mapping for text results
+
+ These files are required for the search engine to return relevant product matches.
+
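Conceptually, building an index amounts to loading the embedding matrix, adding it to a flat FAISS index, and pickling an id-to-metadata map whose order matches the index. A sketch of that idea for the image side; the flat inner-product index and the metadata fields are assumptions (with L2-normalized vectors, inner product is equivalent to cosine similarity):

```python
# Sketch only: the real build_faiss_index.py may choose a different index type.
import pickle
import faiss
import numpy as np
import pandas as pd

emb = pd.read_csv("data/image_embeddings.csv", header=None)
paths = emb.iloc[:, 0].tolist()
vectors = np.ascontiguousarray(emb.iloc[:, 1:].to_numpy(dtype=np.float32))

index = faiss.IndexFlatIP(vectors.shape[1])  # inner product over normalized vectors
index.add(vectors)
faiss.write_index(index, "faiss_image.index")

# app.py looks results up by position (meta_list[idx]), so insertion order
# here must match the order the vectors were added to the index.
meta = {i: {"image_path": p} for i, p in enumerate(paths)}
with open("image_id_to_meta.pkl", "wb") as f:
    pickle.dump(meta, f)
```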
+ ---
+
+ ### 5. 🗃️ MongoDB Setup
+
+ Set up the MongoDB database for logging inference queries and results:
+
+ ```bash
+ cd app/db/
+ python mongo_setup.py
+ ```
+
+ This script will:
+
+ * Connect to `mongodb://localhost:27017`
+ * Create a database named `product_matching`
+ * Initialize a collection called `logs`
+
+ This collection will automatically store:
+
+ * Input query details (text or image)
+ * Top matching results with metadata
+ * Any runtime errors encountered during inference
+
+ ⚠️ Make sure MongoDB is installed and running locally before executing this step.
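`mongo_setup.py` lives in the full repository rather than in this Space, so the following is only a guess at its shape. It assumes `pymongo` and the database/collection names quoted above:

```python
# Hypothetical mongo_setup.py sketch; the actual script may differ.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["product_matching"]

# MongoDB creates collections lazily, but creating "logs" up front makes the
# setup step explicit and verifiable.
if "logs" not in db.list_collection_names():
    db.create_collection("logs")

print("Collections:", db.list_collection_names())
```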
+
+
+ <div align="center">
+ <img src="misc/db_products.png" alt="Products Collection" width="45%" style="margin-right: 2%;">
+ <img src="misc/db_logs.png" alt="Logs Collection" width="45%">
+ </div>
+
+ <p align="center"><em>Screenshots of the products and logs collections in the database.</em></p>
+
+ You can monitor logs using a MongoDB GUI such as MongoDB Compass, or via the shell (`mongosh` on recent MongoDB versions, `mongo` on older ones):
+
+ ```bash
+ mongo
+ use product_matching
+ db.logs.find().pretty()
+ ```
+
+ ---
+
+ ### 6. 🧪 Launch the Gradio Demo UI
+
+ After preparing the dataset, embeddings, FAISS indexes, and MongoDB, you can launch the interactive demo:
+
+ ```bash
+ python app/ui/gradio_search.py
+ ```
+
+ Once the script runs, Gradio will start a local web server and display a URL. You're now ready to explore and experiment with multi-modal product search. 🎯
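If you need to reach the demo from another machine, Gradio's standard launch options can be used. Assuming the script exposes an interface object the way `app.py` below does, the launch call might become:

```python
# Optional: bind to all interfaces and/or request a temporary public share link.
iface.launch(server_name="0.0.0.0", server_port=7860, share=True)
```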
+
+ ---
+
+ ## 📄 References & Licensing
+
+ This project was developed as part of **Omar Moured's job application** for a position at [Sereact.ai](https://sereact.ai/).
+
+ The code, data processing scripts, and UI implementation provided in this repository are **not intended for public distribution or reuse**.
+
+ All content is protected under a **custom restricted-use license**. You may **not copy, distribute, modify, or use any portion of this codebase** without **explicit written permission** from the author.
app.py ADDED
@@ -0,0 +1,141 @@
+ import os
+ import gradio as gr
+ import onnxruntime as ort
+ import numpy as np
+ import faiss
+ import pickle
+ import tempfile
+ from PIL import Image
+ from torchvision import transforms
+ from transformers import CLIPTokenizer
+
+ # === Config ===
+ TOP_K = 3
+ IMG_ONNX_PATH = "models/clip_vitb32.onnx"
+ TXT_ONNX_PATH = "models/clip_text_encoder.onnx"
+ IMG_INDEX_PATH = "faiss/faiss_image.index"
+ TXT_INDEX_PATH = "faiss/faiss_text.index"
+ IMG_META_PATH = "faiss/image_id_to_meta.pkl"
+ TXT_META_PATH = "faiss/text_id_to_meta.pkl"
+
+ # === Load models and index ===
+ img_session = ort.InferenceSession(IMG_ONNX_PATH)
+ txt_session = ort.InferenceSession(TXT_ONNX_PATH)
+ img_input_name = img_session.get_inputs()[0].name
+ txt_input_name = txt_session.get_inputs()[0].name
+
+ img_index = faiss.read_index(IMG_INDEX_PATH)
+ txt_index = faiss.read_index(TXT_INDEX_PATH)
+
+ with open(IMG_META_PATH, "rb") as f:
+     img_meta = pickle.load(f)
+ img_meta = list(img_meta.items())
+
+ with open(TXT_META_PATH, "rb") as f:
+     txt_meta = pickle.load(f)
+ txt_meta = list(txt_meta.items())
+
+ tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
+
+ # === Preprocessing ===
+ transform = transforms.Compose([
+     transforms.Resize((224, 224)),
+     transforms.ToTensor(),
+     transforms.Normalize([0.5]*3, [0.5]*3)
+ ])
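+ # Note: this 0.5/0.5 normalization differs from CLIP's original preprocessing
+ # constants; retrieval assumes the indexed embeddings used the same transform.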
+
+ def search(input_img, input_text):
+     top_results = []
+     input_text_clean = input_text.strip() if isinstance(input_text, str) else ""
+     tmp_path = None
+
+     try:
+         real_img = isinstance(input_img, Image.Image)
+         has_text = input_text_clean != ""
+
+         if not real_img and not has_text:
+             return [], "❌ Please upload an image or type a query."
+
+         output_images = []
+         captions = []
+
+         if real_img:
+             image = input_img.convert("RGB")
+             tensor = transform(image).unsqueeze(0).numpy()
+             embedding = img_session.run(None, {img_input_name: tensor})[0]
+             embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
+
+             scores, indices = img_index.search(embedding.astype(np.float32), TOP_K)
+             meta_list = img_meta
+         else:
+             query = f"Looking for a {input_text_clean}"
+             inputs = tokenizer(query, padding="max_length", max_length=77, return_tensors="np")
+             token_ids = inputs["input_ids"].astype(np.int64)
+             embedding = txt_session.run(None, {txt_input_name: token_ids})[0]
+             embedding = embedding / np.linalg.norm(embedding, axis=1, keepdims=True)
+             scores, indices = txt_index.search(embedding.astype(np.float32), TOP_K)
+             meta_list = txt_meta
+
+         for score, idx in zip(scores[0], indices[0]):
+             if idx == -1:
+                 continue
+             try:
+                 match_id, meta = meta_list[idx]
+             except Exception:
+                 continue
+
+             img_path = meta.get("image_path")
+             if not img_path or not os.path.isfile(img_path):
+                 continue
+
+             image = Image.open(img_path).convert("RGB")
+
+             caption = "\n".join([
+                 f"🆔 ID: {match_id}",
+                 f"🎨 Color: {meta.get('color', 'N/A')}",
+                 f"👗 Product Type: {meta.get('product_type', 'N/A')}",
+                 f"🚻 Gender: {meta.get('gender', 'N/A')}",
+                 f"🛍️ Usage: {meta.get('usage', 'N/A')}",
+                 f"📦 Category: {meta.get('category', 'N/A')}",
+                 f"📈 Score: {score:.3f}"
+             ])
+
+             output_images.append(image)
+             captions.append(caption)
+             top_results.append({
+                 "match_id": match_id,
+                 "score": float(score),
+                 "metadata": meta,
+                 "image_path": img_path
+             })
+
+         if not output_images:
+             return [], "⚠️ No matching results found."
+
+         return output_images, "\n\n".join(captions)
+
+     except Exception as e:
+         return [], f"❌ Error: {str(e)}"
+
+ # === Gradio UI ===
+ iface = gr.Interface(
+     fn=search,
+     inputs=[
+         gr.Image(type="pil", label="Upload Image (optional)", height=224),
+         gr.Textbox(label="Text Query (optional)", placeholder="e.g., red cotton top for girls")
+     ],
+     outputs=[
+         gr.Gallery(label="Top 3 Matches", columns=3, height=300),
+         gr.Textbox(label="Result Details")
+     ],
+     title="🛍️ Find your Fashion with Text or Image",
+     description="Upload a product image or enter a description to find similar fashion items.",
+     examples=[
+         ["examples/2697.jpg", ""],
+         ["examples/3150.jpg", ""],
+         [None, "blue denim jeans"],
+         [None, "white floral dress for summer"]
+     ]
+ )
+
+ iface.launch()
examples/2697.jpg ADDED

Git LFS Details

  • SHA256: 4de74ef9240846c99b52ac26fedac59a685b312fe24e80d79b5ce59a9228a84b
  • Pointer size: 130 Bytes
  • Size of remote file: 16 kB
examples/3150.jpg ADDED

Git LFS Details

  • SHA256: 785731e77deb13b5270481176bac3d4e70999df129674c2ab31fd2687d96648a
  • Pointer size: 131 Bytes
  • Size of remote file: 155 kB
faiss/faiss_image.index ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:35b4996c35317ad8e58c85f1f92065c4e18056d880c1a4b8b96d77d7a2b32944
+ size 5951533
faiss/faiss_text.index ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d0c3a658eafba31071f8161e6c0f7183d684f40016e6e2f8b24e00f41dca9dda
+ size 5951533
faiss/image_id_to_meta.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a3941e379637c5ab3e5e1fe2adb3cb793385bd7f41faf9d9bcc2c623f645711
+ size 399652
faiss/text_id_to_meta.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a8419d02942d1275ea2c8eb96d5e40e3bdf196abc7d9212f6ee775fae330721
+ size 482793
models/clip_text_encoder.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:846877caaad2fa0a2ad2411c12ba46f01bbc42ca927e3a8e53b3e2c4b678e69f
+ size 254433342
models/clip_vitb32.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a0de506b70897532e280e18e7fd271562f54585b9459a8d9ffd59e26fdeb03c3
+ size 351530149
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ gradio
+ onnxruntime
+ torch
+ torchvision  # imported in app.py for image preprocessing
+ transformers
+ Pillow
+ faiss-cpu