feat: test upload - Trendyol DinoV2 Product Similarity and Retrieval Embedding Model
🧪 Test Upload Details:
- Personal account testing before company publication
- Architecture: DinoV2 ViT-B/14 + ArcFace loss
- Embedding dimension: 256
- Task: Product similarity and retrieval
📁 Repository Contents:
- Model weights in safetensors format
- Complete model card with usage examples
- Apache 2.0 license
- Demo notebook for inference
🔒 Security: Scanned and validated
📋 RFC Compliance: Ready for company publication
Test upload by: Personal Account
- LICENSE +189 -0
- README.md +130 -0
- __init__.py +23 -0
- __pycache__/modeling_trendyol_dinov2.cpython-312.pyc +0 -0
- config.json +54 -0
- image_processing_trendyol_dinov2.py +163 -0
- model.safetensors +3 -0
- modeling_trendyol_dinov2.py +142 -0
- preprocessor_config.json +43 -0
- pytorch_model.bin +3 -0
- requirements.txt +7 -0
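As a quick sanity check on the upload described above, the file list can be verified against the Hub once the commit lands. The sketch below is illustrative only and not part of the repository; it assumes the public repo id referenced in the README (`Trendyol/trendyol-dino-v2-ecommerce-256d`) and uses the standard `huggingface_hub` listing API.

```python
# Illustrative check of the uploaded repository contents via the Hub API.
# The repo id is taken from the README below; adjust it for the personal test account.
from huggingface_hub import list_repo_files

repo_id = "Trendyol/trendyol-dino-v2-ecommerce-256d"
files = set(list_repo_files(repo_id))

# The commit should at least ship the weights, configs, and the custom code modules.
expected = {
    "model.safetensors",
    "config.json",
    "preprocessor_config.json",
    "modeling_trendyol_dinov2.py",
    "image_processing_trendyol_dinov2.py",
    "README.md",
    "LICENSE",
}
print("missing:", (expected - files) or "none")
```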
LICENSE
ADDED
@@ -0,0 +1,189 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity granting the License.

"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of the definition of
"control", an entity controls another entity when such entity:
(i) has the power, direct or indirect, to cause the direction or
management of such other entity, whether by contract or otherwise,
(ii) owns fifty percent (50%) or more of the outstanding shares, or
(iii) has beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.

"Source" shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.

"Object" shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(which shall not include communication that is conspicuously
marked or otherwise designated in writing by the copyright owner
as "Not a Contribution").

"Derivative Works" shall mean any work, whether in Source or Object
form, that is based upon (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and derivative works thereof.

"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of the definition of "Contribution",
any such Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be deemed to have been made under the
terms and conditions of this License, without any additional terms or
conditions. Notwithstanding the above, nothing herein shall supersede or
modify the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to use, reproduce, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Work, and to
permit persons to whom the Work is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Work.

3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:

(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, trademark, patent,
and attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.

You may add Your own copyright notice to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.

9. Accepting Warranty or Support. You are not required to accept
warranty or support for the Work under this License. However, if You
choose to accept warranty or support, You may act only on Your own
behalf and on Your sole responsibility, not on behalf of any other
Contributor, and only if You agree to indemnify, defend, and hold each
Contributor harmless for any liability incurred by, or claims asserted
against, such Contributor by reason of your accepting any such warranty
or support.

END OF TERMS AND CONDITIONS

Copyright 2025 Trendyol

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
README.md
ADDED
@@ -0,0 +1,130 @@
# Trendyol DinoV2 Image Similarity Model

This repository contains a fine-tuned DinoV2 model for image similarity and retrieval tasks, specifically trained on e-commerce product images.

## Model Details

- **Model Type**: Image Similarity/Retrieval
- **Architecture**: DinoV2 ViT-B/14 with ArcFace loss
- **Embedding Dimension**: 256
- **Input Size**: 224x224
- **Framework**: PyTorch
- **Format**: SafeTensors

## Usage

### Quick Start

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Load model and processor from Hugging Face Hub
model = AutoModel.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
model = model.to(device).eval()

# Load and process an image
image = Image.open('your_image.jpg').convert('RGB')
inputs = processor(images=image, return_tensors="pt")

# Move inputs to the same device as the model
inputs = {k: v.to(device) for k, v in inputs.items()}

# Get embeddings
with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state  # Shape: [1, 256]

print("Generated embedding dimension:", embeddings.shape[1])
```

### Preprocessing Pipeline

The model uses a specific preprocessing pipeline that is crucial for good performance:

1. **DownScale (Lanczos)**: Resize to a maximum dimension of 332px
2. **JPEG Compression**: Apply quality=75 compression
3. **Scale Image**: Scale to a maximum dimension of 332px
4. **Pad to Square**: Pad with color value 255
5. **Resize**: Resize to 224x224
6. **ToTensor**: Convert to a PyTorch tensor
7. **Normalize**: ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

### Using with AutoModel and AutoImageProcessor

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

# Load from Hugging Face Hub (trust_remote_code is required for the custom classes)
model = AutoModel.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
processor = AutoImageProcessor.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)

# Full inference pipeline
image = Image.open('your_image.jpg').convert('RGB')
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    embeddings = outputs.last_hidden_state  # Shape: [1, 256]

print("Embedding shape:", embeddings.shape)
```

## Installation

Install the required dependencies:

```bash
pip install transformers torch torchvision safetensors pillow numpy opencv-python
```

## Model Architecture

The model consists of:
- **Backbone**: DinoV2 ViT-B/14 (frozen during training)
- **Projection Head**: Linear layer mapping to 256 dimensions
- **Normalization**: L2 normalization for similarity computation

## Training Details

- **Loss Function**: ArcFace loss for metric learning
- **Training Data**: E-commerce product images
- **Epochs**: 9
- **PyTorch Version**: 2.8.0

## Intended Use

This model is designed for:
- Product image similarity search
- Visual product recommendations
- Duplicate product detection
- Content-based image retrieval in e-commerce

## Limitations

- Optimized specifically for product/e-commerce images
- May not generalize well to other image domains
- Requires the specific preprocessing pipeline above for optimal performance
- Requires the transformers library for the custom image processor

## License

This model is released under the Apache 2.0 License. See the LICENSE file for details.

## Citation

```
@misc{trendyol-dinov2-ecommerce,
  title={Trendyol DinoV2 E-commerce Image Similarity Model},
  author={Trendyol Machine Learning Team},
  year={2025},
  url={https://huggingface.co/Trendyol/trendyol-dino-v2-ecommerce-256d}
}
```
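The intended-use list in the README (similarity search, duplicate detection, content-based retrieval) can be made concrete with a small gallery-search sketch. This is an illustrative example rather than part of the model card: the gallery folder and query path are placeholders, and the only property it relies on is that the embeddings come out L2-normalized, so cosine similarity reduces to a dot product.

```python
# Hedged sketch: nearest-neighbour product search over a small local gallery.
# "gallery/" and "query_product.jpg" are placeholders; the repo id follows the README.
from pathlib import Path

import torch
from PIL import Image
from transformers import AutoModel, AutoImageProcessor

repo_id = "Trendyol/trendyol-dino-v2-ecommerce-256d"
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True).eval()
processor = AutoImageProcessor.from_pretrained(repo_id, trust_remote_code=True)

def embed(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        return model(**inputs).last_hidden_state  # [N, 256], unit-normalized

gallery_paths = sorted(Path("gallery").glob("*.jpg"))  # placeholder image folder
gallery_embeddings = embed(gallery_paths)              # [N, 256]
query_embedding = embed(["query_product.jpg"])         # [1, 256]

# For L2-normalized vectors, cosine similarity is just a matrix product.
scores = query_embedding @ gallery_embeddings.T        # [1, N]
top = torch.topk(scores, k=min(5, len(gallery_paths)), dim=1)
for score, idx in zip(top.values[0], top.indices[0]):
    print(f"{gallery_paths[idx].name}: {score:.3f}")
```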
__init__.py
ADDED
@@ -0,0 +1,23 @@
```python
"""
Trendyol DinoV2 Image Similarity Model

This package contains a fine-tuned DinoV2 model for e-commerce image similarity.
Fully compatible with Hugging Face transformers.
"""

from .modeling_trendyol_dinov2 import TrendyolDinoV2Model, TrendyolDinoV2Config
from .image_processing_trendyol_dinov2 import TrendyolDinoV2ImageProcessor

# Register for AutoModel and AutoImageProcessor
from transformers import AutoConfig, AutoModel, AutoImageProcessor

AutoConfig.register("trendyol_dinov2", TrendyolDinoV2Config)
AutoModel.register(TrendyolDinoV2Config, TrendyolDinoV2Model)
AutoImageProcessor.register(TrendyolDinoV2Config, TrendyolDinoV2ImageProcessor)

__version__ = "1.0.0"
__all__ = [
    "TrendyolDinoV2Model",
    "TrendyolDinoV2Config",
    "TrendyolDinoV2ImageProcessor"
]
```
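The `register()` calls above are what make the custom classes visible to the transformers auto factories when this code is imported as a local package. A minimal, hedged illustration follows; the package name `trendyol_dinov2` is hypothetical, and on the Hub the same role is played by `auto_map` plus `trust_remote_code=True`.

```python
# Sketch assuming the repository has been cloned locally as an importable package
# named "trendyol_dinov2" (hypothetical name). Importing it runs the register() calls.
import trendyol_dinov2  # noqa: F401  side effect: registers config/model/processor
from transformers import AutoModel

from trendyol_dinov2 import TrendyolDinoV2Config

config = TrendyolDinoV2Config()        # defaults: dinov2_vitb14 backbone, 256-d embeddings
model = AutoModel.from_config(config)  # resolves to TrendyolDinoV2Model via the registration
print(type(model).__name__)            # "TrendyolDinoV2Model"
# Note: building the model downloads the DinoV2 backbone through torch.hub.
```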
__pycache__/modeling_trendyol_dinov2.cpython-312.pyc
ADDED
Binary file (7.02 kB)
config.json
ADDED
@@ -0,0 +1,54 @@
```json
{
  "model_type": "trendyol_dinov2",
  "architectures": [
    "TrendyolDinoV2Model"
  ],
  "auto_map": {
    "AutoConfig": "modeling_trendyol_dinov2.TrendyolDinoV2Config",
    "AutoModel": "modeling_trendyol_dinov2.TrendyolDinoV2Model",
    "AutoImageProcessor": "image_processing_trendyol_dinov2.TrendyolDinoV2ImageProcessor"
  },
  "backbone_name": "dinov2_vitb14",
  "embedding_dim": 256,
  "hidden_size": 256,
  "in_features": 768,
  "use_arcface_loss": true,
  "input_size": 224,
  "downscale_size": 332,
  "pad_color": 255,
  "jpeg_quality": 75,
  "normalization": {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
  },
  "preprocessing": {
    "input_size": 224,
    "downscale_size": 332,
    "pad_color": 255,
    "jpeg_quality": 75,
    "transforms": [
      "DownScaleLanczos",
      "JPEGCompression",
      "ScaleImage",
      "PadToSquare",
      "Resize",
      "ToTensor",
      "Normalize"
    ]
  },
  "task_type": "image-retrieval",
  "training_info": {
    "epoch": "9",
    "torch_version": "2.8.0"
  },
  "torch_dtype": "float32",
  "transformers_version": "4.20.0"
}
```
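For completeness, a small, non-authoritative check of how the `auto_map` above is consumed on the consumer side: loading the config from the Hub with `trust_remote_code=True` should surface the retrieval-specific fields defined in `config.json`.

```python
from transformers import AutoConfig

# trust_remote_code is needed because auto_map points at the custom config class.
config = AutoConfig.from_pretrained(
    "Trendyol/trendyol-dino-v2-ecommerce-256d",
    trust_remote_code=True,
)
print(config.model_type)      # "trendyol_dinov2"
print(config.backbone_name)   # "dinov2_vitb14"
print(config.embedding_dim)   # 256
print(config.input_size, config.downscale_size, config.jpeg_quality)  # 224 332 75
```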
image_processing_trendyol_dinov2.py
ADDED
@@ -0,0 +1,163 @@
```python
"""
Hugging Face compatible image processor for Trendyol DinoV2
"""
from transformers import ImageProcessingMixin, BatchFeature
from transformers.utils import TensorType
from PIL import Image
import torch
import numpy as np
import cv2
from torchvision import transforms
import torchvision.transforms.functional as TF
from io import BytesIO
from typing import Union, List, Optional


def downscale_image(image: Image.Image, max_dimension: int) -> Image.Image:
    """Downscale image while maintaining aspect ratio"""
    original_width, original_height = image.size

    if max(original_width, original_height) <= max_dimension:
        return image

    aspect_ratio = original_width / original_height

    if original_width > original_height:
        new_width = max_dimension
        new_height = int(max_dimension / aspect_ratio)
    else:
        new_height = max_dimension
        new_width = int(max_dimension * aspect_ratio)

    return image.resize((new_width, new_height), Image.LANCZOS)


class DownScaleLanczos:
    def __init__(self, target_size=384):
        self.target_size = target_size

    def __call__(self, img):
        return downscale_image(img, self.target_size)


class JPEGCompression:
    def __init__(self, quality=75):
        self.quality = quality

    def __call__(self, img):
        buffer = BytesIO()
        img.save(buffer, format='JPEG', quality=self.quality)
        buffer.seek(0)
        return Image.open(buffer)


class ScaleImage:
    def __init__(self, target_size):
        self.target_size = target_size

    def __call__(self, img):
        w, h = img.size
        max_size = max(h, w)
        scale = self.target_size / max_size
        new_size = int(w * scale), int(h * scale)
        return img.resize(new_size, Image.BILINEAR)


class PadToSquare:
    def __init__(self, color=255):
        self.color = color

    def __call__(self, img):
        if isinstance(img, np.ndarray):
            img = Image.fromarray(img)

        width, height = img.size
        if self.color != -1:
            padding = abs(width - height) // 2
            if width < height:
                return TF.pad(img, (padding, 0, padding + (height - width) % 2, 0), fill=self.color, padding_mode='constant')
            elif width > height:
                return TF.pad(img, (0, padding, 0, padding + (width - height) % 2), fill=self.color, padding_mode='constant')
        return img


class TrendyolDinoV2ImageProcessor(ImageProcessingMixin):
    """
    Hugging Face compatible image processor for TrendyolDinoV2 model.
    """

    model_input_names = ["pixel_values"]

    def __init__(
        self,
        input_size=224,
        downscale_size=332,
        pad_color=255,
        jpeg_quality=75,
        do_normalize=True,
        image_mean=(0.485, 0.456, 0.406),
        image_std=(0.229, 0.224, 0.225),
        **kwargs
    ):
        super().__init__(**kwargs)

        self.input_size = input_size
        self.downscale_size = downscale_size
        self.pad_color = pad_color
        self.jpeg_quality = jpeg_quality
        self.do_normalize = do_normalize
        self.image_mean = image_mean
        self.image_std = image_std

    def _get_preprocess_fn(self):
        """Create the preprocessing pipeline (not stored as attribute to avoid JSON serialization issues)"""
        return transforms.Compose([
            DownScaleLanczos(self.downscale_size),
            JPEGCompression(self.jpeg_quality),
            ScaleImage(self.downscale_size),
            PadToSquare(self.pad_color),
            transforms.Resize((self.input_size, self.input_size)),
            transforms.ToTensor(),
            transforms.Normalize(self.image_mean, self.image_std)
        ])

    def __call__(
        self,
        images: Union[Image.Image, np.ndarray, torch.Tensor, List[Image.Image], List[np.ndarray], List[torch.Tensor]],
        return_tensors: Optional[Union[str, TensorType]] = None,
        **kwargs
    ) -> BatchFeature:
        """
        Preprocess images for the model.
        """
        # Handle single image
        if not isinstance(images, list):
            images = [images]

        # Get preprocessing pipeline
        preprocess_fn = self._get_preprocess_fn()

        # Preprocess all images
        processed_images = []
        for image in images:
            if isinstance(image, str):
                image = Image.open(image).convert('RGB')
            elif isinstance(image, np.ndarray):
                image = Image.fromarray(image).convert('RGB')
            elif not isinstance(image, Image.Image):
                raise ValueError(f"Unsupported image type: {type(image)}")

            # Apply preprocessing
            processed_tensor = preprocess_fn(image)
            processed_images.append(processed_tensor)

        # Stack tensors
        pixel_values = torch.stack(processed_images)

        # Return BatchFeature
        data = {"pixel_values": pixel_values}
        return BatchFeature(data=data, tensor_type=return_tensors)


# Register for auto class
TrendyolDinoV2ImageProcessor.register_for_auto_class("AutoImageProcessor")
```
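A short, hedged smoke test for the processor defined above: regardless of the input resolution, a single RGB image should come out as a `pixel_values` tensor of shape `[1, 3, 224, 224]`. The random image below is a stand-in for a real product photo.

```python
# Sketch assuming the file above is saved locally as image_processing_trendyol_dinov2.py.
import numpy as np
from PIL import Image

from image_processing_trendyol_dinov2 import TrendyolDinoV2ImageProcessor

processor = TrendyolDinoV2ImageProcessor()  # defaults: 332px downscale, JPEG q=75, 224x224 output

# Placeholder input: random 500x400 RGB noise instead of a real product image.
fake_image = Image.fromarray(np.random.randint(0, 256, (400, 500, 3), dtype=np.uint8))

batch = processor(images=fake_image, return_tensors="pt")
print(batch["pixel_values"].shape)  # expected: torch.Size([1, 3, 224, 224])
```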
model.safetensors
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:cb41c67595af4eb4ce357fbf55c7fc238436f0b24cc2b53a46f35f3cca0e0424
size 547685752
```
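Since the entry above is only a Git LFS pointer (the actual weights are roughly 548 MB), a hedged way to verify the resolved file after `git lfs pull` or a Hub download is to open it with the safetensors API and list a few tensors; the exact parameter names depend on the checkpoint and are not asserted here.

```python
# Sketch: inspect the resolved model.safetensors without loading it fully into memory.
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors stored")
    for name in keys[:5]:  # peek at a few parameter names and shapes
        print(name, tuple(f.get_slice(name).get_shape()))
```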
modeling_trendyol_dinov2.py
ADDED
@@ -0,0 +1,142 @@
```python
"""
Hugging Face compatible model implementation for Trendyol DinoV2
"""
import torch
import torch.nn as nn
from transformers import PreTrainedModel, PretrainedConfig
from transformers.modeling_outputs import BaseModelOutput
from typing import Optional, Tuple, Union
import torch.nn.functional as F


class TrendyolDinoV2Config(PretrainedConfig):
    """
    Configuration class for TrendyolDinoV2 model.
    """
    model_type = "trendyol_dinov2"

    def __init__(
        self,
        embedding_dim=256,
        input_size=224,
        hidden_size=256,
        backbone_name="dinov2_vitb14",
        in_features=768,
        downscale_size=332,
        pad_color=255,
        jpeg_quality=75,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.embedding_dim = embedding_dim
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.backbone_name = backbone_name
        self.in_features = in_features
        self.downscale_size = downscale_size
        self.pad_color = pad_color
        self.jpeg_quality = jpeg_quality


class TYArcFaceDinoV2(nn.Module):
    """Core model architecture"""
    def __init__(self, config):
        super(TYArcFaceDinoV2, self).__init__()
        self.config = config

        # Load DinoV2 backbone
        try:
            self.backbone = torch.hub.load('facebookresearch/dinov2', config.backbone_name)
        except Exception as e:
            raise RuntimeError(f"Failed to load DinoV2 backbone: {e}")

        self.hidden_size = config.hidden_size
        self.in_features = config.in_features
        self.embedding_dim = config.embedding_dim

        self.bn1 = nn.BatchNorm2d(self.in_features)
        # Freeze backbone
        self.backbone.requires_grad_(False)

        # Projection layers
        self.fc11 = nn.Linear(self.in_features * self.hidden_size, self.embedding_dim)
        self.bn11 = nn.BatchNorm1d(self.embedding_dim)

    def forward(self, pixel_values):
        try:
            features = self.backbone.get_intermediate_layers(
                pixel_values, return_class_token=True, reshape=True
            )
            features = features[0][0]  # Get the features
            features = self.bn1(features)
            features = features.flatten(start_dim=1)
            features = self.fc11(features)
            features = self.bn11(features)
            features = F.normalize(features)
            return features
        except Exception as e:
            raise RuntimeError(f"Forward pass failed: {e}")


class TrendyolDinoV2Model(PreTrainedModel):
    """
    Hugging Face compatible wrapper for TrendyolDinoV2
    """
    config_class = TrendyolDinoV2Config
    base_model_prefix = "model"

    def __init__(self, config):
        super().__init__(config)
        self.model = TYArcFaceDinoV2(config)

        # Initialize weights
        self.init_weights()

    def _init_weights(self, module):
        """Initialize weights (required by PreTrainedModel)"""
        if isinstance(module, nn.Linear):
            module.weight.data.normal_(mean=0.0, std=0.02)
            if module.bias is not None:
                module.bias.data.zero_()
        elif isinstance(module, nn.BatchNorm1d) or isinstance(module, nn.BatchNorm2d):
            module.bias.data.zero_()
            module.weight.data.fill_(1.0)

    def init_weights(self):
        """Initialize all weights in the model"""
        self.apply(self._init_weights)

    def forward(
        self,
        pixel_values: Optional[torch.Tensor] = None,
        output_hidden_states: Optional[bool] = None,
        return_dict: Optional[bool] = None,
        **kwargs
    ):
        return_dict = return_dict if return_dict is not None else getattr(self.config, 'use_return_dict', True)

        if pixel_values is None:
            raise ValueError("pixel_values cannot be None")

        # Get embeddings from the model
        embeddings = self.model(pixel_values)

        if not return_dict:
            return (embeddings,)

        return BaseModelOutput(
            last_hidden_state=embeddings,
            hidden_states=None,
            attentions=None
        )

    def get_embeddings(self, pixel_values):
        """Convenience method to get embeddings directly"""
        with torch.no_grad():
            outputs = self.forward(pixel_values, return_dict=True)
            return outputs.last_hidden_state


# Register the configuration
TrendyolDinoV2Config.register_for_auto_class()
TrendyolDinoV2Model.register_for_auto_class("AutoModel")
```
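To make the wrapper's contract concrete, here is a hedged sketch that builds the model directly from its config and runs a dummy batch through `get_embeddings`. Note that constructing `TYArcFaceDinoV2` downloads the `dinov2_vitb14` backbone via `torch.hub` on first use, and the random input below is a placeholder.

```python
# Sketch assuming the file above is saved locally as modeling_trendyol_dinov2.py.
import torch

from modeling_trendyol_dinov2 import TrendyolDinoV2Config, TrendyolDinoV2Model

config = TrendyolDinoV2Config()             # embedding_dim=256, input_size=224, ...
model = TrendyolDinoV2Model(config).eval()  # eval(): the projection head uses BatchNorm

dummy = torch.randn(2, 3, 224, 224)         # placeholder batch of two 224x224 RGB images
embeddings = model.get_embeddings(dummy)    # no_grad forward: backbone -> projection -> L2 norm

print(embeddings.shape)         # torch.Size([2, 256])
print(embeddings.norm(dim=-1))  # ~1.0 per row because of F.normalize
```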
preprocessor_config.json
ADDED
@@ -0,0 +1,43 @@
```json
{
  "image_processor_type": "TrendyolDinoV2ImageProcessor",
  "processor_class": "TrendyolDinoV2ImageProcessor",
  "auto_map": {
    "AutoImageProcessor": "image_processing_trendyol_dinov2.TrendyolDinoV2ImageProcessor"
  },
  "input_size": 224,
  "downscale_size": 332,
  "pad_color": 255,
  "jpeg_quality": 75,
  "do_normalize": true,
  "image_mean": [0.485, 0.456, 0.406],
  "image_std": [0.229, 0.224, 0.225],
  "do_resize": true,
  "size": {
    "height": 224,
    "width": 224
  },
  "resample": 3,
  "do_center_crop": false,
  "crop_size": {
    "height": 224,
    "width": 224
  },
  "do_convert_rgb": true,
  "transforms": [
    "DownScaleLanczos",
    "JPEGCompression",
    "ScaleImage",
    "PadToSquare",
    "Resize",
    "ToTensor",
    "Normalize"
  ]
}
```
pytorch_model.bin
ADDED
@@ -0,0 +1,3 @@
```
version https://git-lfs.github.com/spec/v1
oid sha256:60a38364dc18e4dd31a5bda0e8c36223a9b3518112ceeee7650ef59fd072a6cd
size 547728271
```
requirements.txt
ADDED
@@ -0,0 +1,7 @@
```
torch>=1.9.0
torchvision>=0.10.0
safetensors>=0.3.0
Pillow>=8.0.0
numpy>=1.20.0
opencv-python>=4.5.0
transformers>=4.20.0
```