Commit a15fec5 (verified) by yusufcakmak · 1 parent: b22be58

feat: test upload - Trendyol DinoV2 Product Similarity and Retrieval Embedding Model

🧪 Test Upload Details:
- Personal account testing before company publication
- Architecture: DinoV2 ViT-B/14 + ArcFace loss
- Embedding dimension: 256
- Task: Product similarity and retrieval

📁 Repository Contents:
- Model weights in safetensors format
- Complete model card with usage examples
- Apache 2.0 license
- Demo notebook for inference

🔒 Security: Scanned and validated
📋 RFC Compliance: Ready for company publication

Test upload by: Personal Account

LICENSE ADDED
@@ -0,0 +1,189 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of the definition of
+ "control", an entity controls another entity when such entity:
+ (i) has the power, direct or indirect, to cause the direction or
+ management of such other entity, whether by contract or otherwise,
+ (ii) owns fifty percent (50%) or more of the outstanding shares, or
+ (iii) has beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (which shall not include communication that is conspicuously
+ marked or otherwise designated in writing by the copyright owner
+ as "Not a Contribution").
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based upon (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and derivative works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of the definition of "Contribution",
+ any such Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be deemed to have been made under the
+ terms and conditions of this License, without any additional terms or
+ conditions. Notwithstanding the above, nothing herein shall supersede or
+ modify the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to use, reproduce, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Work, and to
+ permit persons to whom the Work is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Work.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, trademark, patent,
+ and attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright notice to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Support. You are not required to accept
+ warranty or support for the Work under this License. However, if You
+ choose to accept warranty or support, You may act only on Your own
+ behalf and on Your sole responsibility, not on behalf of any other
+ Contributor, and only if You agree to indemnify, defend, and hold each
+ Contributor harmless for any liability incurred by, or claims asserted
+ against, such Contributor by reason of your accepting any such warranty
+ or support.
+
+ END OF TERMS AND CONDITIONS
+
+ Copyright 2025 Trendyol
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
README.md ADDED
@@ -0,0 +1,130 @@
+ # Trendyol DinoV2 Image Similarity Model
+
+ This repository contains a fine-tuned DinoV2 model for image similarity and retrieval tasks, specifically trained on e-commerce product images.
+
+ ## Model Details
+
+ - **Model Type**: Image Similarity/Retrieval
+ - **Architecture**: DinoV2 ViT-B/14 with ArcFace loss
+ - **Embedding Dimension**: 256
+ - **Input Size**: 224x224
+ - **Framework**: PyTorch
+ - **Format**: SafeTensors
+
+ ## Usage
+
+ ### Quick Start
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModel, AutoImageProcessor
+
+ # Use the GPU when one is available, otherwise fall back to the CPU
+ device = 'cuda' if torch.cuda.is_available() else 'cpu'
+
+ # Load model and processor from Hugging Face Hub
+ model = AutoModel.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
+ model = model.to(device).eval()
+ processor = AutoImageProcessor.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
+
+ # Load and process an image
+ image = Image.open('your_image.jpg').convert('RGB')
+ inputs = processor(images=image, return_tensors="pt")
+
+ # Move inputs to the same device as the model
+ inputs = {k: v.to(device) for k, v in inputs.items()}
+
+ # Get embeddings
+ with torch.no_grad():
+     outputs = model(**inputs)
+     embeddings = outputs.last_hidden_state  # Shape: [1, 256]
+
+ print("Embedding dimension:", embeddings.shape[1])
+ ```
+
+ ### Preprocessing Pipeline
+
+ The model uses a specific preprocessing pipeline that's crucial for good performance:
+
+ 1. **DownScale (Lanczos)**: If the longer side exceeds 332px, downscale it to 332px (aspect ratio preserved)
+ 2. **JPEG Compression**: Re-encode as JPEG with quality=75
+ 3. **Scale Image**: Rescale so the longer side is exactly 332px (bilinear; this upscales smaller images)
+ 4. **Pad to Square**: Pad the shorter side with color value 255
+ 5. **Resize**: Resize to 224x224
+ 6. **ToTensor**: Convert to PyTorch tensor
+ 7. **Normalize**: ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
+
+ ### Using with AutoModel and AutoImageProcessor
+
+ ```python
+ import torch
+ from PIL import Image
+ from transformers import AutoModel, AutoImageProcessor
+
+ # Load from Hugging Face Hub (the custom classes require trust_remote_code=True)
+ model = AutoModel.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
+ processor = AutoImageProcessor.from_pretrained("Trendyol/trendyol-dino-v2-ecommerce-256d", trust_remote_code=True)
+
+ # Full inference pipeline on the CPU
+ image = Image.open('your_image.jpg').convert('RGB')
+ inputs = processor(images=image, return_tensors="pt")
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     embeddings = outputs.last_hidden_state  # Shape: [1, 256]
+
+ print("Embedding shape:", embeddings.shape)
+ ```
+
+ ## Installation
+
+ Install the required dependencies:
+
+ ```bash
+ pip install transformers torch torchvision safetensors pillow numpy opencv-python
+ ```
+
+ ## Model Architecture
+
+ The model consists of:
+ - **Backbone**: DinoV2 ViT-B/14 (frozen during training)
+ - **Projection Head**: Batch normalization over the 768-channel patch feature map, flattening, a linear layer mapping to 256 dimensions, and a final 1D batch normalization
+ - **Normalization**: L2 normalization for similarity computation
+
+ ## Training Details
+
+ - **Loss Function**: ArcFace loss for metric learning
+ - **Training Data**: E-commerce product images
+ - **Epoch**: 9
+ - **PyTorch Version**: 2.8.0
+
+ ## Intended Use
+
+ This model is designed for:
+ - Product image similarity search
+ - Visual product recommendations
+ - Duplicate product detection
+ - Content-based image retrieval in e-commerce
+
+ ## Limitations
+
+ - Optimized specifically for product/e-commerce images
+ - May not generalize well to other image domains
+ - Requires specific preprocessing pipeline for optimal performance
+ - Requires the transformers library and `trust_remote_code=True`, because the model and image processor are custom classes
+
+ ## License
+
+ This model is released under the Apache 2.0 License. See LICENSE file for details.
+
+ ## Citation
+
+ ```
+ @misc{trendyol-dinov2-ecommerce,
+   title={Trendyol DinoV2 E-commerce Image Similarity Model},
+   author={Trendyol Machine Learning Team},
+   year={2025},
+   url={https://huggingface.co/Trendyol/trendyol-dino-v2-ecommerce-256d}
+ }
+ ```
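Because the model L2-normalizes its output, the cosine similarity between two embeddings reduces to a dot product. The following is a minimal sketch of ranking a small gallery against a query. It uses made-up 4-dimensional vectors in place of real 256-dimensional embeddings, and the helper names `l2_normalize` and `rank_by_similarity` are illustrative, not part of this repository.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (the model already does this for its outputs)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def rank_by_similarity(query, gallery):
    """Return (index, score) pairs sorted from most to least similar.

    For unit-length vectors the dot product equals the cosine similarity.
    """
    scores = [sum(q * g for q, g in zip(query, item)) for item in gallery]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for embeddings; real ones would come from outputs.last_hidden_state
query = l2_normalize([1.0, 0.0, 1.0, 0.0])
gallery = [
    l2_normalize([0.0, 1.0, 0.0, 1.0]),    # orthogonal to the query
    l2_normalize([1.0, 0.1, 0.9, 0.0]),    # close to the query
    l2_normalize([-1.0, 0.0, -1.0, 0.0]),  # opposite direction
]

ranking = rank_by_similarity(query, gallery)
print(ranking)
```

For larger galleries the same ranking is usually computed as a single matrix product or delegated to a vector index; the pure-Python loop here only illustrates the scoring rule.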
__init__.py ADDED
@@ -0,0 +1,23 @@
+ """
+ Trendyol DinoV2 Image Similarity Model
+
+ This package contains a fine-tuned DinoV2 model for e-commerce image similarity.
+ Fully compatible with Hugging Face transformers.
+ """
+
+ from .modeling_trendyol_dinov2 import TrendyolDinoV2Model, TrendyolDinoV2Config
+ from .image_processing_trendyol_dinov2 import TrendyolDinoV2ImageProcessor
+
+ # Register for AutoModel and AutoImageProcessor
+ from transformers import AutoConfig, AutoModel, AutoImageProcessor
+
+ AutoConfig.register("trendyol_dinov2", TrendyolDinoV2Config)
+ AutoModel.register(TrendyolDinoV2Config, TrendyolDinoV2Model)
+ AutoImageProcessor.register(TrendyolDinoV2Config, TrendyolDinoV2ImageProcessor)
+
+ __version__ = "1.0.0"
+ __all__ = [
+     "TrendyolDinoV2Model",
+     "TrendyolDinoV2Config",
+     "TrendyolDinoV2ImageProcessor"
+ ]
__pycache__/modeling_trendyol_dinov2.cpython-312.pyc ADDED
Binary file (7.02 kB).
config.json ADDED
@@ -0,0 +1,54 @@
+ {
+   "model_type": "trendyol_dinov2",
+   "architectures": [
+     "TrendyolDinoV2Model"
+   ],
+   "auto_map": {
+     "AutoConfig": "modeling_trendyol_dinov2.TrendyolDinoV2Config",
+     "AutoModel": "modeling_trendyol_dinov2.TrendyolDinoV2Model",
+     "AutoImageProcessor": "image_processing_trendyol_dinov2.TrendyolDinoV2ImageProcessor"
+   },
+   "backbone_name": "dinov2_vitb14",
+   "embedding_dim": 256,
+   "hidden_size": 256,
+   "in_features": 768,
+   "use_arcface_loss": true,
+   "input_size": 224,
+   "downscale_size": 332,
+   "pad_color": 255,
+   "jpeg_quality": 75,
+   "normalization": {
+     "mean": [
+       0.485,
+       0.456,
+       0.406
+     ],
+     "std": [
+       0.229,
+       0.224,
+       0.225
+     ]
+   },
+   "preprocessing": {
+     "input_size": 224,
+     "downscale_size": 332,
+     "pad_color": 255,
+     "jpeg_quality": 75,
+     "transforms": [
+       "DownScaleLanczos",
+       "JPEGCompression",
+       "ScaleImage",
+       "PadToSquare",
+       "Resize",
+       "ToTensor",
+       "Normalize"
+     ]
+   },
+   "task_type": "image-retrieval",
+   "training_info": {
+     "epoch": "9",
+     "torch_version": "2.8.0"
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.20.0"
+ }
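The `in_features` and `hidden_size` values above set the input width of the projection layer in `modeling_trendyol_dinov2.py`, which is built as `in_features * hidden_size`. The sketch below works through that arithmetic. It assumes the patch size of 14 implied by the backbone name `dinov2_vitb14`, and the variable names are illustrative.

```python
# Values copied from config.json
input_size = 224      # side length of the square model input
in_features = 768     # channel width of the ViT-B backbone
hidden_size = 256     # multiplied with in_features in the projection layer
embedding_dim = 256   # size of the final embedding

# Assumption: ViT-B/14 splits the image into 14x14-pixel patches
patch_size = 14

patches_per_side = input_size // patch_size    # 224 / 14 = 16
num_patches = patches_per_side ** 2            # 16 * 16 = 256
flattened_width = in_features * num_patches    # input width of the linear layer

print(patches_per_side, num_patches, flattened_width)
```

Under that assumption `hidden_size` coincides with the number of patch tokens, so feeding images of a different `input_size` would probably no longer match the projection layer's input width.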
image_processing_trendyol_dinov2.py ADDED
@@ -0,0 +1,163 @@
+ """
+ Hugging Face compatible image processor for Trendyol DinoV2
+ """
+ from transformers import ImageProcessingMixin, BatchFeature
+ from transformers.utils import TensorType
+ from PIL import Image
+ import torch
+ import numpy as np
+ import cv2
+ from torchvision import transforms
+ import torchvision.transforms.functional as TF
+ from io import BytesIO
+ from typing import Union, List, Optional
+
+
+ def downscale_image(image: Image.Image, max_dimension: int) -> Image.Image:
+     """Downscale image while maintaining aspect ratio"""
+     original_width, original_height = image.size
+
+     # Images that already fit are returned unchanged (no upscaling here)
+     if max(original_width, original_height) <= max_dimension:
+         return image
+
+     aspect_ratio = original_width / original_height
+
+     if original_width > original_height:
+         new_width = max_dimension
+         new_height = int(max_dimension / aspect_ratio)
+     else:
+         new_height = max_dimension
+         new_width = int(max_dimension * aspect_ratio)
+
+     return image.resize((new_width, new_height), Image.LANCZOS)
+
+
+ class DownScaleLanczos:
+     def __init__(self, target_size=384):
+         self.target_size = target_size
+
+     def __call__(self, img):
+         return downscale_image(img, self.target_size)
+
+
+ class JPEGCompression:
+     def __init__(self, quality=75):
+         self.quality = quality
+
+     def __call__(self, img):
+         # Round-trip through an in-memory JPEG to reproduce compression artifacts
+         buffer = BytesIO()
+         img.save(buffer, format='JPEG', quality=self.quality)
+         buffer.seek(0)
+         return Image.open(buffer)
+
+
+ class ScaleImage:
+     def __init__(self, target_size):
+         self.target_size = target_size
+
+     def __call__(self, img):
+         # Rescale so the longer side equals target_size (up or down)
+         w, h = img.size
+         max_size = max(h, w)
+         scale = self.target_size / max_size
+         new_size = int(w * scale), int(h * scale)
+         return img.resize(new_size, Image.BILINEAR)
+
+
+ class PadToSquare:
+     def __init__(self, color=255):
+         self.color = color
+
+     def __call__(self, img):
+         if isinstance(img, np.ndarray):
+             img = Image.fromarray(img)
+
+         width, height = img.size
+         # A color of -1 disables padding
+         if self.color != -1:
+             padding = abs(width - height) // 2
+             # Padding order is (left, top, right, bottom); an odd difference goes to the right/bottom
+             if width < height:
+                 return TF.pad(img, (padding, 0, padding + (height - width) % 2, 0), fill=self.color, padding_mode='constant')
+             elif width > height:
+                 return TF.pad(img, (0, padding, 0, padding + (width - height) % 2), fill=self.color, padding_mode='constant')
+         return img
+
+
+ class TrendyolDinoV2ImageProcessor(ImageProcessingMixin):
+     """
+     Hugging Face compatible image processor for TrendyolDinoV2 model.
+     """
+
+     model_input_names = ["pixel_values"]
+
+     def __init__(
+         self,
+         input_size=224,
+         downscale_size=332,
+         pad_color=255,
+         jpeg_quality=75,
+         do_normalize=True,
+         image_mean=(0.485, 0.456, 0.406),
+         image_std=(0.229, 0.224, 0.225),
+         **kwargs
+     ):
+         super().__init__(**kwargs)
+
+         self.input_size = input_size
+         self.downscale_size = downscale_size
+         self.pad_color = pad_color
+         self.jpeg_quality = jpeg_quality
+         self.do_normalize = do_normalize
+         self.image_mean = image_mean
+         self.image_std = image_std
+
+     def _get_preprocess_fn(self):
+         """Create the preprocessing pipeline (not stored as attribute to avoid JSON serialization issues)"""
+         steps = [
+             DownScaleLanczos(self.downscale_size),
+             JPEGCompression(self.jpeg_quality),
+             ScaleImage(self.downscale_size),
+             PadToSquare(self.pad_color),
+             transforms.Resize((self.input_size, self.input_size)),
+             transforms.ToTensor(),
+         ]
+         # Honor do_normalize instead of always normalizing
+         if self.do_normalize:
+             steps.append(transforms.Normalize(self.image_mean, self.image_std))
+         return transforms.Compose(steps)
+
+     def __call__(
+         self,
+         images: Union[str, Image.Image, np.ndarray, List[Union[str, Image.Image, np.ndarray]]],
+         return_tensors: Optional[Union[str, TensorType]] = None,
+         **kwargs
+     ) -> BatchFeature:
+         """
+         Preprocess images (file paths, PIL images, or numpy arrays) for the model.
+         """
+         # Handle single image
+         if not isinstance(images, (list, tuple)):
+             images = [images]
+
+         # Get preprocessing pipeline
+         preprocess_fn = self._get_preprocess_fn()
+
+         # Preprocess all images
+         processed_images = []
+         for image in images:
+             if isinstance(image, str):
+                 image = Image.open(image)
+             elif isinstance(image, np.ndarray):
+                 image = Image.fromarray(image)
+             elif not isinstance(image, Image.Image):
+                 raise ValueError(f"Unsupported image type: {type(image)}")
+
+             # JPEG re-encoding needs RGB, so convert every input (e.g. RGBA or palette images)
+             image = image.convert('RGB')
+
+             # Apply preprocessing
+             processed_tensor = preprocess_fn(image)
+             processed_images.append(processed_tensor)
+
+         # Stack tensors
+         pixel_values = torch.stack(processed_images)
+
+         # Return BatchFeature
+         data = {"pixel_values": pixel_values}
+         return BatchFeature(data=data, tensor_type=return_tensors)
+
+
+ # Register for auto class
+ TrendyolDinoV2ImageProcessor.register_for_auto_class("AutoImageProcessor")
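To make the geometric part of this pipeline easier to follow, the sketch below traces only the image size through the downscale, scale, pad, and resize steps, without touching any pixels. It mirrors the size arithmetic of `downscale_image`, `ScaleImage`, and `PadToSquare` above; the function names are illustrative and not part of the commit.

```python
def downscale_size(width, height, max_dimension):
    """Size after the Lanczos downscale: unchanged unless the longer side exceeds the limit."""
    if max(width, height) <= max_dimension:
        return width, height
    aspect_ratio = width / height
    if width > height:
        return max_dimension, int(max_dimension / aspect_ratio)
    return int(max_dimension * aspect_ratio), max_dimension

def scale_size(width, height, target_size):
    """Size after ScaleImage: the longer side becomes target_size, up or down."""
    scale = target_size / max(width, height)
    return int(width * scale), int(height * scale)

def pad_to_square_size(width, height):
    """Size after padding the shorter side up to the longer one."""
    side = max(width, height)
    return side, side

def trace_sizes(width, height, downscale=332, input_size=224):
    """List the image size after each geometric step of the pipeline."""
    steps = [("original", (width, height))]
    size = downscale_size(width, height, downscale)
    steps.append(("downscale", size))
    size = scale_size(*size, downscale)
    steps.append(("scale", size))
    size = pad_to_square_size(*size)
    steps.append(("pad", size))
    steps.append(("resize", (input_size, input_size)))
    return steps

for name, size in trace_sizes(1200, 800):
    print(f"{name:>9}: {size[0]}x{size[1]}")
```

For inputs larger than 332px the scale step is a no-op after the downscale; it only changes the size of images that start out smaller, which it upscales.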
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cb41c67595af4eb4ce357fbf55c7fc238436f0b24cc2b53a46f35f3cca0e0424
+ size 547685752
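The size recorded in this pointer can be roughly cross-checked against the architecture in `modeling_trendyol_dinov2.py`. The following is a back-of-the-envelope sketch, assuming float32 weights (per `torch_dtype` in `config.json`) and ignoring the safetensors header and the small BatchNorm tensors. The remainder attributed to the backbone is an estimate, not a measured value.

```python
BYTES_PER_FLOAT32 = 4

# Projection layer fc11: Linear(in_features * hidden_size -> embedding_dim)
in_features, hidden_size, embedding_dim = 768, 256, 256
fc11_params = in_features * hidden_size * embedding_dim + embedding_dim  # weights + bias
fc11_bytes = fc11_params * BYTES_PER_FLOAT32

# Size taken from the Git LFS pointer for model.safetensors
checkpoint_bytes = 547_685_752

# Whatever is left is mostly the ViT-B/14 backbone
remaining_bytes = checkpoint_bytes - fc11_bytes
remaining_params_millions = remaining_bytes / BYTES_PER_FLOAT32 / 1e6

print(f"fc11: {fc11_params:,} params, {fc11_bytes / 1e6:.1f} MB")
print(f"rest: about {remaining_params_millions:.1f}M params")
```

A remainder in the range of a ViT-Base model suggests that the checkpoint stores the frozen backbone weights alongside the projection head, although the diff alone does not confirm this.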
modeling_trendyol_dinov2.py ADDED
@@ -0,0 +1,142 @@
+ """
+ Hugging Face compatible model implementation for Trendyol DinoV2
+ """
+ import torch
+ import torch.nn as nn
+ from transformers import PreTrainedModel, PretrainedConfig
+ from transformers.modeling_outputs import BaseModelOutput
+ from typing import Optional, Tuple, Union
+ import torch.nn.functional as F
+
+
+ class TrendyolDinoV2Config(PretrainedConfig):
+     """
+     Configuration class for TrendyolDinoV2 model.
+
+     Note: the projection layer takes in_features * hidden_size inputs, so
+     hidden_size must equal the number of patch tokens (16 x 16 = 256 for
+     224px inputs with 14px patches).
+     """
+     model_type = "trendyol_dinov2"
+
+     def __init__(
+         self,
+         embedding_dim=256,
+         input_size=224,
+         hidden_size=256,
+         backbone_name="dinov2_vitb14",
+         in_features=768,
+         downscale_size=332,
+         pad_color=255,
+         jpeg_quality=75,
+         **kwargs
+     ):
+         super().__init__(**kwargs)
+         self.embedding_dim = embedding_dim
+         self.input_size = input_size
+         self.hidden_size = hidden_size
+         self.backbone_name = backbone_name
+         self.in_features = in_features
+         self.downscale_size = downscale_size
+         self.pad_color = pad_color
+         self.jpeg_quality = jpeg_quality
+
+
+ class TYArcFaceDinoV2(nn.Module):
+     """Core model architecture"""
+     def __init__(self, config):
+         super(TYArcFaceDinoV2, self).__init__()
+         self.config = config
+
+         # Load DinoV2 backbone
+         try:
+             self.backbone = torch.hub.load('facebookresearch/dinov2', config.backbone_name)
+         except Exception as e:
+             raise RuntimeError(f"Failed to load DinoV2 backbone: {e}") from e
+
+         self.hidden_size = config.hidden_size
+         self.in_features = config.in_features
+         self.embedding_dim = config.embedding_dim
+
+         self.bn1 = nn.BatchNorm2d(self.in_features)
+         # Freeze backbone
+         self.backbone.requires_grad_(False)
+
+         # Projection layers
+         self.fc11 = nn.Linear(self.in_features * self.hidden_size, self.embedding_dim)
+         self.bn11 = nn.BatchNorm1d(self.embedding_dim)
+
+     def forward(self, pixel_values):
+         try:
+             features = self.backbone.get_intermediate_layers(
+                 pixel_values, return_class_token=True, reshape=True
+             )
+             # One (patch_map, class_token) pair is returned for the last block;
+             # keep the patch map, shaped [batch, channels, height/14, width/14]
+             features = features[0][0]
+             features = self.bn1(features)
+             features = features.flatten(start_dim=1)
+             features = self.fc11(features)
+             features = self.bn11(features)
+             # L2-normalize so that dot products are cosine similarities
+             features = F.normalize(features)
+             return features
+         except Exception as e:
+             raise RuntimeError(f"Forward pass failed: {e}") from e
+
+
+ class TrendyolDinoV2Model(PreTrainedModel):
+     """
+     Hugging Face compatible wrapper for TrendyolDinoV2
+     """
+     config_class = TrendyolDinoV2Config
+     base_model_prefix = "model"
+
+     def __init__(self, config):
+         super().__init__(config)
+         self.model = TYArcFaceDinoV2(config)
+
+         # Initialize weights
+         self.init_weights()
+
+     def _init_weights(self, module):
+         """Initialize weights (required by PreTrainedModel)"""
+         if isinstance(module, nn.Linear):
+             module.weight.data.normal_(mean=0.0, std=0.02)
+             if module.bias is not None:
+                 module.bias.data.zero_()
+         elif isinstance(module, nn.BatchNorm1d) or isinstance(module, nn.BatchNorm2d):
+             module.bias.data.zero_()
+             module.weight.data.fill_(1.0)
+
+     def init_weights(self):
+         """Initialize the projection head only, leaving the pretrained torch.hub backbone intact"""
+         for module in (self.model.bn1, self.model.fc11, self.model.bn11):
+             self._init_weights(module)
+
+     def forward(
+         self,
+         pixel_values: Optional[torch.Tensor] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+         **kwargs
+     ):
+         return_dict = return_dict if return_dict is not None else getattr(self.config, 'use_return_dict', True)
+
+         if pixel_values is None:
+             raise ValueError("pixel_values cannot be None")
+
+         # Get embeddings from the model
+         embeddings = self.model(pixel_values)
+
+         if not return_dict:
+             return (embeddings,)
+
+         # The [batch, embedding_dim] embedding is exposed as last_hidden_state
+         return BaseModelOutput(
+             last_hidden_state=embeddings,
+             hidden_states=None,
+             attentions=None
+         )
+
+     def get_embeddings(self, pixel_values):
+         """Convenience method to get embeddings directly"""
+         with torch.no_grad():
+             outputs = self.forward(pixel_values, return_dict=True)
+             return outputs.last_hidden_state
+
+
+ # Register the configuration
+ TrendyolDinoV2Config.register_for_auto_class()
+ TrendyolDinoV2Model.register_for_auto_class("AutoModel")
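The training loss is not part of this commit; only the inference-time embedding network is. For orientation, the idea behind ArcFace is to add an angular margin to the target class before the softmax, which pushes embeddings of the same class closer together on the unit sphere. The following is a minimal sketch with illustrative `scale` and `margin` values that are not taken from this model's training setup; the function name is hypothetical.

```python
import math

def arcface_logits(cosines, target_index, scale=30.0, margin=0.5):
    """Apply the additive angular margin to the target class only.

    cosines: cosine similarity between one embedding and each class center.
    Returns scaled logits that would be fed to a softmax cross-entropy loss.
    """
    logits = []
    for index, cosine in enumerate(cosines):
        # Clamp to the valid acos domain to guard against rounding error
        cosine = max(-1.0, min(1.0, cosine))
        if index == target_index:
            theta = math.acos(cosine)
            # Simplification: clamp theta + margin at pi; real implementations
            # usually handle this case with a separate fallback term
            cosine = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cosine)
    return logits

cosines = [0.8, 0.3, -0.1]  # made-up similarities to three class centers
logits = arcface_logits(cosines, target_index=0)
print(logits)
```

The target logit ends up lower than its plain scaled cosine, so the network has to produce a larger true-class similarity to reach the same loss.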
preprocessor_config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "image_processor_type": "TrendyolDinoV2ImageProcessor",
+   "processor_class": "TrendyolDinoV2ImageProcessor",
+   "auto_map": {
+     "AutoImageProcessor": "image_processing_trendyol_dinov2.TrendyolDinoV2ImageProcessor"
+   },
+   "input_size": 224,
+   "downscale_size": 332,
+   "pad_color": 255,
+   "jpeg_quality": 75,
+   "do_normalize": true,
+   "image_mean": [
+     0.485,
+     0.456,
+     0.406
+   ],
+   "image_std": [
+     0.229,
+     0.224,
+     0.225
+   ],
+   "do_resize": true,
+   "size": {
+     "height": 224,
+     "width": 224
+   },
+   "resample": 3,
+   "do_center_crop": false,
+   "crop_size": {
+     "height": 224,
+     "width": 224
+   },
+   "do_convert_rgb": true,
+   "transforms": [
+     "DownScaleLanczos",
+     "JPEGCompression",
+     "ScaleImage",
+     "PadToSquare",
+     "Resize",
+     "ToTensor",
+     "Normalize"
+   ]
+ }
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:60a38364dc18e4dd31a5bda0e8c36223a9b3518112ceeee7650ef59fd072a6cd
+ size 547728271
requirements.txt ADDED
@@ -0,0 +1,7 @@
+ torch>=1.9.0
+ torchvision>=0.10.0
+ safetensors>=0.3.0
+ Pillow>=8.0.0
+ numpy>=1.20.0
+ opencv-python>=4.5.0
+ transformers>=4.20.0