---
license: apache-2.0
datasets:
- Bruece/domainnet-126-by-class-sketch
language:
- en
base_model:
- google/siglip2-base-patch16-224
pipeline_tag: image-classification
library_name: transformers
tags:
- Sketch-126-DomainNet
---

![fdhsdftghd.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/iS8BrTcPZ38592IP_NW3z.png)

# **Sketch-126-DomainNet**

> **Sketch-126-DomainNet** is an image classification model fine-tuned from the vision-language encoder **google/siglip2-base-patch16-224** for single-label classification. It classifies sketches into 126 categories using the **SiglipForImageClassification** architecture.

![Sketch-126-DomainNet - visual selection.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/Rc6Q9-9_nSTV2mRicSqj1.png)

*Moment Matching for Multi-Source Domain Adaptation*: https://arxiv.org/pdf/1812.01754

*SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features*: https://arxiv.org/pdf/2502.14786

```py
Classification Report:
                         precision    recall  f1-score   support

       aircraft_carrier    1.0000    0.2200    0.3607        50
            alarm_clock    0.9873    0.9568    0.9718       162
                    ant    0.9432    0.9326    0.9379        89
                  anvil    0.2727    0.0423    0.0732        71
              asparagus    0.9673    0.8916    0.9279       166
                    axe    0.8034    0.8773    0.8387       163
                 banana    0.9744    0.9383    0.9560       162
                 basket    0.7160    0.7682    0.7412       151
                bathtub    0.8073    0.9281    0.8635       167
                   bear    0.8636    0.6690    0.7540       142
                    bee    0.9196    0.8957    0.9075       115
                   bird    0.9094    0.9429    0.9259       245
             blackberry    1.0000    0.1250    0.2222        48
              blueberry    0.6744    0.8529    0.7532       102
              bottlecap    0.7468    0.5315    0.6211       111
               broccoli    0.7727    0.9444    0.8500       144
                    bus    0.9302    0.8989    0.9143       178
              butterfly    0.9594    0.9497    0.9545       199
                 cactus    1.0000    0.6735    0.8049        49
                   cake    0.0000    0.0000    0.0000        54
             calculator    0.9298    0.9636    0.9464        55
                  camel    0.9208    0.8942    0.9073       104
                 camera    0.9200    0.7931    0.8519        87
                 candle    0.9556    0.6935    0.8037        62
                 cannon    0.7500    0.2027    0.3191        74
                  canoe    0.8000    0.5825    0.6742       103
                 carrot    0.0000    0.0000    0.0000        27
                 castle    0.9583    0.5111    0.6667        45
                    cat    0.8961    0.6635    0.7624       104
            ceiling_fan    0.0000    0.0000    0.0000        20
             cell_phone    0.0000    0.0000    0.0000        18
                  cello    0.9600    0.4706    0.6316        51
                  chair    0.8043    0.4805    0.6016        77
             chandelier    0.0000    0.0000    0.0000        27
             coffee_cup    0.0000    0.0000    0.0000        26
                compass    0.0000    0.0000    0.0000        10
               computer    0.2500    0.0435    0.0741        23
                    cow    0.0000    0.0000    0.0000        14
                   crab    0.9123    0.8525    0.8814       122
              crocodile    0.9280    0.8992    0.9134       129
            cruise_ship    0.7467    0.9032    0.8175       124
                    dog    0.8533    0.8911    0.8718       248
                dolphin    0.9091    0.8824    0.8955        68
                 dragon    0.7914    0.8269    0.8088       156
                  drums    0.9259    0.8772    0.9009       171
                   duck    0.8409    0.8409    0.8409       220
               dumbbell    0.9507    0.9184    0.9343       147
               elephant    0.9630    0.9765    0.9697       213
             eyeglasses    0.8155    0.7919    0.8035       173
                feather    0.9344    0.9344    0.9344       244
                  fence    0.8796    0.8482    0.8636       112
                   fish    0.9527    0.9495    0.9511       297
               flamingo    0.9818    0.9474    0.9643       114
                 flower    0.8267    0.9219    0.8717       269
                   foot    0.7743    0.8578    0.8140       204
                   fork    0.9366    0.9433    0.9399       141
                   frog    0.9620    0.9383    0.9500       162
                giraffe    0.9655    0.9396    0.9524       149
                 goatee    0.7914    0.8897    0.8377       145
                 grapes    0.9132    0.9609    0.9364       230
                 guitar    0.8462    0.9862    0.9108       145
                 hammer    0.8333    0.4386    0.5747        57
             helicopter    0.9441    0.9620    0.9530       158
                 helmet    0.8509    0.8204    0.8354       167
                  horse    0.9091    0.9877    0.9467        81
               kangaroo    0.9592    0.9691    0.9641        97
                lantern    0.0000    0.0000    0.0000        30
                 laptop    0.8273    0.9200    0.8712       250
                   leaf    0.8449    0.8870    0.8655       301
                   lion    0.9697    0.9734    0.9715       263
               lipstick    0.9634    0.8977    0.9294        88
                lobster    0.9265    0.9130    0.9197       138
             microphone    0.8917    0.8770    0.8843       122
                 monkey    0.9297    0.8947    0.9119       133
               mosquito    0.9052    0.9211    0.9130       114
                  mouse    0.8632    0.8039    0.8325       102
                    mug    0.6928    0.7737    0.7310       137
               mushroom    0.8174    0.8861    0.8504       202
                  onion    0.9538    0.9841    0.9688       126
                  panda    0.9643    0.8710    0.9153        62
                 peanut    0.8302    0.8462    0.8381       104
                   pear    0.7966    0.9658    0.8731       146
                   peas    0.6667    0.8438    0.7448        64
                 pencil    0.0000    0.0000    0.0000        21
                penguin    0.9586    0.9701    0.9643       167
                    pig    0.8983    0.8785    0.8883       181
                 pillow    0.9570    0.9674    0.9622        92
              pineapple    0.9808    0.9714    0.9761       105
                 potato    0.9444    0.5231    0.6733        65
           power_outlet    0.5556    0.0676    0.1205        74
                  purse    0.9220    0.7182    0.8075       181
                 rabbit    0.9697    0.8767    0.9209        73
                raccoon    0.7850    0.9097    0.8428       277
             rhinoceros    0.9863    0.9863    0.9863       146
                  rifle    0.9143    0.9796    0.9458        98
              saxophone    0.9381    0.8618    0.8983       246
            screwdriver    0.7709    0.8706    0.8177       286
             sea_turtle    0.9698    0.9507    0.9602       203
                see_saw    0.3296    0.5738    0.4187       413
                  sheep    0.9254    0.9153    0.9203       366
                   shoe    0.9395    0.9688    0.9539       513
             skateboard    0.7365    0.7831    0.7591       332
                  snake    0.8005    0.8737    0.8355       372
              speedboat    0.8388    0.8833    0.8605       377
                 spider    0.7954    0.8696    0.8309       514
               squirrel    0.8511    0.8484    0.8498       310
             strawberry    0.8313    0.8471    0.8391       157
            streetlight    0.7944    0.8134    0.8038       209
            string_bean    0.7143    0.3000    0.4225        50
              submarine    0.5916    0.6975    0.6402       162
                   swan    0.8966    0.8387    0.8667       186
                  table    0.6705    0.7522    0.7090       230
                 teapot    0.8464    0.8968    0.8709       252
             teddy-bear    0.6818    0.8385    0.7521       161
             television    0.8974    0.7071    0.7910        99
       the_Eiffel_Tower    0.9860    0.9679    0.9769       218
the_Great_Wall_of_China    0.6389    0.8440    0.7273       109
                  tiger    0.9417    0.9604    0.9510       303
                    toe    0.0000    0.0000    0.0000        53
                  train    0.8650    0.9010    0.8827       192
                  truck    0.8136    0.9372    0.8710       191
               umbrella    0.8650    0.8913    0.8779       230
                   vase    0.8082    0.8082    0.8082       146
             watermelon    0.8947    0.8333    0.8629       102
                  whale    0.8910    0.8744    0.8826       215
                  zebra    0.9817    0.9727    0.9772       220

               accuracy                        0.8440     19317
              macro avg    0.7818    0.7419    0.7475     19317
           weighted avg    0.8404    0.8440    0.8352     19317
```

The model categorizes images into the following 126 classes:

- **Class 0:** "aircraft_carrier"
- **Class 1:** "alarm_clock"
- **Class 2:** "ant"
- **Class 3:** "anvil"
- **Class 4:** "asparagus"
- **Class 5:** "axe"
- **Class 6:** "banana"
- **Class 7:** "basket"
- **Class 8:** "bathtub"
- **Class 9:** "bear"
- **Class 10:** "bee"
- **Class 11:** "bird"
- **Class 12:** "blackberry"
- **Class 13:** "blueberry"
- **Class 14:** "bottlecap"
- **Class 15:** "broccoli"
- **Class 16:** "bus"
- **Class 17:** "butterfly"
- **Class 18:** "cactus"
- **Class 19:** "cake"
- **Class 20:** "calculator"
- **Class 21:** "camel"
- **Class 22:** "camera"
- **Class 23:** "candle"
- **Class 24:** "cannon"
- **Class 25:** "canoe"
- **Class 26:** "carrot"
- **Class 27:** "castle"
- **Class 28:** "cat"
- **Class 29:** "ceiling_fan"
- **Class 30:** "cell_phone"
- **Class 31:** "cello"
- **Class 32:** "chair"
- **Class 33:** "chandelier"
- **Class 34:** "coffee_cup"
- **Class 35:** "compass"
- **Class 36:** "computer"
- **Class 37:** "cow"
- **Class 38:** "crab"
- **Class 39:** "crocodile"
- **Class 40:** "cruise_ship"
- **Class 41:** "dog"
- **Class 42:** "dolphin"
- **Class 43:** "dragon"
- **Class 44:** "drums"
- **Class 45:** "duck"
- **Class 46:** "dumbbell"
- **Class 47:** "elephant"
- **Class 48:** "eyeglasses"
- **Class 49:** "feather"
- **Class 50:** "fence"
- **Class 51:** "fish"
- **Class 52:** "flamingo"
- **Class 53:** "flower"
- **Class 54:** "foot"
- **Class 55:** "fork"
- **Class 56:** "frog"
- **Class 57:** "giraffe"
- **Class 58:** "goatee"
- **Class 59:** "grapes"
- **Class 60:** "guitar"
- **Class 61:** "hammer"
- **Class 62:** "helicopter"
- **Class 63:** "helmet"
- **Class 64:** "horse"
- **Class 65:** "kangaroo"
- **Class 66:** "lantern"
- **Class 67:** "laptop"
- **Class 68:** "leaf"
- **Class 69:** "lion"
- **Class 70:** "lipstick"
- **Class 71:** "lobster"
- **Class 72:** "microphone"
- **Class 73:** "monkey"
- **Class 74:** "mosquito"
- **Class 75:** "mouse"
- **Class 76:** "mug"
- **Class 77:** "mushroom"
- **Class 78:** "onion"
- **Class 79:** "panda"
- **Class 80:** "peanut"
- **Class 81:** "pear"
- **Class 82:** "peas"
- **Class 83:** "pencil"
- **Class 84:** "penguin"
- **Class 85:** "pig"
- **Class 86:** "pillow"
- **Class 87:** "pineapple"
- **Class 88:** "potato"
- **Class 89:** "power_outlet"
- **Class 90:** "purse"
- **Class 91:** "rabbit"
- **Class 92:** "raccoon"
- **Class 93:** "rhinoceros"
- **Class 94:** "rifle"
- **Class 95:** "saxophone"
- **Class 96:** "screwdriver"
- **Class 97:** "sea_turtle"
- **Class 98:** "see_saw"
- **Class 99:** "sheep"
- **Class 100:** "shoe"
- **Class 101:** "skateboard"
- **Class 102:** "snake"
- **Class 103:** "speedboat"
- **Class 104:** "spider"
- **Class 105:** "squirrel"
- **Class 106:** "strawberry"
- **Class 107:** "streetlight"
- **Class 108:** "string_bean"
- **Class 109:** "submarine"
- **Class 110:** "swan"
- **Class 111:** "table"
- **Class 112:** "teapot"
- **Class 113:** "teddy-bear"
- **Class 114:** "television"
- **Class 115:** "the_Eiffel_Tower"
- **Class 116:** "the_Great_Wall_of_China"
- **Class 117:** "tiger"
- **Class 118:** "toe"
- **Class 119:** "train"
- **Class 120:** "truck"
- **Class 121:** "umbrella"
- **Class 122:** "vase"
- **Class 123:** "watermelon"
- **Class 124:** "whale"
- **Class 125:** "zebra"

# **Run with Transformers🤗**

```python
!pip install -q transformers torch pillow gradio
```

```python
import gradio as gr
from transformers import AutoImageProcessor, SiglipForImageClassification
from PIL import Image
import torch

# Load model and processor
model_name = "prithivMLmods/Sketch-126-DomainNet"
model = SiglipForImageClassification.from_pretrained(model_name)
processor = AutoImageProcessor.from_pretrained(model_name)

def sketch_classification(image):
    """Predicts the sketch category for an input image."""
    # Convert the input numpy array to a PIL Image and ensure it has 3 channels (RGB)
    image = Image.fromarray(image).convert("RGB")

    # Process the image and prepare it for the model
    inputs = processor(images=image, return_tensors="pt")

    # Perform inference without gradient calculation
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Convert logits to probabilities using softmax
        probs = torch.nn.functional.softmax(logits, dim=1).squeeze().tolist()

    # Mapping from indices to corresponding sketch category labels
    labels = {
        "0": "aircraft_carrier", "1": "alarm_clock", "2": "ant", "3": "anvil", "4": "asparagus",
        "5": "axe", "6": "banana", "7": "basket", "8": "bathtub", "9": "bear",
        "10": "bee", "11": "bird", "12": "blackberry", "13": "blueberry", "14": "bottlecap",
        "15": "broccoli", "16": "bus", "17": "butterfly", "18": "cactus", "19": "cake",
        "20": "calculator", "21": "camel", "22": "camera", "23": "candle", "24": "cannon",
        "25": "canoe", "26": "carrot", "27": "castle", "28": "cat", "29": "ceiling_fan",
        "30": "cell_phone", "31": "cello", "32": "chair", "33": "chandelier", "34": "coffee_cup",
        "35": "compass", "36": "computer", "37": "cow", "38": "crab", "39": "crocodile",
        "40": "cruise_ship", "41": "dog", "42": "dolphin", "43": "dragon", "44": "drums",
        "45": "duck", "46": "dumbbell", "47": "elephant", "48": "eyeglasses", "49": "feather",
        "50": "fence", "51": "fish", "52": "flamingo", "53": "flower", "54": "foot",
        "55": "fork", "56": "frog", "57": "giraffe", "58": "goatee", "59": "grapes",
        "60": "guitar", "61": "hammer", "62": "helicopter", "63": "helmet", "64": "horse",
        "65": "kangaroo", "66": "lantern", "67": "laptop", "68": "leaf", "69": "lion",
        "70": "lipstick", "71": "lobster", "72": "microphone", "73": "monkey", "74": "mosquito",
        "75": "mouse", "76": "mug", "77": "mushroom", "78": "onion", "79": "panda",
        "80": "peanut", "81": "pear", "82": "peas", "83": "pencil", "84": "penguin",
        "85": "pig", "86": "pillow", "87": "pineapple", "88": "potato", "89": "power_outlet",
        "90": "purse", "91": "rabbit", "92": "raccoon", "93": "rhinoceros", "94": "rifle",
        "95": "saxophone", "96": "screwdriver", "97": "sea_turtle", "98": "see_saw", "99": "sheep",
        "100": "shoe", "101": "skateboard", "102": "snake", "103": "speedboat", "104": "spider",
        "105": "squirrel", "106": "strawberry", "107": "streetlight", "108": "string_bean", "109": "submarine",
        "110": "swan", "111": "table", "112": "teapot", "113": "teddy-bear", "114": "television",
        "115": "the_Eiffel_Tower", "116": "the_Great_Wall_of_China", "117": "tiger", "118": "toe", "119": "train",
        "120": "truck", "121": "umbrella", "122": "vase", "123": "watermelon", "124": "whale",
        "125": "zebra"
    }

    # Create a dictionary mapping each label to its predicted probability (rounded)
    predictions = {labels[str(i)]: round(probs[i], 3) for i in range(len(probs))}
    return predictions

# Create Gradio interface
iface = gr.Interface(
    fn=sketch_classification,
    inputs=gr.Image(type="numpy"),
    outputs=gr.Label(label="Prediction Scores"),
    title="Sketch-126-DomainNet Classification",
    description="Upload a sketch to classify it into one of 126 categories."
)

# Launch the app
if __name__ == "__main__":
    iface.launch()
```

---

# **Intended Use:**

The **Sketch-126-DomainNet** model is designed for sketch image classification. It categorizes sketches into 126 classes, ranging from objects such as "aircraft_carrier" and "alarm_clock" to animals, plants, and everyday items. Potential use cases include:

- **Art and Design Applications:** Assisting artists and designers in organizing and retrieving sketches based on content.
- **Creative Search Engines:** Enabling sketch-based search for design inspiration.
- **Educational Tools:** Helping students and educators in art and design fields categorize and retrieve visual resources.
- **Computer Vision Research:** Providing a baseline model for sketch recognition and domain adaptation tasks.
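
For the organizing and retrieval use cases listed above, the full 126-entry score dictionary returned by `sketch_classification` is usually more than is needed. The following is a minimal sketch of reducing it to the few best labels. It assumes only that the input is a plain label-to-probability dictionary; the helper name `top_k_predictions` is hypothetical, and the example scores are made up for illustration rather than taken from the model.

```python
import heapq

def top_k_predictions(scores, k=5):
    """Return the k highest-scoring (label, score) pairs, best first.

    `scores` is assumed to be a dict mapping label -> probability,
    i.e. the same shape that sketch_classification returns.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])

# Made-up scores for illustration only; not real model output.
example_scores = {
    "zebra": 0.91,
    "horse": 0.05,
    "tiger": 0.02,
    "cow": 0.01,
    "giraffe": 0.01,
}

top3 = top_k_predictions(example_scores, k=3)
for label, score in top3:
    print(f"{label}: {score:.2f}")
```

In practice the dictionary passed in would be the return value of `sketch_classification(image)`. Keeping only a short ranked list makes the result easier to store as tags or to use as a search key.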