Can you make it compatible with Infinity server?
Here is the issue I'm seeing while trying to deploy models with Infinity: https://github.com/michaelfeil/infinity/issues/543
Apparently it comes down to the model's limited config.json.
Is there anything I/you can do about it?
Thanks!
OK, looks like all we need to do is update config.json (config values were extracted from the loaded model):
{
  "architectures": ["SigLIP"],
  "auto_map": {
    "AutoConfig": "marqo_fashionSigLIP.MarqoFashionSigLIPConfig",
    "AutoModel": "marqo_fashionSigLIP.MarqoFashionSigLIP",
    "AutoProcessor": "marqo_fashionSigLIP.MarqoFashionSigLIPProcessor"
  },
  "open_clip_model_name": "hf-hub:Marqo/marqo-fashionSigLIP",
  "model_type": "siglip",
  "hidden_size": 768,
  "projection_dim": 768,
  "text_config": {
    "attention_dropout": 0.0,
    "bos_token_id": 49406,
    "eos_token_id": 49407,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 768,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-6,
    "max_position_embeddings": 64,
    "model_type": "siglip_text_model",
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "pad_token_id": 1,
    "transformers_version": "4.47.1",
    "vocab_size": 32000
  },
  "vision_config": {
    "attention_dropout": 0.0,
    "hidden_act": "gelu_pytorch_tanh",
    "hidden_size": 768,
    "image_size": 224,
    "intermediate_size": 3072,
    "layer_norm_eps": 1e-6,
    "model_type": "siglip_vision_model",
    "num_attention_heads": 12,
    "num_channels": 3,
    "num_hidden_layers": 12,
    "patch_size": 16,
    "transformers_version": "4.47.1"
  },
  "initializer_factor": 1.0,
  "return_dict": true,
  "output_hidden_states": false,
  "output_attentions": false,
  "torchscript": false,
  "use_bfloat16": false,
  "tie_word_embeddings": true
}
Thanks @pySilver! Can you clarify the changes above? Is that the config with the changes already applied, or do they still need to be applied?
@Jesse-marqo It looks like the model is not compatible with Sentence Transformers by default, but the only issue here is the incomplete config.json: Sentence Transformers complains about various missing fields in the config.
What I did was simply load the model with transformers (which is supported) and print the config to the console directly from the loaded model, roughly as in the sketch below.
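A minimal sketch of that step (not my exact commands): the repo id is the public Marqo/marqo-fashionSigLIP checkpoint, and trust_remote_code is assumed to be needed because of the custom auto_map classes.

```python
# Sketch: load the checkpoint with transformers and dump the resolved config.
from transformers import AutoModel

model = AutoModel.from_pretrained(
    "Marqo/marqo-fashionSigLIP",
    trust_remote_code=True,  # assumed: config.json maps to custom classes via auto_map
)

# Prints the full resolved config; these are the values pasted into config.json above.
print(model.config.to_json_string())
```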
Then I forked the model and replaced the config file there to verify it works as expected. And yep, it works - no surprise here.
For that model the Infinity server uses Sentence Transformers under the hood, so my issue got solved; the verification was roughly along the lines of the sketch below.
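A rough verification sketch, not my exact script: the fork name is hypothetical, and it assumes a recent sentence-transformers release that accepts trust_remote_code.

```python
# Sketch: load the forked repo (hypothetical name) through sentence-transformers,
# which is what Infinity falls back to for this model, and encode a test sentence
# to confirm the completed config.json loads cleanly.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "your-username/marqo-fashionSigLIP",  # hypothetical fork with the updated config.json
    trust_remote_code=True,
)

embeddings = model.encode(["a photo of a red dress"])
print(embeddings.shape)  # expect a 768-dim embedding, matching projection_dim above
```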
I think it's a good idea to apply these changes to the original models (CLIP/SigLIP).
P.S. Any plans to make a SigLIP 2 version?