avsolatorio/NoInstruct-small-Embedding-v0 · Are any of your pretrained models available for commercial use?

Most of the models listed at https://www.sbert.net/docs/sentence_transformer/pretrained_models.html appear to be trained on MS MARCO. My understanding is that any model trained on that dataset cannot be used commercially. So I am confused why, for example, https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 is listed as Apache 2.0 when its training data includes MS MARCO.

From reading the Qwen3 paper (Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models), I was hopeful because you mention their training data is synthetic and they reference Apache 2.0 models in their abstract. However, Table 6 lists MS MARCO as one of their training datasets.

In any case, do you know of pretrained models from anyone else that can be used commercially?