What languages does snowflake-arctic-embed-m-v2.0 support?
Thank you for your excellent work, which has greatly assisted our project. We would now like to train a multilingual quality classifier based on snowflake-arctic-embed-m-v2.0, but we are unsure which languages snowflake-arctic-embed-m-v2.0 specifically supports. Could you let us know?
The best way to find out whether this model is a good choice for your fine-tuning task is to try it out. Our technical report (linked in the news section of our model card) describes the training details, including which languages we focused on during contrastive training. However, that list may or may not give you a reliable sense of how the model will perform on a quality classification task.
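One cheap way to "try it out" before committing to full fine-tuning is to freeze the encoder, embed a small labeled sample in each target language, and check how separable your quality labels are per language. The sketch below uses a nearest-centroid probe in plain Python, swapped in for a trained classification head to keep it dependency-free. The function names are hypothetical, and the embeddings are assumed to have already been produced by the model.

```python
import math
from collections import defaultdict


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fit_centroids(embeddings, labels):
    """Hypothetical helper: average the normalized embeddings of each class."""
    sums = {}
    counts = defaultdict(int)
    for emb, label in zip(embeddings, labels):
        e = _normalize(emb)
        if label not in sums:
            sums[label] = [0.0] * len(e)
        sums[label] = [s + x for s, x in zip(sums[label], e)]
        counts[label] += 1
    return {label: _normalize([s / counts[label] for s in vec])
            for label, vec in sums.items()}


def predict(centroids, emb):
    """Return the label whose centroid has the highest cosine similarity."""
    e = _normalize(emb)
    return max(centroids,
               key=lambda label: sum(a * b for a, b in zip(e, centroids[label])))


def per_language_accuracy(centroids, embeddings, labels, langs):
    """Hypothetical helper: probe accuracy broken down by language tag."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for emb, label, lang in zip(embeddings, labels, langs):
        total[lang] += 1
        if predict(centroids, emb) == label:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy 2-D "embeddings" standing in for real model outputs.
train_X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
train_y = ["high", "high", "low", "low"]
test_X = [[0.8, 0.2], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4]]
test_y = ["high", "low", "low", "high"]
test_langs = ["en", "en", "de", "de"]

centroids = fit_centroids(train_X, train_y)
print(per_language_accuracy(centroids, test_X, test_y, test_langs))
```

For real data, the embeddings could come from something like `SentenceTransformer("Snowflake/snowflake-arctic-embed-m-v2.0", trust_remote_code=True).encode(texts)`, assuming that loading path works in your environment. A language whose probe accuracy lags well behind the others is worth examining closely before you fine-tune; a strong probe result is encouraging but does not guarantee fine-tuned performance.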
Other potentially helpful details: the tokenizer comes from XLM-R, and the MLM pretraining details are here: https://arxiv.org/abs/2407.19669
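Because the tokenizer comes from XLM-R, a quick sanity check for a candidate language is how finely its text gets segmented. The sketch below measures fertility (subword tokens per whitespace-separated word) and the unknown-token rate over sample texts. The helper name is hypothetical, and the `<unk>` default assumes XLM-R's unknown token. Whitespace word counts are a poor proxy for languages written without spaces, such as Chinese or Japanese.

```python
def tokenizer_stats(tokenize, texts, unk_token="<unk>"):
    """Hypothetical helper returning (fertility, unknown-token rate).

    `tokenize` is any callable mapping a string to a list of token strings,
    e.g. the `.tokenize` method of a Hugging Face tokenizer.
    """
    n_words = 0
    n_tokens = 0
    n_unk = 0
    for text in texts:
        n_words += len(text.split())  # whitespace words, a rough proxy
        tokens = tokenize(text)
        n_tokens += len(tokens)
        n_unk += sum(1 for t in tokens if t == unk_token)
    fertility = n_tokens / n_words if n_words else 0.0
    unk_rate = n_unk / n_tokens if n_tokens else 0.0
    return fertility, unk_rate


# Toy character-level tokenizer that maps "?" to the unknown token.
def toy_tokenize(text):
    return ["<unk>" if ch == "?" else ch for ch in text.replace(" ", "")]


print(tokenizer_stats(toy_tokenize, ["ab cd", "x?"]))
```

With the Hugging Face `transformers` library installed, something like `tokenizer_stats(AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-embed-m-v2.0").tokenize, samples)` should work, assuming the tokenizer loads directly from the model repository. Unusually high fertility or a nonzero unknown rate suggests weaker coverage for that language, although good tokenization alone does not imply good classification quality.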