New SOTA! Request to refresh the results
Thanks for the great work!
We submitted a new SOTA model, mixedbread-ai/mxbai-mistral-7b-reprev-v1, on English MTEB. Could you help refresh this space?
Thanks!
Hello!
I've refreshed the leaderboard! Congratulations.
I'd love to help integrate your model with Sentence Transformers; would you be interested in this?
Furthermore, I'm certainly curious about more details on your model & how it was trained.
cc @Muennighoff I've noticed the leaderboard does not report the Model Size for this model. How should we resolve this?
- Tom Aarsen
Hey @tomaarsen,
thank you! We would love to integrate the model into Sentence Transformers and would be happy to help, also with modernising everything, e.g. downloading only safetensors, ignoring ONNX, updating the training script, etc. Maybe we can have a chat about how we can contribute (aamir at mixedbread.ai) :)
Regarding training, sorry for the sparse information on that. We used the AnglE loss proposed by @SeanLee97 on a mixture of synthetic and retrieval data (extremely filtered and cleaned). We did a lot of checks to ensure that we have no data contamination, as well as checks on benchmarks other than MTEB. We are in the process of publishing more details. Currently we are experimenting with a lot of different data mixtures, models (e.g. Phi-2, M2-BERT, Linformer) and training methods. We aim to share them with the research community; that's why we also called it a research preview. More to come soon!
Hope this helps.
Aamir.
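For context, a minimal sketch of what fine-tuning with an AnglE-style objective can look like in Sentence Transformers, assuming a release that ships losses.AnglELoss; the model name and toy pairs below are illustrative only, not the data mixture described above:

```python
# Minimal sketch: fine-tuning with an AnglE-style loss via the classic fit() API.
# Assumes the installed sentence-transformers version provides losses.AnglELoss.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# AnglELoss (like CoSENTLoss) expects sentence pairs with a float similarity label.
train_examples = [
    InputExample(texts=["A cat sits on the mat", "A cat is sitting on a mat"], label=0.9),
    InputExample(texts=["A cat sits on the mat", "The stock market fell today"], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.AnglELoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```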
Hm, we may need to look for safetensors files as well and sum them if there are multiple.
@Muennighoff I read that via huggingface_hub we can use model_info to extract the model size. I can try to invest some time to implement this.
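As a rough illustration (not the leaderboard's actual code), model_info from huggingface_hub can report per-file sizes, which could then be summed over the safetensors shards; the helper name safetensors_size_gb is made up here:

```python
# Sketch: estimate a model's size by summing all *.safetensors files in the repo.
from huggingface_hub import HfApi

def safetensors_size_gb(repo_id: str) -> float:
    """Return the combined size of all .safetensors files in a model repo, in GB."""
    api = HfApi()
    # files_metadata=True populates the per-file `size` field on each sibling.
    info = api.model_info(repo_id, files_metadata=True)
    total_bytes = sum(
        sibling.size or 0
        for sibling in info.siblings
        if sibling.rfilename.endswith(".safetensors")
    )
    return total_bytes / 1e9

print(safetensors_size_gb("mixedbread-ai/mxbai-mistral-7b-reprev-v1"))
```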
> Also at modernising everything, e.g. downloading only safetensors, ignoring ONNX, updating the training script, etc.
Downloading only safetensors & ignoring onnx is already implemented in the repo, but I've yet to push the release.
I'm certainly interested in some help with the rest, though!
I am considering some modernisations of my own:
- Improved training via the transformers Trainer: multi-GPU support, gradient accumulation, gradient checkpointing, improved callbacks, bf16, etc.
- Easier model loading in lower precision.
- Revising how models are saved & loaded (i.e. fewer configuration files and no multiple weight files)
- Prompt templates.
- AnglE loss
- Easier multiple losses (e.g. InfoNCE/MultipleNegativesRankingLoss + Cosine + AnglE); see the sketch after this list.
- Tom Aarsen
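As a rough sketch of the "easier multiple losses" idea (not an existing Sentence Transformers API), two score-based losses could simply be summed in a small wrapper; CombinedLoss and its weighting scheme are hypothetical:

```python
# Sketch: weighted sum of several sentence-transformers losses.
# Only sensible for losses that expect the same inputs and labels,
# e.g. CoSENTLoss and AnglELoss both take pairs with float similarity scores.
from torch import nn
from sentence_transformers import SentenceTransformer, losses

class CombinedLoss(nn.Module):
    """Sums several sentence-transformers losses, each scaled by a weight."""

    def __init__(self, loss_weight_pairs):
        super().__init__()
        self.losses = nn.ModuleList([loss for loss, _ in loss_weight_pairs])
        self.weights = [weight for _, weight in loss_weight_pairs]

    def forward(self, sentence_features, labels):
        # Each wrapped loss follows the usual (sentence_features, labels) signature.
        return sum(
            weight * loss(sentence_features, labels)
            for loss, weight in zip(self.losses, self.weights)
        )

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_loss = CombinedLoss([
    (losses.CoSENTLoss(model), 1.0),
    (losses.AnglELoss(model), 1.0),
])
```

The wrapper can then be passed to model.fit() like any single loss, since it keeps the same forward signature.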
Sounds extremely good! I think for a lot of training-related things Tevatron is really great. We will discuss it in the team and help with integrating this into Sentence Transformers. Really amazing what you are doing!
Sure that'd be amazing!