Jack Morris
jxm
http://jxmo.io
jxmnop
jxmorris12
AI & ML interests
natural language processing, text retrieval, embeddings, inversion
Recent Activity
Posted an update about 16 hours ago
New state-of-the-art BERT-size retrieval model: *cde-small-v2* 🥳🍾

Hi everyone! We at Cornell are releasing a new retrieval model this week. It uses the contextual embeddings framework, is built on a ModernBERT backbone, and gets state-of-the-art results on the MTEB benchmark for its model size (140M parameters). cde-small-v2 gets an average score of 65.6 across the 56 MTEB datasets and improves on our previous model in *every* task domain (retrieval, classification, etc.).

We made a lot of changes to make this model work. First of all, ModernBERT has a better tokenizer, which probably helped this work out-of-the-box. We also followed the principles from the CDE paper and used harder clusters and better hard-negative filtering, which showed a small performance improvement. And we made a few small changes that have been shown to work on the larger models: we disabled weight decay, masked out the prefix tokens during pooling, and added a residual connection from the first stage to the second stage for better gradient flow.

We're still looking for a compute sponsor to help us scale CDE to larger models. Since it's now state-of-the-art at the 100M parameter scale, it seems a reasonable bet that we could train a state-of-the-art large model if we had the GPUs. If you're interested in helping with this, please reach out!

Here's a link to the model: https://huggingface.co/jxm/cde-small-v2

And here's a link to the paper: https://huggingface.co/papers/2410.02525
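
One of the changes listed in the post, masking out the prefix tokens during pooling, can be illustrated with a small sketch. This is not the cde-small-v2 code: it assumes mean pooling, assumes the prefix occupies the first `num_prefix_tokens` positions of the sequence, and uses plain Python lists in place of tensors. The function name `masked_mean_pool` and its parameters are hypothetical.

```python
def masked_mean_pool(token_embeddings, attention_mask, num_prefix_tokens):
    """Mean-pool token vectors, skipping padding and the leading prefix tokens.

    token_embeddings: list of per-token vectors (lists of floats).
    attention_mask:   list of 0/1 flags, 0 marking padding positions.
    num_prefix_tokens: number of leading positions to exclude (assumed layout).
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for position, (vector, keep) in enumerate(zip(token_embeddings, attention_mask)):
        # Skip prefix positions and padded positions.
        if position < num_prefix_tokens or not keep:
            continue
        for i, value in enumerate(vector):
            total[i] += value
        count += 1
    if count == 0:
        raise ValueError("no tokens left to pool after masking")
    return [value / count for value in total]


# Toy input: one prefix token, two content tokens, one padding token.
tokens = [[10.0, 10.0], [1.0, 3.0], [3.0, 5.0], [0.0, 0.0]]
mask = [1, 1, 1, 0]

pooled = masked_mean_pool(tokens, mask, num_prefix_tokens=1)
unmasked = masked_mean_pool(tokens, mask, num_prefix_tokens=0)
```

In this toy input the prefix vector is much larger than the content vectors, so leaving it in the average (`unmasked`) pulls the pooled embedding away from the content, while `pooled` reflects only the two content tokens.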
jxm's activity
New activity in
jxm/cde-small-v1
about 13 hours ago
Optional: Link to new version
#7 opened 4 days ago by
tomaarsen
New activity in
jxm/cde-small-v2
about 15 hours ago
Amazing model. Does it support multilingual?
1
#3 opened about 16 hours ago by
rjmehta
New activity in
jxm/cde-small-v2
1 day ago
Is the year wrong in the README?
1
#2 opened 2 days ago by
GGmorello
New activity in
jxm/cde-small-v2
3 days ago
Set base_model & tags metadata
1
#1 opened 4 days ago by
tomaarsen
New activity in
nomic-ai/nomic-bert-2048
about 1 month ago
the nomic embedding model fails with error `configuration_hf_nomic_bert' has no attribute 'NomicBertConfig'`
3
#19 opened about 1 month ago by
lbwavebo-uber
New activity in
jxm/cde-small-v1
about 1 month ago
"jxm/cde-small-v1" doesnt load with SetFitModel.from_pretrained
1
#5 opened 3 months ago by
moshew
New activity in
jxm/cde-small-v1
3 months ago
Does it support multilingual embedding? Could you provide the train/test code?
3
#4 opened 3 months ago by
sunshin5
Integrate with Sentence Transformers
1
#3 opened 3 months ago by
tomaarsen
commented
a paper
4 months ago
Contextual Document Embeddings
Paper • 2410.02525 • Published Oct 3, 2024 • 19 • 3
New activity in
jxm/cde-small-v1
4 months ago
License
2
#1 opened 4 months ago by
mrfakename
New activity in
nomic-ai/nomic-bert-2048
4 months ago
add full support for inputs_embeds
#10 opened 4 months ago by
jxm
add full support for inputs_embeds
#9 opened 4 months ago by
jxm
make input_ids optional
#8 opened 4 months ago by
jxm
Add inputs_embeds argument
#7 opened 4 months ago by
jxm
New activity in
textattack/roberta-base-SST-2
over 1 year ago
Adding `safetensors` variant of this model
#2 opened almost 2 years ago by
SFconvertbot
Add TF weights
1
#1 opened over 2 years ago by
joaogante