Hugging Face
Jack Morris
jxm
43 followers · 8 following
http://jxmo.io
jxmnop
jxmorris12
AI & ML interests
natural language processing, text retrieval, embeddings, inversion
Recent Activity
New activity in jxm/cde-small-v1 (about 13 hours ago): "Optional: Link to new version"
New activity in jxm/cde-small-v2 (about 14 hours ago): "Amazing model. Does it support multilingual?"
Posted an update (about 16 hours ago):
New state-of-the-art BERT-size retrieval model: *cde-small-v2* 🥳🍾

Hi everyone! We at Cornell are releasing a new retrieval model this week. It uses the contextual embeddings framework, is based on the ModernBERT backbone, and gets state-of-the-art results on the MTEB benchmark for its model size (140M parameters). cde-small-v2 gets an average score of 65.6 across the 56 datasets and improves over our previous model in *every* task domain (retrieval, classification, etc.).

We made a lot of changes to make this model work. First of all, ModernBERT has a better tokenizer, which probably helped this work out of the box. We also followed the principles from the CDE paper and used harder clusters and better hard-negative filtering, which gave a small performance improvement. Finally, we made a few small changes that have been shown to work on larger models: we disabled weight decay, masked out the prefix tokens during pooling, and added a residual connection from the first stage to the second stage for better gradient flow.

We're still looking for a compute sponsor to help us scale CDE to larger models. Since it's now state-of-the-art at the 100M-parameter scale, it seems a reasonable bet that we could train a state-of-the-art large model if we had the GPUs. If you're interested in helping with this, please reach out!

Here's a link to the model: https://huggingface.co/jxm/cde-small-v2
And here's a link to the paper: https://huggingface.co/papers/2410.02525
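The post mentions masking out prefix tokens during pooling. As a rough, hypothetical illustration only (this is not the cde-small-v2 code), the sketch below shows what that could look like with plain mean pooling, assuming the tokens to skip form a fixed-length prefix of `prefix_len` positions at the start of each sequence. The function name `masked_mean_pool` and its parameters are made up for this example.

```python
from typing import List


def masked_mean_pool(
    token_embeddings: List[List[float]],
    attention_mask: List[int],
    prefix_len: int,
) -> List[float]:
    """Hypothetical mean pooling that skips padding and the first `prefix_len` tokens.

    Assumption: the prefix is a fixed number of leading positions; the real
    model may define its prefix differently.
    """
    # Keep a token only if it is a real token (mask == 1) and sits after the prefix.
    keep = [
        1 if (i >= prefix_len and m == 1) else 0
        for i, m in enumerate(attention_mask)
    ]
    count = sum(keep)
    if count == 0:
        raise ValueError("no tokens left to pool after masking")

    dim = len(token_embeddings[0])
    pooled = [0.0] * dim
    # Sum the kept token vectors component by component.
    for vec, k in zip(token_embeddings, keep):
        if k:
            for j in range(dim):
                pooled[j] += vec[j]
    # Divide by the number of kept tokens to get the mean.
    return [x / count for x in pooled]


if __name__ == "__main__":
    # Two prefix tokens, two content tokens, one padding token.
    emb = [[1.0, 1.0], [2.0, 2.0], [4.0, 0.0], [0.0, 4.0], [9.0, 9.0]]
    mask = [1, 1, 1, 1, 0]
    print(masked_mean_pool(emb, mask, prefix_len=2))  # prints [2.0, 2.0]
```

Only positions 2 and 3 contribute here: positions 0 and 1 fall inside the assumed prefix, and position 4 is padding.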
jxm's activity
liked a model (3 days ago)
OrcaDB/cde-small-v1 • Feature Extraction • Updated 16 days ago • 7.27k • 2
liked a dataset (about 1 month ago)
zeta-alpha-ai/NanoMSMARCO • Updated Sep 10, 2024 • 5.14k • 666 • 2
liked a model (about 2 months ago)
deepseek-ai/DeepSeek-V2.5 • Text Generation • Updated Dec 11, 2024 • 4.22k • 684
liked a model (2 months ago)
google-bert/bert-base-uncased • Fill-Mask • Updated Feb 19, 2024 • 65.6M • 2.06k
liked a dataset (3 months ago)
cfli/bge-full-data • Updated Oct 11, 2024 • 1.66k • 30
liked a dataset (4 months ago)
cornell-movie-review-data/rotten_tomatoes • Updated Mar 18, 2024 • 10.7k • 9.62k • 67
liked a dataset (5 months ago)
HuggingFaceFW/fineweb • Updated 15 days ago • 48.6B • 262k • 1.83k
liked a model (5 months ago)
nomic-ai/nomic-embed-text-v1 • Sentence Similarity • Updated Sep 26, 2024 • 347k • 480
liked 2 models (6 months ago)
nomic-ai/nomic-embed-text-v1.5 • Sentence Similarity • Updated 1 day ago • 1.31M • 491
allenai/OLMo-7B-hf • Text Generation • Updated Jul 16, 2024 • 3.84k • 14
liked a dataset (6 months ago)
yuntian-deng/dolmasample • Updated Feb 28, 2024 • 127k • 28 • 2
liked a dataset (8 months ago)
allenai/WildChat-1M • Updated Oct 17, 2024 • 838k • 2.1k • 305
liked a model (about 1 year ago)
intfloat/e5-mistral-7b-instruct • Feature Extraction • Updated Apr 23, 2024 • 174k • 484
liked 2 datasets (over 1 year ago)
Tevatron/msmarco-passage-corpus • Updated Mar 16, 2022 • 8.84M • 350 • 9
facebook/kilt_tasks • Updated Jan 4, 2024 • 3.23M • 4.87k • 54
liked 2 models (over 1 year ago)
distilbert/distilbert-base-uncased • Fill-Mask • Updated May 6, 2024 • 11.4M • 600
meta-llama/Llama-2-7b-hf • Text Generation • Updated Apr 17, 2024 • 1.02M • 1.9k
liked a Space (over 1 year ago)
🥇 MTEB Leaderboard • Running on CPU Upgrade • 4.57k
liked a dataset (over 1 year ago)
yuntian-deng/gpt2-detectability • Updated Jun 2, 2023 • 520k • 35 • 1
liked a Space (over 1 year ago)
🚀 Chat-with-GPT4o-mini • Running • 267