Embedding Models
Collection
Some embedding models I've trained, fine-tuned, distilled, converted, or produced some other way.
14 items
loss: 0.056979671120643616
This is the bert-tiny model fine-tuned on 15B tokens for embedding/feature extraction in English and Brazilian Portuguese.
The output vector size is 128.
This model has only 4.4M parameters, but after fine-tuning its embeddings punch well above its size.
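The 4.4M figure can be checked by hand. The sketch below assumes the standard BERT-uncased configuration for the base model (vocabulary of 30,522 tokens, 512 positions, 2 token types, feed-forward size 4 × 128 = 512) together with the L=2, H=128 sizes in its name. These config values are assumptions, not read from this checkpoint. The count also shows that most of the parameters sit in the token embedding table rather than in the two transformer layers.

```python
# Back-of-the-envelope parameter count for a BERT encoder.
# Assumed config (standard BERT-uncased values, not read from the checkpoint).
VOCAB, MAX_POS, TYPES = 30522, 512, 2
LAYERS, HIDDEN = 2, 128
FFN = 4 * HIDDEN  # intermediate (feed-forward) size


def dense(n_in: int, n_out: int) -> int:
    """Weights plus bias of a fully connected layer."""
    return n_in * n_out + n_out


def layer_norm(h: int) -> int:
    """Scale (gamma) and shift (beta) vectors."""
    return 2 * h


# Token, position and token-type embeddings, plus their LayerNorm.
embeddings = (VOCAB + MAX_POS + TYPES) * HIDDEN + layer_norm(HIDDEN)

# One transformer layer: Q/K/V projections, attention output, feed-forward, two LayerNorms.
per_layer = (
    3 * dense(HIDDEN, HIDDEN)
    + dense(HIDDEN, HIDDEN)
    + layer_norm(HIDDEN)
    + dense(HIDDEN, FFN)
    + dense(FFN, HIDDEN)
    + layer_norm(HIDDEN)
)

encoder_total = embeddings + LAYERS * per_layer
with_pooler = encoder_total + dense(HIDDEN, HIDDEN)  # BERT's optional [CLS] pooler

print(f"embeddings: {embeddings:,}")
print(f"per layer: {per_layer:,}")
print(f"total (no pooler): {encoder_total:,}")
print(f"total (with pooler): {with_pooler:,}")
print(f"share in embeddings: {embeddings / encoder_total:.0%}")
```

Both totals round to 4.4M, and roughly nine tenths of the weights belong to the embedding layers.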
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the Hugging Face Hub
model = SentenceTransformer("cnmoro/bert-tiny-embeddings-english-portuguese")
# Run inference
sentences = [
'first passage',
'second passage'
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 128): two sentences, 128 dimensions each
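A common next step is to compare the vectors, for example to rank passages against a query. The helper below is a minimal sketch that assumes cosine similarity as the comparison metric. The function name rank_by_cosine is hypothetical and is not part of the Sentence Transformers API. It only needs NumPy arrays, so it works on the output of model.encode.

```python
import numpy as np


def rank_by_cosine(query_emb: np.ndarray, passage_embs: np.ndarray) -> list[tuple[int, float]]:
    """Hypothetical helper: return (passage index, cosine similarity), most similar first.

    query_emb has shape (dim,); passage_embs has shape (n_passages, dim).
    """
    # Scale every vector to unit length so a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q
    # Sort by descending similarity.
    order = np.argsort(-scores)
    return [(int(i), float(scores[i])) for i in order]


# Usage with the model loaded above (downloads the checkpoint from the Hub):
# query = model.encode("some query")  # shape (128,)
# passages = model.encode(["first passage", "second passage"])  # shape (2, 128)
# print(rank_by_cosine(query, passages))
```

Normalizing inside the helper keeps it correct whether or not the embeddings were already unit-length.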
Base model
google/bert_uncased_L-2_H-128_A-2