mxbai Ettin 17M - Contrastive Pretrained

This is a contrastively pretrained version of the Ettin 17M encoder model, trained on large-scale text pairs to produce sentence embeddings.

Model Details

  • Base Model: jhu-clsp/ettin-encoder-17m
  • Model Size: 17M parameters
  • Training: Contrastive pretraining on large-scale text pairs
  • Sequence Length: 512 tokens
  • Pooling: Mean pooling

Usage

from transformers import AutoTokenizer, AutoModel
import torch
import torch.nn.functional as F

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('RikiyaT/mxbai-ettin-17m-pretrained')
model = AutoModel.from_pretrained('RikiyaT/mxbai-ettin-17m-pretrained')

# Encode sentences
sentences = ["Example sentence 1", "Example sentence 2"]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt', max_length=512)

with torch.no_grad():
    outputs = model(**inputs)
    # Mean pooling over non-padding tokens only
    mask = inputs['attention_mask'].unsqueeze(-1).float()
    embeddings = (outputs.last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
    # L2-normalize embeddings so cosine similarity reduces to a dot product
    embeddings = F.normalize(embeddings, p=2, dim=1)
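
Because the embeddings are L2-normalized, cosine similarity between sentences is just a dot product. A minimal follow-up, continuing from the variables in the snippet above:

# Pairwise cosine similarity matrix (embeddings are unit-length)
similarity = embeddings @ embeddings.T
print(similarity)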

Training Details

  • Batch size: large, with in-batch negatives across distributed workers
  • Learning rate: Cosine schedule with warmup
  • Loss: CLIP-style contrastive loss (see the sketch after this list)
  • Hardware: 8x A100 GPUs
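
The CLIP-style objective treats each text pair in a batch as the only positive match and all other in-batch pairings as negatives, applying a symmetric cross-entropy over the similarity matrix. A minimal sketch of such a loss; the function name and temperature value are illustrative assumptions, not taken from the actual training code:

import torch
import torch.nn.functional as F

def clip_style_loss(query_emb, doc_emb, temperature=0.05):
    # query_emb, doc_emb: (batch, dim) L2-normalized embeddings of paired texts
    logits = query_emb @ doc_emb.T / temperature
    # The positive for each query is the document at the same batch index
    labels = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy over both matching directions (query->doc and doc->query)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2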