Quants?
#1
by ctranslate2-4you - opened
Is this something that could perhaps be quantized? Do you know if it's compatible with llama.cpp? And would it work with Transformers.js and ONNX Runtime in general?
Hello!
It's not compatible with llama.cpp, I'm afraid, and that's likely not going to happen either; this model is just too different.
Support for Transformers.js is more plausible (cc @Xenova), because we already have support for Model2Vec models, and these are quite similar. However, I can't be sure that it'll be added.
- Tom Aarsen
Conversion code:
import torch
from sentence_transformers import SentenceTransformer


class WrappedModel(torch.nn.Module):
    def __init__(self, m):
        super().__init__()
        # The first module of the SentenceTransformer holds the static EmbeddingBag
        self.embedding = m[0].embedding

    def forward(self, input_ids, attention_mask):
        # Flatten the non-padding token ids and compute per-sentence offsets,
        # as expected by EmbeddingBag
        indices = input_ids[attention_mask == 1]
        offsets = torch.cat([torch.tensor([0]), attention_mask.sum(dim=1)[:-1].cumsum(dim=0)])
        return self.embedding(indices, offsets)


# Dummy padded inputs used to trace the export
shape = (3, 4)
input_ids = torch.tensor([1, 2, 3, 4, 5, 6, -1, -1, 1, 1, 1, 0]).view(shape)
attention_mask = torch.tensor([1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0]).view(shape)

model = SentenceTransformer("tomaarsen/static-retrieval-mrl-en-v1")
wrapped = WrappedModel(model)
embeddings = wrapped(input_ids, attention_mask)  # test forward pass

# Export the model
torch.onnx.export(
    wrapped,
    (input_ids, attention_mask),
    "model.onnx",
    export_params=True,
    opset_version=14,
    do_constant_folding=True,
    input_names=['input_ids', 'attention_mask'],
    output_names=['sentence_embedding'],
    dynamic_axes={
        'input_ids': {0: 'batch_size', 1: 'sequence_length'},
        'attention_mask': {0: 'batch_size', 1: 'sequence_length'},
        'sentence_embedding': {0: 'batch_size'},
    },
)
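For the quantization / ONNX Runtime part of the question, here is a minimal sketch (not an official recipe) of running the exported model.onnx with ONNX Runtime and then applying dynamic int8 quantization via onnxruntime's quantize_dynamic. It assumes the export above succeeded and that the model repository ships a tokenizer.json usable with the tokenizers library; the output file name model_int8.onnx is just illustrative.

# Minimal sketch (assumptions: "model.onnx" from the export above exists,
# and the repo provides a tokenizer.json; "model_int8.onnx" is an illustrative name)
import numpy as np
import onnxruntime as ort
from onnxruntime.quantization import QuantType, quantize_dynamic
from tokenizers import Tokenizer

sentences = ["The weather is lovely today.", "It's so sunny outside!"]

# Build padded input_ids / attention_mask, matching the exported graph's inputs
tokenizer = Tokenizer.from_pretrained("tomaarsen/static-retrieval-mrl-en-v1")
tokenizer.enable_padding()
encodings = tokenizer.encode_batch(sentences)
input_ids = np.array([e.ids for e in encodings], dtype=np.int64)
attention_mask = np.array([e.attention_mask for e in encodings], dtype=np.int64)

session = ort.InferenceSession("model.onnx")
embeddings = session.run(
    ["sentence_embedding"],
    {"input_ids": input_ids, "attention_mask": attention_mask},
)[0]
print(embeddings.shape)  # (2, embedding_dim)

# Dynamic int8 weight quantization; how much this shrinks the embedding table
# depends on how the EmbeddingBag op ends up represented in the exported graph.
quantize_dynamic("model.onnx", "model_int8.onnx", weight_type=QuantType.QInt8)

Whether the int8 model keeps retrieval quality is something you'd want to verify against the original model's embeddings on your own data.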