feat: add sbert support

#25
by bwang0911 - opened
Jina AI org
  1. Code is mostly from @tomaarsen; I made a few small modifications. Note: custom_st.py was added directly to this repo, not to the implementation repo.
  2. Tested on my own replica of the repo (test code below). Note: I added 2_Normalize to modules.json so that embeddings are always normalised by default.
  3. Once the sbert release is out, we should open a PR and update the README.
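The 2_Normalize entry in modules.json adds an L2-normalization step after pooling. Every embedding then has unit length, so a plain dot product between a text and an image embedding equals their cosine similarity. The sketch below uses NumPy only. The helper `l2_normalize` is a hypothetical name that mirrors the idea (divide each row by its norm, clamped by a small epsilon). It is not the library's own code.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit L2 norm (hypothetical helper, sketch only)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    # Clamp the norm so all-zero rows do not cause a division by zero.
    return embeddings / np.maximum(norms, eps)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity computed explicitly from the norms."""
    a_norms = np.linalg.norm(a, axis=1, keepdims=True)      # shape (n, 1)
    b_norms = np.linalg.norm(b, axis=1, keepdims=True).T    # shape (1, m)
    return (a @ b.T) / (a_norms * b_norms)


# Made-up stand-ins for text and image embeddings.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(2, 8))
image_emb = rng.normal(size=(3, 8))

text_n = l2_normalize(text_emb)
image_n = l2_normalize(image_emb)

# Every normalized row has length 1 ...
print(np.linalg.norm(text_n, axis=1))
# ... so the dot product of normalized vectors equals cosine similarity.
print(np.allclose(text_n @ image_n.T, cosine_similarity(text_emb, image_emb)))
```

Normalization inside the pipeline means callers need no extra step before comparing text and image embeddings with a dot product.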

Test code:

```python
from sentence_transformers import SentenceTransformer
from transformers import AutoModel

import numpy as np

IMAGE_URL = 'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg'

# Load the model through the new Sentence Transformers integration (custom_st.py).
model = SentenceTransformer('bwang0911/test-jina-clip', trust_remote_code=True)

# Text and an image URL are both passed to the same encode() call.
et = model.encode(['Hello world'])
em = model.encode([IMAGE_URL])

# Load the same checkpoint through the original transformers implementation.
model2 = AutoModel.from_pretrained('bwang0911/test-jina-clip', trust_remote_code=True)

et2 = model2.encode_text(['Hello world'])
em2 = model2.encode_image([IMAGE_URL])

# Both paths should produce (nearly) identical embeddings.
assert np.allclose(et, et2, rtol=1e-4, atol=1e-4), "Text embeddings are not almost equal"
assert np.allclose(em, em2, rtol=1e-4, atol=1e-4), "Image embeddings are not almost equal"
```
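The assertions treat the two implementations as equivalent when, elementwise, |a - b| <= atol + rtol * |b|. This is the criterion documented for NumPy's `allclose`, where `b` is the second argument. With `rtol=1e-4` and `atol=1e-4`, and normalized embeddings whose entries are at most 1 in magnitude, each element may differ by at most about 2e-4. The vectors below are made up and only illustrate how strict that check is.

```python
import numpy as np

# Made-up unit-length "reference" embedding and two perturbed copies.
reference = np.array([0.6, 0.8, 0.0])
small_drift = reference + 5e-5   # inside the tolerance
large_drift = reference + 1e-3   # outside the tolerance

# Same tolerances as the test script.
rtol, atol = 1e-4, 1e-4

print(np.allclose(small_drift, reference, rtol=rtol, atol=atol))  # True
print(np.allclose(large_drift, reference, rtol=rtol, atol=atol))  # False

# The same check written out by hand, following NumPy's documented formula.
manual = np.all(np.abs(small_drift - reference) <= atol + rtol * np.abs(reference))
print(manual)  # True
```

Small numerical drift between the two code paths (for example, from a different order of floating-point operations) passes. A real mismatch, such as a missing normalization step, fails.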
bwang0911 changed pull request status to merged
