Can't get Tokenizer on Windows (CPU)
Hi All,
This is my code:
import torch
from transformers import AutoModel, AutoTokenizer
model_name = "dicta-il/dictalm2.0-instruct-GGUF"
self.tokenizer = AutoTokenizer.from_pretrained(model_name)
#self.tokenizer = LlamaTokenizerFast.from_pretrained(model_name)
self.model = AutoModel.from_pretrained(model_name)
self.device = "cuda" if torch.cuda.is_available() else "cpu"
self.model = self.model.to(self.device)
However, I get this error:
OSError: Can't load tokenizer for 'dicta-il/dictalm2.0-instruct-GGUF'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'dicta-il/dictalm2.0-instruct-GGUF' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.
Help will be highly appreciated.
Thanks in advance
The GGUF format must be loaded using a supported framework such as llama.cpp, Ollama, or LM Studio.
You can check the example code and instructions listed on each model page to see how to load the model correctly and which model is the right one to use.
I didn't see an example anywhere; that was my problem. Eventually, however, I managed to find a solution:
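A rough way to see why the first snippet fails: AutoTokenizer.from_pretrained looks for tokenizer files such as tokenizer.json or tokenizer_config.json in the repository, whereas a GGUF repository presumably ships the tokenizer embedded inside the .gguf file instead. The sketch below uses two hypothetical helpers (looks_like_hf_tokenizer_repo and gguf_files are not part of any library) that inspect a list of filenames; the filename set is an assumption covering the common cases, not the exact lookup logic transformers uses.

```python
# Hypothetical helpers for inspecting a repository file listing.
# The filename set is an assumption covering common cases; it is not
# the exact set of files transformers checks.
TOKENIZER_FILES = {"tokenizer.json", "tokenizer_config.json", "tokenizer.model"}


def looks_like_hf_tokenizer_repo(filenames):
    """Return True if any common tokenizer file appears at the repo root."""
    return any(name in TOKENIZER_FILES for name in filenames)


def gguf_files(filenames):
    """Return the .gguf weight files in a listing, sorted (these are what llama.cpp loads)."""
    return sorted(name for name in filenames if name.endswith(".gguf"))
```

With the huggingface_hub package installed, the listing can come from huggingface_hub.list_repo_files("dicta-il/dictalm2.0-instruct-GGUF"). The exact contents of that repository are not shown in this thread, so check the result rather than assume it.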
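import multiprocessing
from llama_cpp import Llama  # provided by the llama-cpp-python package
from transformers import AutoTokenizer
model_name = "dicta-il/dictalm2.0-instruct-GGUF"  # no trailing comma, otherwise this becomes a tuple
self.tokenizer = AutoTokenizer.from_pretrained("dicta-il/dictalm2.0-instruct")
self.model = Llama.from_pretrained(repo_id=model_name,
                                   filename="dictalm2.0-instruct.Q4_K_M.gguf",
                                   n_gpu_layers=0,  # keep every layer on the CPU
                                   n_threads=multiprocessing.cpu_count(),
                                   embedding=True,  # enable embedding output
                                   verbose=False)
It also required installing Microsoft Visual C++ and the Python package llama-cpp-python.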
Hi @Shaltiel,
It seems I celebrated too early: the tokenizer always yields the same tokens no matter what the input text is, and as a result I always get the same embedding vector. I also tried the model dicta-il/dictalm2.0-GGUF, but I get the same behavior. Here is my encoding method:
def encode(self, sentence):
    # Tokenize the sentence
    print("sentence:", sentence)
    tokens = self.tokenizer(sentence)
    print("tokens:", tokens[:10])
    # Generate embeddings
    embeddings = self.model.embed(tokens)[0][0]
    print("embeddings:", embeddings[:10])
    return embeddings
and here are a few of my outputs:
sentence: מאיזה גיל מומלץ להתחיל לצחצח שיניים אצל ילדים? [English: "From what age is it recommended to start brushing children's teeth?"]
tokens: [Encoding(num_tokens=22, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]
sentence: כואב לי מאד בעת שתיית מים קרים. מה עלי לעשות? [English: "It hurts a lot when I drink cold water. What should I do?"]
tokens: [Encoding(num_tokens=21, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]
sentence: שלום,
אני מחפשת רופא שיניים טוב בתל אביב שהוא חבר הר"ש. היכן אני יכולה לראות את רשימת הרופאים בתל אביב? [English: "Hello, I'm looking for a good dentist in Tel Aviv who is a member of הר"ש. Where can I see the list of doctors in Tel Aviv?"]
tokens: [Encoding(num_tokens=43, attributes=[ids, type_ids, tokens, offsets, attention_mask, special_tokens_mask, overflowing])]
embeddings: [-2.491905689239502, 0.028786303475499153, -1.9083914756774902, 2.2526683807373047, -2.1519370079040527, -3.5000791549682617, 5.3165459632873535, 0.1344803422689438, -1.7638733386993408, 1.930660367012024]
Any help will be highly appreciated...
Thanks in advance,
Eli
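A likely explanation for the identical vectors, assuming llama-cpp-python's Llama.embed takes raw text (a str or a list of str) and tokenizes it with the tokenizer embedded in the GGUF file: the different num_tokens values above (22, 21, 43) suggest the Hugging Face tokenizer does produce different tokens, but passing its BatchEncoding (a dict-like UserDict subclass) to embed makes embed iterate over it. Iterating a dict-like yields its keys ("input_ids", "attention_mask", ...), not the sentence, so every call embeds the same strings. The sketch below illustrates this with a hypothetical stand-in class (FakeBatchEncoding) and a hypothetical texts_seen_by_embed that is assumed to mirror embed's input handling, then shows an encode that passes the sentence directly. The model is duck-typed (any object with an embed(str) method), and the mean-pooling fallback is an assumption: whether embed returns one vector or one vector per token depends on the model's pooling setting.

```python
from collections import UserDict


class FakeBatchEncoding(UserDict):
    """Hypothetical stand-in for transformers.BatchEncoding, which also subclasses UserDict."""


def texts_seen_by_embed(x):
    # Assumed to mirror how Llama.embed normalises its argument:
    # a str is wrapped in a list, anything else is iterated as-is.
    return [x] if isinstance(x, str) else list(x)


def mean_pool(vectors):
    """Average a list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


class SentenceEncoder:
    def __init__(self, model):
        # model: e.g. a llama_cpp.Llama created with embedding=True;
        # any object with an embed(str) method works here.
        self.model = model

    def encode(self, sentence):
        # Pass the raw sentence: the GGUF model tokenizes it itself,
        # so no Hugging Face tokenizer is needed.
        result = self.model.embed(sentence)
        # If the model returns one vector per token (no pooling), average
        # them into a single sentence vector (one choice among several).
        if result and isinstance(result[0], list):
            return mean_pool(result)
        return result
```

For example, texts_seen_by_embed(FakeBatchEncoding(input_ids=[1, 2], attention_mask=[1, 1])) returns ["input_ids", "attention_mask"], which is the same list for every input sentence.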
Hi Eli,
As I said previously, the GGUF format should be used with the llama.cpp library or with one of the frameworks built on it, such as https://ollama.com/ and https://lmstudio.ai/. GGUF support in the transformers library is still very exploratory and isn't yet officially fully supported: https://huggingface.co/docs/transformers/en/gguf#support-within-transformers.
If you wish to use it in code, I recommend choosing a different format from https://huggingface.co/collections/dicta-il/dicta-lm-20-collection-661bbda397df671e4a430c27 and using the code examples listed on the page of the chosen model.
Hi @Shaltiel,
I have followed all your instructions (I installed Microsoft Visual C++ and llama.cpp), and I also went through this page: https://huggingface.co/docs/transformers/en/gguf#support-within-transformers. However, I don't want to use a different model...
Is there another model that lets me get text embeddings (an encoder)? Preferably one that runs on CPU, but if it requires a GPU I will find a way to use one...
Thanks in advance,
Eli
Check out the DictaBERT collection: https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b
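To turn an encoder's output into one vector per sentence, a common approach (an assumption here, not something prescribed in this thread or on the model cards) is to mean-pool the token vectors from last_hidden_state while skipping padding via the attention mask, then compare sentences with cosine similarity. The sketch below shows only that arithmetic in plain Python; masked_mean_pool and cosine_similarity are illustrative helpers, not library functions.

```python
import math


def masked_mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask value is 1 (padding positions are skipped)."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    return [sum(col) / len(kept) for col in zip(*kept)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

In a transformers setup with return_tensors="pt" (untested here, and with the checkpoint name dicta-il/dictabert assumed), something like masked_mean_pool(outputs.last_hidden_state[0].tolist(), inputs["attention_mask"][0].tolist()) would give the sentence vector to compare.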