Disable trust_remote_code
Hey,
Thanks for the awesome model. We would like to take it for a spin without trust_remote_code. I have downloaded both the actual model I am trying (Alibaba-NLP/gte-large-en-v1.5) and the modeling files from Alibaba-NLP/new-impl to local storage. Is there a way to update the config in the gte-large model to avoid trust_remote_code=True, since that call is blocked inside our company network?
Thanks.
Hi, you can move the modeling.py and configuration.py into the Alibaba-NLP/gte-large-en-v1.5 folder, and substitute the following lines in config.json
"auto_map": {
"AutoConfig": "Alibaba-NLP/new-impl--configuration.NewConfig",
"AutoModel": "Alibaba-NLP/new-impl--modeling.NewModel",
"AutoModelForMaskedLM": "Alibaba-NLP/new-impl--modeling.NewForMaskedLM",
"AutoModelForMultipleChoice": "Alibaba-NLP/new-impl--modeling.NewForMultipleChoice",
"AutoModelForQuestionAnswering": "Alibaba-NLP/new-impl--modeling.NewForQuestionAnswering",
"AutoModelForSequenceClassification": "Alibaba-NLP/new-impl--modeling.NewForSequenceClassification",
"AutoModelForTokenClassification": "Alibaba-NLP/new-impl--modeling.NewForTokenClassification"
},
with
"auto_map": {
"AutoConfig": "configuration.NewConfig",
"AutoModel": "modeling.NewModel",
"AutoModelForMaskedLM": "modeling.NewForMaskedLM",
"AutoModelForMultipleChoice": "modeling.NewForMultipleChoice",
"AutoModelForQuestionAnswering": "modeling.NewForQuestionAnswering",
"AutoModelForSequenceClassification": "modeling.NewForSequenceClassification",
"AutoModelForTokenClassification": "modeling.NewForTokenClassification"
},
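After this change, loading from the local folder still needs trust_remote_code=True, but the code executed is the local modeling.py/configuration.py rather than anything fetched from the Hub (as confirmed further down in this thread). A minimal usage sketch, assuming the edited model lives at ./Alibaba-NLP/gte-large-en-v1.5:
# Loads from the local folder; trust_remote_code=True is still required,
# but it now executes the local modeling.py/configuration.py with no Hub call.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("./Alibaba-NLP/gte-large-en-v1.5")
model = AutoModel.from_pretrained("./Alibaba-NLP/gte-large-en-v1.5", trust_remote_code=True)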
Excuse me, I tried the method mentioned above and set trust_remote_code to False, but I still couldn't run the sentence-transformers example code successfully. Is any additional adjustment needed to avoid the ValueError? (I have downloaded the model to './Alibaba-NLP/gte-large-en-v1.5'.)
Code:
# Requires sentence_transformers>=2.7.0
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
sentences = ['That is a happy person', 'That is a very happy person']
model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=False)
embeddings = model.encode(sentences)
print(cos_sim(embeddings[0], embeddings[1]))
Error message:
Traceback (most recent call last):
File "/home/jiangxuehaokeai/test_embedding_model/test.py", line 8, in <module>
model = SentenceTransformer('Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=False)
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 197, in __init__
modules = self._load_sbert_model(
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/sentence_transformers/SentenceTransformer.py", line 1296, in _load_sbert_model
module = Transformer(model_name_or_path, cache_dir=cache_folder, **kwargs)
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/sentence_transformers/models/Transformer.py", line 35, in __init__
config = AutoConfig.from_pretrained(model_name_or_path, **model_args, cache_dir=cache_dir)
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1114, in from_pretrained
trust_remote_code = resolve_trust_remote_code(
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 621, in resolve_trust_remote_code
raise ValueError(
ValueError: Loading Alibaba-NLP/gte-large-en-v1.5 requires you to execute the configuration file in that repo on your local machine. Make sure you have read the code there to avoid malicious use, then set the option `trust_remote_code=True` to remove this error.
@jiangxuehaokeai you could try passing a local path to SentenceTransformer, e.g. model = SentenceTransformer('./Alibaba-NLP/gte-large-en-v1.5', trust_remote_code=False).
Thank you for your prompt response, but unfortunately the same issue persists. QQ
We have the same issue. Please let me know in case you found a solution, @jiangxuehaokeai. We would be very happy to use the model without needing to connect to Hugging Face, which is likewise not possible inside our network.
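One possible complement for fully air-gapped use (my own suggestion, not something confirmed in this thread): once every file is local, you can force offline mode so the libraries never attempt a Hub connection:
# Set before importing/using transformers; both variables are honored by recent versions.
import os
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: no network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use local files only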
@jiangxuehaokeai I didn't see you using trust_remote_code=False. After applying Alibaba-NLP/new-impl/discussions/2#comment-2, it still requires setting trust_remote_code=True. I have re-executed these steps and successfully loaded the model.
ping @phizdbc
@izhx Thank you for the reminder. After I reverted 'trust_remote_code' back to True, I encountered the following error:
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 112
0, in from_pretrained
config_class = get_class_from_dynamic_module(
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 501, in get_class_from_dynamic_module
return get_class_in_module(class_name, final_module.replace(".py", ""))
File "/home/jiangxuehaokeai/.local/lib/python3.10/site-packages/transformers/dynamic_module_utils.py", line 201, in get_class_in_module
module = importlib.import_module(module_path)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'transformers_modules.gte-large-en-v1'
Then, following the code in the library, I found that when handling the model path, the program cuts off the characters after the last '.'. This caused the module name to become transformers_modules.gte-large-en-v1 instead of transformers_modules.gte-large-en-v1.5. Afterwards, I renamed the model directory to ./Alibaba-NLP/gte-large-en-v1-5, which allowed me to use the model locally even when disconnected from the internet.
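A minimal sketch of that workaround (copying instead of renaming, so the original folder is kept; the paths are from the posts above):
# Copy the model folder to a name without the extra dot, then load offline.
import shutil
from transformers import AutoModel, AutoTokenizer

shutil.copytree("./Alibaba-NLP/gte-large-en-v1.5", "./Alibaba-NLP/gte-large-en-v1-5", dirs_exist_ok=True)
tokenizer = AutoTokenizer.from_pretrained("./Alibaba-NLP/gte-large-en-v1-5")
model = AutoModel.from_pretrained("./Alibaba-NLP/gte-large-en-v1-5", trust_remote_code=True)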
@phizdbc You can check whether you encounter the same error as I did.
It seems there is a transformers package version issue here: my colleague ran into the same error, and another one followed ('NewConfig' object has no attribute '_attn_implementation'), but upgrading to transformers>=4.41.0 solved both errors. Thanks @izhx and @jiangxuehaokeai.
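If you want to verify the installed version before retrying, a quick check:
# Should print at least 4.41.0; otherwise upgrade transformers first.
import transformers
print(transformers.__version__)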
Where can we get configuration.py and modeling.py? We are unable to see them in the downloaded model files.
@rohtashbeniwal555 In this repo:
https://huggingface.co/Alibaba-NLP/new-impl/blob/main/modeling.py
https://huggingface.co/Alibaba-NLP/new-impl/blob/main/configuration.py
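If you prefer to fetch them programmatically, a short sketch (same calls as in the summary snippet further down; the target folder is the local model directory from the earlier posts):
# Downloads the two remote-code files next to the local model weights.
from huggingface_hub import hf_hub_download

for filename in ("modeling.py", "configuration.py"):
    hf_hub_download(repo_id="Alibaba-NLP/new-impl", filename=filename,
                    local_dir="./Alibaba-NLP/gte-large-en-v1.5")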
I met some problems when using new-impl: https://huggingface.co/Alibaba-NLP/gte-large-en-v1.5/discussions/17#66a3a719f33ff23e1c003ef1
Need help~
any updates?
To summarize the above, here's a snippet illustrating how to set up offline usage (but trust_remote_code must still be True to load correctly, according to the authors).
import json
from pathlib import Path

import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer


def save_pretrained_alibaba_model(
    model_dir: Path,
    model_name_or_path: str = "Alibaba-NLP/gte-multilingual-base",
):
    """Save pretrained Alibaba model `model_name_or_path` to local directory `model_dir` for offline use.

    Refer to: https://huggingface.co/Alibaba-NLP/new-impl/discussions/2#662b08d04d8c3d0a09c88fa3

    NOTE: After it is downloaded, trust_remote_code=True is still required but will be offline.
    """
    from huggingface_hub import hf_hub_download

    pth_config = model_dir / "config.json"

    # Download the tokenizer and model (internet required)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    model = AutoModel.from_pretrained(model_name_or_path, trust_remote_code=True)
    model.save_pretrained(model_dir)
    tokenizer.save_pretrained(model_dir)

    # Overwrite the auto_map so it points at the local modeling/configuration files
    cfg = json.loads(pth_config.read_text())
    cfg["auto_map"] = {
        "AutoConfig": "configuration.NewConfig",
        "AutoModel": "modeling.NewModel",
        "AutoModelForMaskedLM": "modeling.NewForMaskedLM",
        "AutoModelForMultipleChoice": "modeling.NewForMultipleChoice",
        "AutoModelForQuestionAnswering": "modeling.NewForQuestionAnswering",
        "AutoModelForSequenceClassification": "modeling.NewForSequenceClassification",
        "AutoModelForTokenClassification": "modeling.NewForTokenClassification",
    }
    pth_config.write_text(json.dumps(cfg))

    # Download the remote-code files next to the weights
    hf_hub_download(
        repo_id="Alibaba-NLP/new-impl",
        filename="modeling.py",
        local_dir=model_dir.as_posix(),
    )
    hf_hub_download(
        repo_id="Alibaba-NLP/new-impl",
        filename="configuration.py",
        local_dir=model_dir.as_posix(),
    )
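For example, to populate a local folder (the directory name here is just an illustration):
# Hypothetical local target directory; any writable path works.
model_dir = Path("./gte-multilingual-base")
save_pretrained_alibaba_model(model_dir)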
You can then do:
device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True)
model.to(device)
input_texts = [
"what is the capital of China?",
"how to implement quick sort in python?",
"ๅไบฌ",
"ๅฟซๆ็ฎๆณไป็ป"
]
# Tokenize the input texts
batch_dict = tokenizer(input_texts, max_length=8192, padding=True, truncation=True, return_tensors="pt")
for key in batch_dict.keys():
batch_dict[key] = batch_dict[key].to(device)
outputs = model(**batch_dict)
dimension = 768 # The output dimension of the output embedding, should be in [128, 768]
embeddings = outputs.last_hidden_state[:, 0][:dimension]
embeddings = F.normalize(embeddings, p=2, dim=1)
scores = (embeddings[:1] @ embeddings[1:].T) * 100
print(scores.tolist())
# [[0.3016996383666992, 0.7503870129585266, 0.3203084468841553]]
I finally resolved it by copying both the "models--Alibaba-NLP--new-impl" and "models--Alibaba-NLP--gte-large-en-v1.5" caches into the Hugging Face cache folder, together with trust_remote_code=True; when the network breaks, it still works.
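A minimal sketch of that cache copy (the source path is a placeholder; on Linux the default Hub cache lives under ~/.cache/huggingface/hub):
# Copy the two cached repos from a machine that had network access.
import shutil
from pathlib import Path

src = Path("/path/to/exported/cache")  # placeholder: wherever you staged the cache
dst = Path.home() / ".cache" / "huggingface" / "hub"
for name in ("models--Alibaba-NLP--new-impl", "models--Alibaba-NLP--gte-large-en-v1.5"):
    shutil.copytree(src / name, dst / name, dirs_exist_ok=True)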
@izhx @jiangxuehaokeai @zhang-ke @shern2 Sorry to bother you all, but I have run into some problems and sincerely ask for your help.
I can't load gte-base-en-v1.5 through SentenceTransformer.
Prerequisite: I have modified config.json according to the replies above and put the two Python files into the gte-base-en-v1.5 model path.
The following code can run normally offline:
model_path = "/Users/aaa/Downloads/models/gte-base-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
But when I use SentenceTransformer:
from sentence_transformers import SentenceTransformer
model_name = "/Users/aaa/Downloads/models/gte-base-en-v1.5"
model = SentenceTransformer(model_name, trust_remote_code=True)
It will report an error:
Traceback (most recent call last):
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/sentence_transformers/util.py", line 1395, in load_dir_path
repo_path = snapshot_download(**download_kwargs)
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/Users/aaa/Downloads/models/gte-base-en-v1.5'. Use `repo_type` argument if needed.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "_.py", line 160, in <module>
model = SentenceTransformer(model_name, trust_remote_code=True)
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 306, in __init__
modules, self.module_kwargs = self._load_sbert_model(
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/sentence_transformers/SentenceTransformer.py", line 1730, in _load_sbert_model
module_path = load_dir_path(
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/sentence_transformers/util.py", line 1399, in load_dir_path
repo_path = snapshot_download(**download_kwargs)
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 106, in _inner_fn
validate_repo_id(arg_value)
File "/Users/aaa/miniforge3/envs/multi-cpr/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 154, in validate_repo_id
raise HFValidationError(
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/Users/aaa/Downloads/models/gte-base-en-v1.5'. Use `repo_type` argument if needed.
My environment:
python: 3.8.20
transformers: 4.46.3
sentence-transformers: 3.2.1
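One possible workaround for this HFValidationError (a sketch under my own assumptions, not a fix confirmed in this thread): with sentence-transformers 3.x, build the SentenceTransformer from its modules manually, so the library never tries to resolve the local path as a Hub repo id. The gte models use CLS pooling with L2 normalization, per the snippets above:
# Assumes sentence-transformers 3.x (config_args was not available in 2.x).
from sentence_transformers import SentenceTransformer, models

model_path = "/Users/aaa/Downloads/models/gte-base-en-v1.5"
# Wraps the local HF model; model_args/config_args are forwarded to the Auto* loaders.
word = models.Transformer(
    model_path,
    model_args={"trust_remote_code": True},
    config_args={"trust_remote_code": True},
)
pooling = models.Pooling(word.get_word_embedding_dimension(), pooling_mode="cls")
normalize = models.Normalize()
model = SentenceTransformer(modules=[word, pooling, normalize])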