Can't load tokenizer for 'taishi-i/nagisa_bert'

#2
by linhttphan - opened

Hi,

In 2023, we trained a PyTorch BERT model with 'taishi-i/nagisa_bert' in AWS SageMaker Studio notebooks. We used the model to create AWS SageMaker inference endpoints without any trouble.

However, since July 1st, 2025, we have not been able to make it work with AWS SageMaker inference endpoints. Below is the error message:

Can't load tokenizer for 'taishi-i/nagisa_bert'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'taishi-i/nagisa_bert' is the correct path to a directory containing all relevant files for a NagisaBertTokenizer tokenizer.
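For reference, here is a minimal sketch of the first check the error text itself suggests, verifying that no local directory shadows the repo id:

```python
# Quick diagnostic based on the error message above: make sure no local
# directory named like the repo id shadows the download from the Hub.
import os

repo_id = "taishi-i/nagisa_bert"

# If this prints True, a local folder is shadowing the Hub repo and
# should be renamed or removed.
print(os.path.isdir(repo_id))
```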

I'm unsure how to resolve this issue, as no changes have been made to the code. Your advice and guidance on how to address it would be greatly appreciated.

Best Regards,
Linh

Hi Linh,

Thank you for your comment, and I appreciate your continued use of the model.
Could you please provide the following details:

  1. Transformers version – The versions used during both training and inference.
  2. Environment – SageMaker instance type, Python version, etc.

With this information, I will look into the issue and work on a solution.
Thank you for your cooperation.

Best Regards,
Taishi Ikeda

Hi Taishi Ikeda,

Thank you very much for your reply -- it's great to hear from you!!
Below are the details:

  1. Transformers: transformers==4.6.1
  2. Environment:
    • SageMaker instance types: "ml.m4.2xlarge" and "ml.m4.xlarge". We also used "ml.c6i.xlarge" because it is cheaper.
    • Python version: py39
  3. Screenshots of training and testing in the notebook:
    • Training: (screenshot attached)
    • Testing: (screenshot attached)

Best Regards,
Linh

Hi Linh,

Thank you very much for the information.

I've identified the root cause of the issue — it appears to be related to the version of the transformers library. Versions later than 4.46.3 seem to trigger this error.

Would it be possible for you to specify the version of transformers as 4.46.3? You can do so by running the following command:

pip install transformers==4.46.3
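If it helps, you can also sanity-check the installed version against the pin programmatically; this is a minimal sketch that handles plain numeric "X.Y.Z" versions only:

```python
# Check an installed transformers version against the last version
# known to work with this model (simplified numeric comparison).
def version_tuple(v):
    return tuple(int(part) for part in v.split("."))

KNOWN_GOOD = "4.46.3"

def is_known_good(installed):
    """True if `installed` is at or below the known-good pin."""
    return version_tuple(installed) <= version_tuple(KNOWN_GOOD)

print(is_known_good("4.46.3"))  # True
print(is_known_good("4.47.0"))  # False: versions later than 4.46.3 fail
```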

Please let me know if this resolves the issue or if you encounter any further problems.

Best Regards,
Taishi Ikeda

Hi Taishi Ikeda,

Thank you very much for your guidance.

I ran the command to install transformers==4.46.3 as advised. However, when I attempted to load the model for testing in my notebook, following the same steps I used previously, I encountered an error (screenshot attached).

I then checked CloudWatch and found a further error (screenshot attached).

Does this mean the model I trained earlier is no longer compatible and I’ll need to retrain everything from scratch using transformers==4.46.3?

I appreciate any insights you can share.

Best Regards,
Linh

Hi Linh,

Thank you for sharing the detailed logs and error message.
First of all, I’d like to apologize that the previous suggestion didn’t resolve the issue. I appreciate your patience in working through this.

Does this mean the model I trained earlier is no longer compatible and I’ll need to retrain everything from scratch using transformers==4.46.3?

No, you don't need to retrain the model. It appears that the error you're encountering is:

ModuleNotFoundError: No module named 'nvgpu'

This occurs because TorchServe attempts to monitor GPU utilization using the nvgpu module, which is currently not installed in your container environment.
To address this, please install the nvgpu module as part of your environment setup or deployment process:

$ pip install nvgpu
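If you are installing dependencies at deployment time, SageMaker's PyTorch containers typically pick up a requirements.txt bundled alongside your inference code; the layout below is a hypothetical example of where the pin could live (adjust names and versions to your actual setup):

```
# code/requirements.txt inside model.tar.gz (hypothetical example)
transformers==4.46.3
nagisa-bert==0.0.4
nvgpu
```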

If you need any help updating your deployment configuration or troubleshooting further, please don’t hesitate to reach out.

Best regards,
Taishi Ikeda

Hi Taishi Ikeda,

Thank you for being so supportive.

I followed your guidance and installed the necessary modules. I also added protobuf after noticing related errors in CloudWatch.

I then proceeded to create the AWS SageMaker inference endpoint using the same approach that worked in the past. The endpoint was created successfully, but it does not return predictions.

Upon checking CloudWatch, I found several warnings and errors (screenshots attached). The protobuf error persists even after installing the package and restarting the notebook’s kernel.


I’d really appreciate any help or suggestions you might have to troubleshoot this issue further.

Warm Regards,
Linh

Hi Linh,

Thank you for providing the detailed logs.
I’m sorry to trouble you again, but could you also let me know the version of the tokenizers library you're using?

For reference, I’m currently using:

tokenizers==0.20.3

While my setup isn't running on SageMaker (so a direct comparison may not be entirely accurate), I’ve confirmed that things work correctly in my local Ubuntu environment with the following package versions:

certifi==2025.6.15
charset-normalizer==3.4.2
Cython==3.1.2
dyNET38==2.2
filelock==3.18.0
fsspec==2025.5.1
hf-xet==1.1.5
huggingface-hub==0.33.2
idna==3.10
Jinja2==3.1.6
MarkupSafe==3.0.2
mpmath==1.3.0
nagisa==0.2.11
nagisa-bert==0.0.4
networkx==3.5
numpy==2.3.1
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
packaging==25.0
protobuf==6.31.1
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.4
safetensors==0.5.3
six==1.17.0
sympy==1.14.0
tokenizers==0.20.3
torch==2.7.1
tqdm==4.67.1
transformers==4.46.3
triton==3.3.1
typing_extensions==4.14.1
urllib3==2.5.0 
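To compare your environment against this list, a small helper like the one below can report mismatches (a sketch; the `KNOWN_GOOD` dict covers only the packages most relevant here, and you can extend it):

```python
# Compare installed package versions against the known-good pins above.
# Packages that are missing entirely are reported as None.
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = {
    "transformers": "4.46.3",
    "tokenizers": "0.20.3",
    "nagisa-bert": "0.0.4",
}

def compare_env(pins):
    """Return {package: (installed_version_or_None, pinned_version)}."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (installed, pinned)
    return report

for name, (installed, pinned) in compare_env(KNOWN_GOOD).items():
    status = "OK" if installed == pinned else "MISMATCH"
    print(f"{name}: installed={installed}, pinned={pinned} -> {status}")
```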

I’d like to try reproducing the same error on my end as well, so I’d appreciate a little time to investigate further.

Thank you again for your cooperation!

Best regards,
Taishi Ikeda

Hi Taishi Ikeda,

I'm grateful for your insight and wisdom.

I had not specified the versions of tokenizers for inference. During model training in 2023, I used the following libraries in the requirements.txt file:

tqdm
requests==2.28.2
regex
sentencepiece
sacremoses
transformers==4.6.1
nagisa-bert==0.0.3

After installing the package versions you recommended for inference, I encountered several errors in the AWS SageMaker notebook (screenshots attached).

Despite the warnings, I proceeded to create the inference endpoint using my previous approach. Although the endpoint was created successfully, it isn’t able to perform predictions.

When I checked CloudWatch, I noticed that the warnings and errors were similar to those from yesterday (screenshots attached).


I’ve attempted to resolve the installation issues by reinstalling the packages with versions compatible with Python 3.10. The installation completed without errors, but the inference issue remains.
Here’s the full list of packages I installed:

# Fixed version installations - compatible with Python 3.10
!pip install certifi
!pip install charset-normalizer
!pip install Cython
!pip install dyNET38==2.2
!pip install filelock
!pip install fsspec
!pip install hf-xet
!pip install huggingface-hub
!pip install idna
!pip install Jinja2
!pip install MarkupSafe
!pip install mpmath
!pip install nagisa==0.2.11
!pip install nagisa-bert==0.0.4
!pip install networkx==3.4.2  # Adjusted: 3.5 requires Python >=3.11, using the latest version available for 3.10
!pip install numpy==1.26.4    # Fixed: 2.3.1 requires Python >=3.11, using compatible version
!pip install nvidia-cublas-cu12==12.6.4.1
!pip install nvidia-cuda-cupti-cu12==12.6.80
!pip install nvidia-cuda-nvrtc-cu12==12.6.77
!pip install nvidia-cuda-runtime-cu12==12.6.77
!pip install nvidia-cudnn-cu12==9.5.0.50  # Adjusted: 9.5.1.17 was not resolvable in this environment, using 9.5.0.50
!pip install nvidia-cufft-cu12==11.3.0.4
!pip install nvidia-cufile-cu12==1.11.1.6
!pip install nvidia-curand-cu12==10.3.7.77
!pip install nvidia-cusolver-cu12==11.7.1.2
!pip install nvidia-cusparse-cu12==12.5.4.2
!pip install nvidia-cusparselt-cu12==0.6.3
!pip install nvidia-nccl-cu12==2.26.2
!pip install nvidia-nvjitlink-cu12==12.6.85
!pip install nvidia-nvtx-cu12==12.6.77
!pip install packaging
!pip install protobuf
!pip install PyYAML
!pip install regex
!pip install requests
!pip install safetensors
!pip install six
!pip install sympy
!pip install tokenizers
!pip install torch==2.6.0      # Adjusted: 2.7.1 was not resolvable in this environment, using 2.6.0
!pip install tqdm
!pip install transformers
!pip install triton==3.2.0     # Adjusted: 3.3.1 was not resolvable in this environment, using 3.2.0
!pip install typing_extensions
!pip install urllib3 

At this point, I’m unsure what steps to take next. I’d be grateful for any suggestions you may have to help troubleshoot this further.

Best Regards,
Linh
