Gemma3n not working on H20 with bfloat16 data type.
I tried running the Gemma3n model on an H20 card with the bfloat16 data type and it throws a floating point exception and fails. With the float32 data type, however, it works.
The example I'm trying is the same one present in the sample notebook.
On the other hand, if I use the example from the model card, bfloat16 fails with a floating point exception (same as above), but float32 fails with the error below:
Unsupported: call_method GetAttrVariable(UserDefinedObjectVariable(AttentionInterface), _global_mapping) __getitem__ (ConstantVariable(),) {}
I tried the same on an H100 and it works as expected.
Anyone else faced this issue?
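For reference, this is roughly the load-and-generate path I'm using (a minimal sketch in the style of the sample notebook; the checkpoint id and prompt here are placeholders, and the only change between the failing and working runs is the dtype):

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# NOTE: checkpoint id and prompt are placeholders; the real example comes from the sample notebook.
model_id = "google/gemma-3n-E2B-it"

# Switching this between torch.bfloat16 (fails with a floating point exception on H20)
# and torch.float32 (works) is the only change between the two runs.
dtype = torch.bfloat16

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map="auto",
)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Describe the Eiffel Tower in one sentence."}]},
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(outputs[0], skip_special_tokens=True))
```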
Hi @NOWSHAD ,
Welcome to the Google Gemma family of open source models. The issues above may stem from how floating point operations behave on your particular hardware. Please consider the following suggestions to avoid this kind of issue:
Update Everything: Ensure your NVIDIA drivers, CUDA toolkit, cuDNN, PyTorch, transformers, and bitsandbytes libraries are all at their latest stable versions; this is the most common fix (a quick version check is sketched after this list).
Check Compatibility Matrix: Verify that your specific driver and CUDA versions are officially recommended for PyTorch on Hopper GPUs.
Review Generation Settings: Check how the model is loaded (the from_pretrained arguments) and how model.generate() is called (e.g., do_sample=False vs. True, and any other generation parameters).
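For the first two points, a quick check like this (purely illustrative) prints the installed versions and whether the device reports bfloat16 support:

```python
import torch
import transformers

# Quick environment sanity check: print the versions the suggestions above ask you to verify,
# plus whether the GPU reports bfloat16 support.
print("torch:", torch.__version__, "| cuda:", torch.version.cuda)
print("transformers:", transformers.__version__)
print("device:", torch.cuda.get_device_name(0),
      "| compute capability:", torch.cuda.get_device_capability(0))
print("bf16 supported:", torch.cuda.is_bf16_supported())
```

For the third point, you can reuse your load-and-generate snippet and toggle do_sample=False vs. do_sample=True (and, if needed, the attn_implementation argument of from_pretrained) to see which combination triggers the failure.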
Thanks.
For the core dump on H20, check this: https://github.com/vllm-project/vllm/issues/4392#issuecomment-2227935528
Thank you @BalakrishnaCh @CHNtentes for your inputs. Yes, adjusting the generation config and updating the packages resolved the issue.