Gemma3n not working on H20 with bfloat16 data type.

#30
by NOWSHAD - opened

I tried running the Gemma3n model on an H20 card with the bfloat16 data type and it throws a floating point exception and fails. When I try the float32 data type instead, it works.
The example I'm trying is the same one present in the sample notebook.
On the other hand, if I use the example from the model card, bfloat16 fails with a floating point exception (same as earlier), but float32 fails with the error below:

```
Unsupported: call_method GetAttrVariable(UserDefinedObjectVariable(AttentionInterface), _global_mapping) __getitem__ (ConstantVariable(),) {}
```

I tried the same on an H100 and it works as expected.

Anyone else faced this issue?
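
For reference, here is a minimal sketch of the failing setup (the model ID, task, and prompt are assumptions based on the Gemma3n model card, not the exact notebook code):

```python
import torch
from transformers import pipeline

# Minimal sketch of the setup under test. Model ID and task are assumed
# from the Gemma3n model card; adjust to the exact variant and example used.
pipe = pipeline(
    "image-text-to-text",
    model="google/gemma-3n-e4b-it",
    device="cuda",
    torch_dtype=torch.bfloat16,   # floating point exception on H20
    # torch_dtype=torch.float32,  # works on H20 for the notebook example
)

messages = [
    {"role": "user", "content": [{"type": "text", "text": "Why is the sky blue?"}]},
]
print(pipe(text=messages, max_new_tokens=32))
```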

Hi @NOWSHAD ,

Welcome to the Google Gemma family of open source models. The issues above might be caused by the compatibility of floating point operations with a particular piece of hardware. Please try the following suggestions to avoid this kind of issue:

Update Everything: Ensure your NVIDIA drivers, CUDA toolkit, cuDNN, PyTorch, transformers, and bitsandbytes libraries are all at their latest stable versions. This is the most common fix.
Check Compatibility Matrix: Verify that your specific driver and CUDA versions are officially recommended for PyTorch on Hopper GPUs (a quick version-check sketch follows this list).
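
For example, a quick way to print the relevant versions (generic, nothing Gemma3n-specific):

```python
import torch
import transformers

# Environment report to compare against the recommended versions.
print("torch:", torch.__version__)
print("CUDA (torch build):", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
print("transformers:", transformers.__version__)
print("GPU:", torch.cuda.get_device_name(0))
print("compute capability:", torch.cuda.get_device_capability(0))
```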

Also double-check how the model is loaded (the from_pretrained arguments) and how model.generate() is called (e.g., do_sample=False vs. True, and the specific generation parameters). A sketch with everything pinned down follows.
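
The second error (the Unsupported: call_method ... AttentionInterface one) looks like a torch.compile/Dynamo graph break inside the transformers attention dispatch, so pinning the attention implementation explicitly may also help. Here is a minimal sketch of loading and generating with the arguments made explicit (the model ID, auto class, and eager-attention workaround are assumptions, not a confirmed fix):

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "google/gemma-3n-e4b-it"  # assumed Gemma3n variant

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # the dtype under test; swap for torch.float32
    device_map="auto",
    attn_implementation="eager",  # avoids SDPA/flash-attention kernel paths
)

inputs = processor(text="Why is the sky blue?", return_tensors="pt").to(model.device)
# Deterministic decoding to rule out sampling-related differences.
out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(processor.decode(out[0], skip_special_tokens=True))
```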

Thanks.


For the core dump on H20, check this: https://github.com/vllm-project/vllm/issues/4392#issuecomment-2227935528

Thank you @BalakrishnaCh @CHNtentes for your inputs. Yes, playing around with the generation config and updating the packages resolved the issue.

NOWSHAD changed discussion status to closed
