Fine-tuning fails at the "Llama4ForConditionalGeneration.from_pretrained" step with the error below. I'm using transformers==4.51.0 with the latest version of bitsandbytes. Please suggest a fix.
[rank0]:     model = Llama4ForConditionalGeneration.from_pretrained(
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4399, in from_pretrained
[rank0]:     ) = cls._load_pretrained_model(
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4833, in _load_pretrained_model
[rank0]:     disk_offload_index, cpu_offload_index = _load_state_dict_into_meta_model(
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 776, in _load_state_dict_into_meta_model
[rank0]:     shard_and_distribute_module(
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/integrations/tensor_parallel.py", line 652, in shard_and_distribute_module
[rank0]:     param = tp_layer.partition_tensor(
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/integrations/tensor_parallel.py", line 440, in partition_tensor
[rank0]:     parameter = get_packed_weights(param, empty_param, device_mesh, rank, -1)
[rank0]:   File "/home/chakravn/.local/lib/python3.10/site-packages/transformers/integrations/tensor_parallel.py", line 124, in get_packed_weights
[rank0]:     slice_dtype = slice.get_dtype()