Finetuning this model

#7
by Andrefty - opened

Hi!

Is there currently any way to finetune this model (using SFT or similar)?

I tried looking into using the nanotron PR from the blog post, but after training a model with the bitnet config in the examples folder, I am unable to run it with run_generate due to errors, and I am unable to convert it with the llama nanotron_to_hf script because it apparently does not support bitnet.

Hugging Face 1Bit LLMs org

Hey @Andrefty !

Yeah, I think you can finetune the models if you convert them to nanotron format. Before that, you need to unpack the weights while the model is still in HF format, using this function:

import torch


def unpack_weights(packed: torch.Tensor, bits: int = 2) -> torch.Tensor:
    # Each uint8 in `packed` holds 8 // bits values (4 for 2-bit weights),
    # stacked along dim 0.
    values_per_item = 8 // bits
    packed_shape = packed.shape

    if len(packed_shape) == 1:
        original_row_dim = packed_shape[0] * values_per_item
        unpacked_shape = (original_row_dim,)
    else:
        original_row_dim = packed_shape[0] * values_per_item
        unpacked_shape = (original_row_dim, *packed_shape[1:])

    unpacked = torch.zeros(unpacked_shape, device=packed.device, dtype=torch.uint8)

    for i in range(values_per_item):
        start = i * packed_shape[0]
        end = start + packed_shape[0]
        mask = 3 << (2 * i)  # select the i-th pair of bits
        unpacked[start:end] = (packed & mask) >> (2 * i)

    unpacked = unpacked.to(torch.float) - 1  # shift {0, 1, 2} back to {-1, 0, 1}
    return unpacked
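
(For reference, the packing this undoes looks roughly like the sketch below; pack_weights is just an illustrative name, and the layout, four 2-bit values per uint8 stacked along dim 0, is read off the unpacking code above.)

def pack_weights(unpacked: torch.Tensor, bits: int = 2) -> torch.Tensor:
    # Illustrative inverse of unpack_weights (name and exact layout assumed):
    # shift ternary values {-1, 0, 1} to {0, 1, 2} and pack four 2-bit values
    # into each uint8 along dim 0.
    values_per_item = 8 // bits
    rows = unpacked.shape[0] // values_per_item
    shifted = (unpacked + 1).to(torch.uint8)
    packed = torch.zeros((rows, *unpacked.shape[1:]), dtype=torch.uint8, device=unpacked.device)
    for i in range(values_per_item):
        packed |= shifted[i * rows:(i + 1) * rows] << (2 * i)
    return packed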

After that, you need to convert the model to nanotron format using the hf_to_nanotron script (make sure to use LlamaBitNetConfig instead of LlamaConfig, delete is_llama_config=True, and add is_bitnet_config=True in the config):

.....
nanotron_llama_config = LlamaBitNetConfigNanotron(
        bos_token_id=hf_config.bos_token_id,
        eos_token_id=hf_config.eos_token_id,
        hidden_act=hf_config.hidden_act,
        hidden_size=hf_config.hidden_size,
        initializer_range=hf_config.initializer_range,
        intermediate_size=hf_config.intermediate_size,
        is_bitnet_config=True,
        max_position_embeddings=hf_config.max_position_embeddings,
        num_attention_heads=hf_config.num_attention_heads,
        num_hidden_layers=hf_config.num_hidden_layers,
        num_key_value_heads=hf_config.num_key_value_heads,
        pad_token_id=None,
        pretraining_tp=hf_config.pretraining_tp,
        rms_norm_eps=hf_config.rms_norm_eps,
        rope_scaling=hf_config.rope_scaling,
        rope_theta=hf_config.rope_theta,
        rope_interleaved=False,
        tie_word_embeddings=hf_config.tie_word_embeddings,
        use_cache=hf_config.use_cache,
        vocab_size=hf_config.vocab_size,
        # is_llama_config=True
    )
......

Once it's converted you can launch the finetuning.
If it helps I can upload the models before weight packing so you can just convert them directly to nanotron.
To convert the model from nanotron to hf using the nanotron_to_hf script, it should actually work out of the box; can you share the error you are encountering so I can help?

Hi! Thanks for the response and sorry for replying so late :)

I'll start with the environment setup: I am using pull request 180 for installing/building nanotron and the main branch of transformers (since the BitNet quant PR has been merged). Because the convert scripts for llama (convert_hf_to_nanotron.py and convert_nanotron_to_hf.py) don't seem to be included in PR 180 of nanotron, I just use a copy of the examples/llama directory from the main nanotron branch.

I unpack the weights and save them back to the original safetensors file using this script, which includes the unpack_weights function you mentioned:

import torch
from transformers import AutoModelForCausalLM
from safetensors.torch import safe_open, save_file

def unpack_weights(packed: torch.Tensor, bits: int = 2) -> torch.Tensor:
    values_per_item = 8 // bits
    packed_shape = packed.shape

    if len(packed_shape) == 1:
        original_row_dim = packed_shape[0] * values_per_item
        unpacked_shape = (original_row_dim,)
    else:
        original_row_dim = packed_shape[0] * values_per_item
        unpacked_shape = (original_row_dim, *packed_shape[1:])

    unpacked = torch.zeros(unpacked_shape, device=packed.device, dtype=torch.uint8)

    for i in range(values_per_item):
        start = i * packed_shape[0]
        end = start + packed_shape[0]
        mask = (3 << (2 * i))
        unpacked[start:end] = (packed & mask) >> (2 * i)

    unpacked = unpacked.to(torch.float) - 1
    return unpacked

model = AutoModelForCausalLM.from_pretrained("HF1BitLLM/Llama3-8B-1.58-100B-tokens", device_map="cuda", torch_dtype=torch.bfloat16)
model_path = model.config._name_or_path

safetensors_file = "<path_to_model_safetensor>"

# Open the safetensors file
tensors = {}
with safe_open(safetensors_file, framework="pt", device="cpu") as f:
    for key in f.keys():
        tensor = f.get_tensor(key)
        if tensor.dtype == torch.uint8:  # Assuming packed weights are in uint8
            tensor = unpack_weights(tensor)
        tensors[key] = tensor

# Save the modified tensors back to the safetensors file
save_file(tensors, safetensors_file)

print(f"Weights have been unpacked in the file: {safetensors_file}")

Then I modify convert_hf_to_nanotron.py by replacing line 15 with the BitNet equivalent class and then replacing all other occurrences of NanotronLlamaConfig (just in this file):

from nanotron.config import LlamaBitNetConfig as LlamaBitNetConfigNanotron

I'm not sure where I can find the nanotron_llama_config you mentioned, but I checked the config of the model and 'is_llama_config' isn't present, while 'is_bitnet_config' is set to True.

Below is the error I get when trying to run convert_hf_to_nanotron.py:

(nanotronenv) (base) raven_huels_1729066988163_sandbo@l4150gvm:~/nanotron$ torchrun --nproc_per_node=1 examples/llama/convert_hf_to_nanotron.py --checkpoint_path=/home/raven_huels_1729066988163_sandbo/.cache/huggingface/hub/models--HF1BitLLM--Llama3-8B-1.58-100B-tokens/snapshots/5c35ae1f2c622b75a9c28e3603074863d74e4792 --save_path=nanotron_weights
You have loaded a BitNet model on CPU and have a CUDA device available, make sure to set your model on a GPU device in order to run your model.
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Traceback (most recent call last):
  File "/home/raven_huels_1729066988163_sandbo/nanotron/examples/llama/convert_hf_to_nanotron.py", line 119, in <module>
    convert_checkpoint_and_save(checkpoint_path=args.checkpoint_path, save_path=args.save_path)
  File "/home/raven_huels_1729066988163_sandbo/nanotron/examples/llama/convert_hf_to_nanotron.py", line 95, in convert_checkpoint_and_save
    hf_model = LlamaForCausalLM.from_pretrained(checkpoint_path)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3964, in from_pretrained
    if metadata.get("format") == "pt":
       ^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'get'
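
(One thing I suspect, though I haven't verified it: save_file writes no metadata unless you pass it explicitly, while from_pretrained apparently expects a "format" entry in the safetensors header, so re-saving with it might avoid this particular error.)

# Possible fix for the AttributeError above (unverified): keep the "format"
# metadata that the original checkpoint shards carry when re-saving.
save_file(tensors, safetensors_file, metadata={"format": "pt"})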

The errors I was talking about in my first message (when trying to convert or run the small template BitNet model trained with the configs inside PR 180) look like this, but this time without me modifying convert_nanotron_to_hf.py or run_generate.py:

torchrun --nproc_per_node=1 examples/llama/convert_nanotron_to_hf.py --checkpoint_path=checkpoints/10/ --save_path=hf-path
Traceback (most recent call last):
  File "/home/rosendo_schaefer_1728292797457_s/nanotron/examples/llama/convert_nanotron_to_hf.py", line 148, in <module>
    convert_checkpoint_and_save(
  File "/home/rosendo_schaefer_1728292797457_s/nanotron/examples/llama/convert_nanotron_to_hf.py", line 108, in convert_checkpoint_and_save
    model_config = NanotronLlamaConfig(**attrs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaConfig.__init__() got an unexpected keyword argument 'is_bitnet_config'
(nanotronenv) rosendo_schaefer_1728292797457_s@l4150gvm:~/nanotron$ torchrun --nproc_per_node=1 run_generate.py --ckpt-path checkpoints/10/ --tp 1 --pp 1
10/07/2024 12:58:57 [INFO|DP=0|PP=0|TP=0]: model_config: LlamaBitNetConfig(bos_token_id=1, eos_token_id=2, hidden_act='silu', hidden_size=16, initializer_range=0.02, intermediate_size=64, is_bitnet_config=True, max_position_embeddings=256, num_attention_heads=4, num_hidden_layers=2, num_key_value_heads=4, pad_token_id=None, pretraining_tp=1, rms_norm_eps=1e-05, rope_scaling=None, tie_word_embeddings=True, use_cache=True, vocab_size=256)
10/07/2024 12:58:57 [INFO|DP=0|PP=0|TP=0]: tokenizer_path: robot-test/dummy-tokenizer-wordlevel
10/07/2024 12:58:57 [INFO|DP=0|PP=0|TP=0]: Building model..
10/07/2024 12:58:57 [INFO|DP=0|PP=0|TP=0]: Setting PP block ranks...
10/07/2024 12:58:57 [INFO|DP=0|PP=0|TP=0]: Loading checkpoint from checkpoints/10:
Loading weights: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 2704.10it/s]
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/run_generate.py", line 251, in <module>
[rank0]:     main()
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/run_generate.py", line 187, in main
[rank0]:     for output in outputs:
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 36, in generator_context
[rank0]:     response = gen.send(None)
[rank0]:                ^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/generation/decode.py", line 257, in decode_text
[rank0]:     sharded_logits = model(
[rank0]:                      ^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/models/llama_bitnet.py", line 764, in forward
[rank0]:     return self.forward_with_hidden_states(input_ids=input_ids, input_mask=input_mask)[0]
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/models/llama_bitnet.py", line 780, in forward_with_hidden_states
[rank0]:     hidden_encoder_states = encoder_block(**hidden_encoder_states)
[rank0]:                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/parallel/pipeline_parallel/block.py", line 151, in forward
[rank0]:     output = self.pp_block(**new_kwargs)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/models/llama_bitnet.py", line 630, in forward
[rank0]:     output = self.attn(hidden_states=hidden_states, sequence_mask=sequence_mask)
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/models/llama_bitnet.py", line 359, in forward
[rank0]:     qkv_states = self.qkv_proj(
[rank0]:                  ^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/conda/envs/nanotronenv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/home/rosendo_schaefer_1728292797457_s/nanotron/src/nanotron/parallel/tensor_parallel/nn.py", line 406, in forward
[rank0]:     w_scale = getattr(self, "weight_scale").data
[rank0]:               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: AttributeError: 'NoneType' object has no attribute 'data'
Hugging Face 1Bit LLMs org
β€’
edited 23 days ago

For run_generate to work, you just need to quantize your model before running it by computing the weight scales, or, as a workaround, you can edit the forward in TensorParallelColumnLinearBitNet and TensorParallelRowLinearBitNet in the following way:

if not self.training:
    w = self.weight  # a weight tensor with shape [d, k]
    w = w.to(torch.bfloat16)
    w_quant = weight_quant(w)
    x_norm = normalize(x, self.in_features)
    x_quant = activation_quant(x_norm)
    return column_linear(
        input=x_quant,
        weight=w_quant,
        bias=self.bias,
        group=self.pg,
        tp_mode=self.mode,
        async_communication=self.async_communication,
    )

and do the same for the row forward, and it should work.
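
For the first option (computing the weight scales before running), it would be something along these lines. This is an untested sketch: it assumes each BitNet linear exposes a weight_scale attribute, as in the traceback above, and uses the b1.58-style per-tensor scale mean(|W|); whether the stored value should be that or its reciprocal depends on how the forward in nn.py consumes it.

import torch

@torch.no_grad()
def fill_weight_scales(model):
    # Untested sketch: give every BitNet linear a weight_scale so the quantized
    # forward has something to read. The b1.58 scale is per-tensor mean(|W|);
    # store its reciprocal instead if that is what the forward expects.
    for module in model.modules():
        if hasattr(module, "weight_scale") and module.weight_scale is None:
            w = module.weight.data.to(torch.bfloat16)
            scale = w.abs().mean().clamp(min=1e-5)
            module.weight_scale = torch.nn.Parameter(scale, requires_grad=False)
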
For the conversion, can you share which script you are using to convert? Is it this one: https://github.com/huggingface/nanotron/pull/174/files

I have the same question as well. I’m eagerly looking forward to a good solution from you! Thank you very much. Also, how can I change the model's tokenizer? I’d like to add some tokens, but it seems that transformers isn’t working
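
(To be concrete, I mean the usual add_tokens / resize_token_embeddings flow, roughly like the following; the new token is just a placeholder.)

from transformers import AutoTokenizer, AutoModelForCausalLM

# The usual way to add tokens with transformers and grow the embedding matrix.
model_id = "HF1BitLLM/Llama3-8B-1.58-100B-tokens"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
num_added = tokenizer.add_tokens(["<my_new_token>"])
model.resize_token_embeddings(len(tokenizer))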

Hi @medmekk !

The error I posted before about running convert_hf_to_nanotron.py was occurring with the script from the examples/llama directory of the main nanotron branch.

Since then, I tried using the convert_hf_to_nanotron.py script from the PR you mentioned and modified it to include the weight unpacking function, which is called only for the layer types that I could see are uint8: convert_hf_to_nanotron.py at main · Andrefty/nanotron

However, I am now getting assertion errors for these two layer types:

10/31/2024 16:19:31 [INFO|DP=0|PP=0|TP=0]: Layer 0 O Proj HF shape: torch.Size([1024, 4096])
10/31/2024 16:19:31 [INFO|DP=0|PP=0|TP=0]: Layer 0 O Proj Nanotron shape: torch.Size([4096, 4096])
Copying Hidden Layers: 0%
[rank0]: Traceback (most recent call last):
[rank0]:   File "nanotron/tools/llama3/convert_hf_to_nanotron.py", line 303, in <module>
[rank0]:     main(_args)
[rank0]:   File "nanotron/tools/llama3/convert_hf_to_nanotron.py", line 200, in main
[rank0]:     hf_model.model.layers[i].self_attn.o_proj.weight.shape
[rank0]: AssertionError
10/31/2024 17:02:18 [INFO|DP=0|PP=0|TP=0]: Layer 0 Down Proj HF shape: torch.Size([1024, 14336])
10/31/2024 17:02:18 [INFO|DP=0|PP=0|TP=0]: Layer 0 Down Proj Nanotron shape: torch.Size([4096, 14336])
Copying Hidden Layers: 0%
[rank0]: Traceback (most recent call last):
[rank0]:   File "nanotron/tools/llama3/convert_hf_to_nanotron.py", line 303, in <module>
[rank0]:     main(_args)
[rank0]:   File "nanotron/tools/llama3/convert_hf_to_nanotron.py", line 226, in main
[rank0]:     hf_model.model.layers[i].mlp.down_proj.weight.shape
[rank0]: AssertionError

Also, in my fork of nanotron I have merged the main branch of the repository with the one you mentioned above (the one that contains the llama3 convert scripts) and your mohamed_llamabitnet branch, in hopes of reducing compatibility issues.

I modified TensorParallelColumnLinearBitNet and TensorParallelRowLinearBitNet in nn.py so that I can use run_generate.py with the example BitNet model I obtained after training with the nanotron/examples/config_tiny_llama.yaml config, and it no longer seems to encounter errors:

torchrun --nproc_per_node='gpu' run_generate.py --ckpt-path checkpoints/15/

10/31/2024 17:44:18 [INFO|DP=0|PP=0|TP=0]: model_config: LlamaBitNetConfig(bos_token_id=1, eos_token_id=2, hidden_act='silu', hidden_size=16, initializer_range=0.02, intermediate_size=64, is_bitnet_config=True, max_position_embeddings=256, num_attention_heads=4, num_hidden_layers=2, num_key_value_heads=4, pad_token_id=None, pretraining_tp=1, rms_norm_eps=1e-05, rope_scaling=None, tie_word_embeddings=True, use_cache=True, vocab_size=256)
10/31/2024 17:44:18 [INFO|DP=0|PP=0|TP=0]: tokenizer_path: robot-test/dummy-tokenizer-wordlevel
10/31/2024 17:44:18 [INFO|DP=0|PP=0|TP=0]: Building model..
10/31/2024 17:44:18 [INFO|DP=0|PP=0|TP=0]: Setting PP block ranks...
10/31/2024 17:44:18 [INFO|DP=0|PP=0|TP=0]: Loading checkpoint from checkpoints/15:
Loading weights: 100%
Initializing rotary embeddings with end=256
Initializing rotary embeddings with end=256
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: input: [CLS] the [UNK] [UNK] [UNK] is [SEP]
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: generation: this this this this this this this this this this this this this this this this
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: input: [CLS] [UNK] [UNK] ( [UNK] [UNK] [SEP]
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: generation: this this this this this this this this this this this this
10/31/2024 17:44:22 [INFO|DP=0|PP=0|TP=0]: --------------------------------------------------

"If it helps I can upload the models before weight packing so you can just convert them directly to nanotron."

I would greatly appreciate it if you could do this, as it might be my limited experience preventing me from properly unpacking the weights.
