Is the `microsoft/bitnet-b1.58-2B-4T` version missing a custom loader?
I'm trying to load the `microsoft/bitnet-b1.58-2B-4T` model via the Transformers library, as per the documentation:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/bitnet-b1.58-2B-4T"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```
And I'm getting this error, which has already been reported by others:

```
Could not locate the configuration_bitnet.py inside microsoft/bitnet-b1.58-2B-4T
```
Looking at the repository, there doesn't seem to be a custom loader (there are no Python files). Shouldn't there be one?
More generally, could you clarify how exactly the values are packed? Inspecting the `model.safetensors` file, I see the weights stored as `uint8`, which makes me suspect it may be using the TL2 format (3 weights packed into 5 bits), but there's no way to tell without seeing the custom loader.
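For reference, here is a minimal sketch of how ternary weights could be packed into `uint8`. It assumes a simple 2-bits-per-weight layout (4 weights per byte), which is only one plausible scheme and not necessarily what this checkpoint actually uses; `pack_ternary` and `unpack_ternary` are names I made up for illustration:

```python
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} into uint8, 4 weights per byte.

    Each weight maps to 2 bits (value + 1 -> 0, 1, 2). This is a sketch of
    one possible layout, not the checkpoint's documented format.
    """
    assert w.size % 4 == 0
    u = (w.astype(np.int8) + 1).astype(np.uint8).reshape(-1, 4)
    return (u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)).astype(np.uint8)

def unpack_ternary(b: np.ndarray) -> np.ndarray:
    """Inverse of pack_ternary: recover 4 ternary weights from each byte."""
    out = np.stack([(b >> (2 * i)) & 0b11 for i in range(4)], axis=1)
    return out.astype(np.int8).reshape(-1) - 1
```

A round trip (`unpack_ternary(pack_ternary(w))`) returns the original weights, which is a quick way to sanity-check any packing hypothesis against the tensor sizes you see in the safetensors file.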
Use this version of the transformers library:

```shell
pip install git+https://github.com/shumingma/transformers.git
```
These artifacts can be found in the 1bitllm/bitnet-1.58b-3b files, but they do not work here, since this model uses the llama3 tokenizer rather than the bitnet one.
I believe we are waiting for an update to the conversion code so that it handles llama3 properly.
- Attempt 1 - successfully converted, but does not respond correctly: I used convert-ms-to-gguf-bitnet.py to generate the GGUF, but judging by the inference response ("the... the... ,, the..", etc.) the tokenizer is broken.
- Attempt 2 - successfully converted, but does not respond correctly: I modified convert-hf-to-gguf-bitnet.py to use BPE/gpt2, which produces broken words with no token comprehension, e.g. repeating its system prompt and then responding with "0".
- Attempt 3 - successfully converted, but responds with empty spaces repeating infinitely: using the .model file from the 1bitllm artifacts (32k vocab) does not work with the SPM settings in either script, even when padding is enabled.
I have been able to fine-tune this model successfully with an SFT trainer, but getting it into GGUF has been the challenge without a proper tokenizer.model or a BPE vocab conversion for llama3.
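One likely reason the SPM paths fail: llama3-style tokenizers ship a BPE `tokenizer.json` rather than a SentencePiece `tokenizer.model`. A quick heuristic check of a local checkpoint directory can tell the two apart before running a converter (`detect_tokenizer_kind` is a name I made up; the file-based heuristic is an assumption, not an official rule):

```python
from pathlib import Path

def detect_tokenizer_kind(model_dir: str) -> str:
    """Guess which tokenizer family a local HF checkpoint ships.

    Heuristic only: a tokenizer.model file usually means SentencePiece
    (llama2-style), while tokenizer.json alone usually means BPE
    (llama3/gpt2-style).
    """
    d = Path(model_dir)
    if (d / "tokenizer.model").exists():
        return "sentencepiece"
    if (d / "tokenizer.json").exists():
        return "bpe"
    return "unknown"
```

Running this on the snapshot before conversion would flag the mismatch early, instead of discovering it through garbled inference output.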