Transformers does not recognize `vibevoice` architecture

#32 opened by SadeghPouriyanZadeh

I'm running these lines in Google Colab:

from transformers import pipeline

pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-1.5B")

with transformers upgraded to 4.56.1 via:

!pip install --upgrade transformers

but I get this error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1308             try:
-> 1309                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1310             except KeyError:

KeyError: 'vibevoice'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1309                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1310             except KeyError:
-> 1311                 raise ValueError(
   1312                     f"The checkpoint you are trying to load has model type `{config_dict['model_type']}` "
   1313                     "but Transformers does not recognize this architecture. This could be because of an "

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

What's the problem, and how can I solve it?

Thanks. So how can I use the model without Transformers until the merge is done?

You can already use the PR code; it can even be installed with pip:

pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head
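
Once that's installed, a quick sanity check (a sketch on my part; it assumes the PR registers the vibevoice model type with AutoConfig, which is exactly the lookup that raised the KeyError above):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/VibeVoice-1.5B")
print(type(config).__name__)  # should resolve instead of raising KeyError: 'vibevoice'

If that prints a config class name, the pipeline call should get past the architecture check.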

Thanks. I installed the transformers-4.56.0.dev0 version using pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head,
but I'm facing the same error again:

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
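
Since this is running in Colab, one likely culprit (an assumption, not a confirmed diagnosis): the runtime keeps the transformers module it already imported until it is restarted, so the PR install only takes effect after restarting the runtime. You can confirm which version the live interpreter actually loaded:

import transformers
print(transformers.__version__, transformers.__file__)  # expect 4.56.0.dev0 after a restart

If it still reports the old version, restart the runtime and rerun the cell.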

I installed transformers-4.56.0.dev0, and the model.generate() function hit an error:
AttributeError: 'DynamicCache' object has no attribute 'key_cache'

I'm getting the same error. I tried emptying the cache (with torch.cuda.empty_cache()) and setting use_cache=False. That works for shorter generation sequences, but for longer ones I had to modify line 518 of modeling_vibevoice_inference.py to bypass updating the cache with a try/except block. Something like this:

try:
    # Old cache API: DynamicCache exposes parallel per-layer key/value tensor lists
    for k_cache, v_cache in zip(negative_model_kwargs['past_key_values'].key_cache,
                                negative_model_kwargs['past_key_values'].value_cache):
        # Process each non-diffusion sample
        for sample_idx in diffusion_start_indices.tolist():
            # Shift cache for this sample
            k_cache[sample_idx, :, -1, :] = k_cache[sample_idx, :, 0, :].clone()
            v_cache[sample_idx, :, -1, :] = v_cache[sample_idx, :, 0, :].clone()
except Exception:
    # Cache layout not as expected (e.g. no key_cache attribute):
    # drop the negative cache entirely and let generation rebuild it
    print(negative_model_kwargs['past_key_values'])
    negative_model_kwargs['past_key_values'] = None

This seems to fix the problem so far, and for some bizarre reason, the print line under the except block returns:

DynamicCache(layers=[DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer])

Hopefully someone can explain what is going on here....
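
That print output is actually the clue: newer transformers releases refactored the cache classes, and DynamicCache now holds a list of per-layer DynamicLayer objects instead of exposing key_cache/value_cache lists, which is why the attribute lookup fails even though the cache itself is intact. A version-tolerant accessor could look like this (a sketch, assuming the layered API keeps the tensors on layer.keys and layer.values as in current transformers source; worth verifying against your installed version):

def iter_kv_cache(past_key_values):
    if hasattr(past_key_values, "key_cache"):
        # Older cache API: parallel lists of per-layer key/value tensors
        yield from zip(past_key_values.key_cache, past_key_values.value_cache)
    else:
        # Newer layered API: DynamicCache.layers holds DynamicLayer objects
        # with .keys/.values tensors
        for layer in past_key_values.layers:
            yield layer.keys, layer.values

With that, the shifting loop above could iterate over iter_kv_cache(negative_model_kwargs['past_key_values']) instead of reaching for key_cache directly.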

I even tested it with transformers-4.56.2, using this new commit from yesterday:

pip install git+https://github.com/huggingface/transformers.git@cd74917

but I'm getting the same error again:

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture.
This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Somebody please help :)

ValueError: The checkpoint you are trying to load has model type vibevoice but Transformers does not recognize this architecture.
$ pip list|grep transformers
transformers 4.57.0

:-(
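
One thing worth ruling out (an assumption, not a confirmed diagnosis): pip list can inspect a different environment than the interpreter that raises the error. Checking from inside the failing interpreter settles it:

$ python -c "import transformers; print(transformers.__version__, transformers.__file__)"

If the version or path differs from what pip reported, the upgrade landed in another environment.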
