Transformers does not recognize `vibevoice` architecture

#32 opened by SadeghPouriyanZadeh

I'm running these lines in Google Colab:

from transformers import pipeline

pipe = pipeline("text-to-speech", model="microsoft/VibeVoice-1.5B")

with transformers upgraded to 4.56.1 via:

!pip install --upgrade transformers

but I get this error:

KeyError                                  Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1308             try:
-> 1309                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1310             except KeyError:

KeyError: 'vibevoice'

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.12/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
   1309                 config_class = CONFIG_MAPPING[config_dict["model_type"]]
   1310             except KeyError:
-> 1311                 raise ValueError(
   1312                     f"The checkpoint you are trying to load has model type `{config_dict['model_type']}` "
   1313                     "but Transformers does not recognize this architecture. This could be because of an "

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

What's the problem, and how can I solve it?

Thanks. So how can I use the model without Transformers until the merge is done?

You can already use the PR code; it can even be installed with pip:

pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head
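
Once that's installed, a quick sanity check (a sketch on my part; it assumes the PR registers the vibevoice model type with AutoConfig, which is exactly the lookup that raised the KeyError above):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/VibeVoice-1.5B")
print(type(config).__name__)  # should resolve instead of raising KeyError: 'vibevoice'

If that prints a config class name, the pipeline call should get past the architecture check.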

Thanks. I installed the transformers-4.56.0.dev0 version using pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head,
but I'm facing the same error again:

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
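
Since this is running in Colab, one likely culprit (an assumption, not a confirmed diagnosis): the runtime keeps the transformers module it already imported until it is restarted, so the PR install only takes effect after restarting the runtime. You can confirm which version the live interpreter actually loaded:

import transformers
print(transformers.__version__, transformers.__file__)  # expect 4.56.0.dev0 after a restart

If it still reports the old version, restart the runtime and rerun the cell.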

I installed transformers-4.56.0.dev0, and the model.generate() function hit an error:
AttributeError: 'DynamicCache' object has no attribute 'key_cache'

I'm getting the same error. I tried emptying the cache (with torch.cuda.empty_cache()) and setting use_cache=False. That works for shorter generation sequences, but for longer ones I had to modify line 518 of modeling_vibevoice_inference.py to bypass updating the cache with a try/except block. Something like this:

try:
    # Old cache API: DynamicCache exposes parallel per-layer key/value tensor lists
    for k_cache, v_cache in zip(negative_model_kwargs['past_key_values'].key_cache,
                                negative_model_kwargs['past_key_values'].value_cache):
        # Process each non-diffusion sample
        for sample_idx in diffusion_start_indices.tolist():
            # Shift cache for this sample
            k_cache[sample_idx, :, -1, :] = k_cache[sample_idx, :, 0, :].clone()
            v_cache[sample_idx, :, -1, :] = v_cache[sample_idx, :, 0, :].clone()
except Exception:
    # Cache layout not as expected (e.g. no key_cache attribute):
    # drop the negative cache entirely and let generation rebuild it
    print(negative_model_kwargs['past_key_values'])
    negative_model_kwargs['past_key_values'] = None

This seems to fix the problem so far, and for some bizarre reason, the print line under the except block returns:

DynamicCache(layers=[DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer, DynamicLayer])

Hopefully someone can explain what is going on here....
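
That print output is actually the clue: newer transformers releases refactored the cache classes, and DynamicCache now holds a list of per-layer DynamicLayer objects instead of exposing key_cache/value_cache lists, which is why the attribute lookup fails even though the cache itself is intact. A version-tolerant accessor could look like this (a sketch, assuming the layered API keeps the tensors on layer.keys and layer.values as in current transformers source; worth verifying against your installed version):

def iter_kv_cache(past_key_values):
    if hasattr(past_key_values, "key_cache"):
        # Older cache API: parallel lists of per-layer key/value tensors
        yield from zip(past_key_values.key_cache, past_key_values.value_cache)
    else:
        # Newer layered API: DynamicCache.layers holds DynamicLayer objects
        # with .keys/.values tensors
        for layer in past_key_values.layers:
            yield layer.keys, layer.values

With that, the shifting loop above could iterate over iter_kv_cache(negative_model_kwargs['past_key_values']) instead of reaching for key_cache directly.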

I even tested it with transformers-4.56.2, using this new commit from yesterday:

pip install git+https://github.com/huggingface/transformers.git@cd74917

but I'm getting the same error again:

ValueError: The checkpoint you are trying to load has model type `vibevoice` but Transformers does not recognize this architecture.
This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

Somebody please help :)

ValueError: The checkpoint you are trying to load has model type vibevoice but Transformers does not recognize this architecture.
$ pip list|grep transformers
transformers 4.57.0

:-(
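
One thing worth ruling out (an assumption, not a confirmed diagnosis): pip list can inspect a different environment than the interpreter that raises the error. Checking from inside the failing interpreter settles it:

$ python -c "import transformers; print(transformers.__version__, transformers.__file__)"

If the version or path differs from what pip reported, the upgrade landed in another environment.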
