Error when loading with the code snippet from Hugging Face
I loaded the model with the "Use in Transformers" code snippet and got the following error.
from transformers import AutoProcessor, AutoModelForPreTraining
processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex")
model = AutoModelForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex")
Maybe you would like to add information to the repo, or change the default code so that it works directly from "Use in Transformers"?
OSError: Can't load tokenizer for 'KBLab/wav2vec2-large-voxrex'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'KBLab/wav2vec2-large-voxrex' is the correct path to a directory containing all relevant files for a Wav2Vec2CTCTokenizer tokenizer.
wav2vec2-large-voxrex is a different repo, which has neither a vocabulary (vocab.json) nor a tokenizer config file (tokenizer_config.json). You'd need to clone that repo with git and add your own vocabulary manually.
This repo is wav2vec2-large-voxrex-swedish. You can load it for continued pretraining using the existing vocab (edit: continued pretraining doesn't need a vocab, see the comment below this one):
from transformers import AutoProcessor, AutoModelForPreTraining
processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
model = AutoModelForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
or with a CTC head:
from transformers import AutoProcessor, AutoModelForCTC
processor = AutoProcessor.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
model = AutoModelForCTC.from_pretrained("KBLab/wav2vec2-large-voxrex-swedish")
See the links below for the differences in the files each repo includes:
https://huggingface.co/KBLab/wav2vec2-large-voxrex/tree/main
https://huggingface.co/KBLab/wav2vec2-large-voxrex-swedish/tree/main
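If you'd rather check this from code than by browsing the file trees, something like the following might help. It is only a rough sketch: `missing_tokenizer_files` and `check_repo` are made-up helper names, and the list of required files contains just the two files mentioned in this thread, so a different tokenizer could need more.

```python
# Tokenizer files named in this thread; other tokenizers may need more.
TOKENIZER_FILES = ("vocab.json", "tokenizer_config.json")


def missing_tokenizer_files(repo_files, required=TOKENIZER_FILES):
    """Return the required tokenizer files that are absent from repo_files."""
    present = set(repo_files)
    return [name for name in required if name not in present]


def check_repo(repo_id):
    """Fetch a repo's file listing from the Hub and report missing tokenizer files."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import list_repo_files

    return missing_tokenizer_files(list_repo_files(repo_id))


# Example (needs network access):
# check_repo("KBLab/wav2vec2-large-voxrex")
# check_repo("KBLab/wav2vec2-large-voxrex-swedish")
```

An empty list means the files that `AutoProcessor` complained about are present in that repo.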
@marma Is it necessary to have vocab during unsupervised pretraining?
If you want to continue pretraining, you may not need a vocab: https://huggingface.co/docs/transformers/v4.23.1/en/model_doc/wav2vec2#transformers.Wav2Vec2ForPreTraining. There is no supervised data to feed the model in such a scenario.
Pretraining setup (no processor):
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForPreTraining
feature_extractor = AutoFeatureExtractor.from_pretrained("KBLab/wav2vec2-large-voxrex")
model = Wav2Vec2ForPreTraining.from_pretrained("KBLab/wav2vec2-large-voxrex")
Otherwise, if you need to fine-tune the model yourself, @birgermoell, my suggestion would be to git clone https://huggingface.co/KBLab/wav2vec2-large-voxrex and add all tokenizer-related files from https://huggingface.co/KBLab/wav2vec2-large-voxrex-swedish/tree/main to your cloned folder. Then load the model locally on your computer.
See https://huggingface.co/blog/fine-tune-wav2vec2-english for an example of creating a vocab from scratch and of fine-tuning. However, you should be able to simply copy the tokenizer-related files over from KBLab/wav2vec2-large-voxrex-swedish to your cloned folder if your purpose is to fine-tune.
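The copy step described above could be sketched roughly like this, assuming both repos have already been cloned locally. The function name and the folder paths are hypothetical. vocab.json and tokenizer_config.json are the files named in this thread; special_tokens_map.json is my assumption and is simply skipped if the source folder doesn't have it.

```python
import shutil
from pathlib import Path

# vocab.json and tokenizer_config.json are named in this thread;
# special_tokens_map.json is an assumption and is skipped if absent.
TOKENIZER_FILES = ("vocab.json", "tokenizer_config.json", "special_tokens_map.json")


def copy_tokenizer_files(src_dir, dst_dir, names=TOKENIZER_FILES):
    """Copy the tokenizer files that exist in src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    copied = []
    for name in names:
        if (src / name).is_file():
            shutil.copy2(src / name, dst / name)
            copied.append(name)
    return copied


# Hypothetical local clone paths:
# copy_tokenizer_files("wav2vec2-large-voxrex-swedish", "wav2vec2-large-voxrex")
# processor = AutoProcessor.from_pretrained("./wav2vec2-large-voxrex")
```

The returned list shows which files were actually copied, so you can tell if something expected was missing from the source folder.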
@Lauler You are right, unsupervised pretraining does not need a vocab. The vocab is derived from the speech-text pairs used in finetuning.
My hope was to get the embeddings out of the base model in order to use them for a classification task (which does not depend on transcription, and that is why I don't want to use the embeddings from the fine-tuned models). I'm honestly a bit confused about the difference between https://huggingface.co/KBLab/wav2vec2-large-voxrex and https://huggingface.co/KBLab/wav2vec2-large-voxrex-swedish
@birgermoell The only difference is that VoxRex-swedish is a Wav2Vec2ForCTC, i.e. it has a CTC head on top of the pretrained model that has been fine-tuned for Swedish. My guess is that you want pooled_output or something similar. Maybe this[1] already does that?
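One possible reading of "pooled_output or something similar" is to average the base model's frame-level hidden states over time, which gives one vector per utterance. The sketch below makes that assumption; it is not necessarily what the linked code does. `mean_pool` and `embed` are made-up names, and the waveform is assumed to be a 1-D array already resampled to the rate the feature extractor expects.

```python
import torch


def mean_pool(hidden_states):
    """Average frame-level features (batch, frames, dim) over time -> (batch, dim)."""
    return hidden_states.mean(dim=1)


def embed(waveform, model_id="KBLab/wav2vec2-large-voxrex"):
    """Return one embedding for a 1-D waveform (assumed to be at the model's sampling rate)."""
    # Imported lazily; loading downloads the checkpoint from the Hub.
    from transformers import AutoFeatureExtractor, AutoModel

    feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id).eval()
    inputs = feature_extractor(
        waveform,
        sampling_rate=feature_extractor.sampling_rate,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state holds one feature vector per audio frame.
    return mean_pool(outputs.last_hidden_state)
```

The resulting vector can then be fed to whatever classifier you like. For batches of padded clips, a plain mean also averages over padding frames, so a masked mean would probably be the safer choice there.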