Using EuroBERT for classification tasks
What is the expected setup for running EuroBERT for classification tasks?
My assumption (and please correct me if I'm wrong) is that one is expected to add a classification head on top of the encoder and fine-tune it on the specific classification dataset.
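Concretely, something along these lines is what I have in mind (just a rough sketch; the class name, pooling choice, and num_labels are placeholders I made up):

```python
from torch import nn
from transformers import AutoModel

# Sketch of the setup described above: the EuroBERT encoder with a linear
# classification head on top, to be fine-tuned on a labelled dataset.
class EuroBertClassifier(nn.Module):
    def __init__(self, model_id="EuroBERT/EuroBERT-210m", num_labels=2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        pooled = self.dropout(hidden[:, 0])  # first-token pooling for sequence-level labels
        return self.classifier(pooled)
```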
The way I have always done this in Transformers is with AutoModelForTokenClassification. However, if I try to do this with EuroBERT and transformers==4.49.0, I get the following error:
ValueError: Unrecognized configuration class <class 'transformers_modules.EuroBERT.EuroBERT-210m.e254226adddf6f93338ac9f034e949370de05f94.configuration_eurobert.EuroBertConfig'> for this kind of AutoModel: AutoModelForTokenClassification.
Model type should be one of AlbertConfig, BertConfig, BigBirdConfig, BioGptConfig, BloomConfig, BrosConfig, CamembertConfig, CanineConfig, ConvBertConfig, Data2VecTextConfig, DebertaConfig, DebertaV2Config, DiffLlamaConfig, DistilBertConfig, ElectraConfig, ErnieConfig, ErnieMConfig, EsmConfig, FalconConfig, FlaubertConfig, FNetConfig, FunnelConfig, GemmaConfig, Gemma2Config, GlmConfig, GPT2Config, GPT2Config, GPTBigCodeConfig, GPTNeoConfig, GPTNeoXConfig, HeliumConfig, IBertConfig, LayoutLMConfig, LayoutLMv2Config, LayoutLMv3Config, LiltConfig, LlamaConfig, LongformerConfig, LukeConfig, MarkupLMConfig, MegaConfig, MegatronBertConfig, MistralConfig, MixtralConfig, MobileBertConfig, ModernBertConfig, MPNetConfig, MptConfig, MraConfig, MT5Config, NemotronConfig, NezhaConfig, NystromformerConfig, PersimmonConfig, PhiConfig, Phi3Config, QDQBertConfig, Qwen2Config, Qwen2MoeConfig, RemBertConfig, RobertaConfig, RobertaPreLayerNormConfig, RoCBertConfig, RoFormerConfig, SqueezeBertConfig, StableLmConfig, Starcoder2Config, T5Config, UMT5Config, XLMConfig, XLMRobertaConfig, XLMRobertaXLConfig, XLNetConfig, XmodConfig, YosoConfig.
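For reference, what I am running is essentially the standard auto-class call (a sketch; num_labels is just a placeholder for my dataset):

```python
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EuroBERT/EuroBERT-210m")
model = AutoModelForTokenClassification.from_pretrained(
    "EuroBERT/EuroBERT-210m",
    trust_remote_code=True,  # EuroBERT ships custom modeling code on the Hub
    num_labels=5,            # placeholder
)
```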
So what would be the right way? Is there any information explaining this that I have missed?
You might need to copy over the ForTokenClassification class from another model (BERT, for instance) and add it to the modeling file.
It's just a matter of removing the pooling compared to the sequence classification head :)
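In rough terms, such a head would look like this (an illustrative sketch in the style of BertForTokenClassification, not the actual modeling_eurobert.py code; class and attribute names are made up):

```python
from torch import nn

# Per-token logits from the last hidden states, with no pooling step
# (unlike the sequence classification head, which pools to one vector).
class EuroBertForTokenClassificationSketch(nn.Module):
    def __init__(self, encoder, num_labels):
        super().__init__()
        self.encoder = encoder  # the EuroBERT encoder module
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        logits = self.classifier(self.dropout(hidden))  # (batch, seq_len, num_labels)
        loss = None
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits.view(-1, logits.size(-1)), labels.view(-1))
        return {"loss": loss, "logits": logits}
```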
It looks like the author introduced a new bug with the recent update to the Hugging Face repo. (The update was added 3.5 hours prior to this post.)
Could not locate the configuration_eurobert.py inside
file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "python3.11/site-packages/transformers/utils/hub.py", line 517, in cached_files
raise EnvironmentError(
OSError: EuroBERT/EuroBERT-210m does not appear to have a file named ..processing_utils.py. Checkout 'https://huggingface.co/EuroBERT/EuroBERT-210m/tree/main' for available files.
Hey, you’re absolutely right, I fixed the import. Sorry about that, and thanks for bringing it up 🙌
Hey @jiranzo ! Token classification is now live 😁. We rolled it out this morning—enjoy! 🙌
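In other words, the standard call from the first post should now go through, roughly like this (num_labels and the example text are placeholders, and the head is of course untrained until you fine-tune it):

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EuroBERT/EuroBERT-210m")
model = AutoModelForTokenClassification.from_pretrained(
    "EuroBERT/EuroBERT-210m", trust_remote_code=True, num_labels=5
)

inputs = tokenizer("EuroBERT now supports token classification.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits      # (1, seq_len, num_labels)
predictions = logits.argmax(dim=-1)      # per-token label ids
```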