[Add] `tokenizer_class` in config to make it usable by the `pipeline` API

#10
by ariG23498 HF Staff - opened

Hey team!

Thanks for open sourcing this model.

I have added `tokenizer_class` to the configuration so that the model is also compatible with the `pipeline` API. With this change in place you can use the model like so:

```python
from transformers.pipelines import pipeline
import torch

messages = [
    {"role": "user", "content": "Who are you?"},
]

pipe = pipeline(
    "text-generation",
    model="XiaomiMiMo/MiMo-7B-RL",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe(messages))
```
bwshen-mi changed pull request title from Adding tokenizer class for the `pipeline` API to use the tokenizer to [Add] `tokenizer_class` in config to make it usable by the `pipeline` API
bwshen-mi changed pull request status to merged
Xiaomi MiMo org

@ariG23498 Thank you for your efforts. I have a small question here.

In `tokenizer_config.json`, the value of `tokenizer_class` is a `str` (`"Qwen2Tokenizer"`), but here in `config.json` it's a `list[str]`. Do we need to use the same value type for `tokenizer_class`?

I don't think that's necessary. But if you run into any issue, do let me know and I will investigate further.
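For illustration, the two shapes being compared would look something like the fragments below. The exact contents of the list in `config.json` are an assumption on my part (the slow/fast pair is a common convention in `transformers`); please check the actual repo files.

```
# tokenizer_config.json — a plain string
"tokenizer_class": "Qwen2Tokenizer"

# config.json — a list of strings (contents here are hypothetical)
"tokenizer_class": ["Qwen2Tokenizer", "Qwen2TokenizerFast"]
```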
