[Add] `tokenizer_class` in config to make it usable by the `pipeline` API

#10
by ariG23498 HF Staff - opened

Hey team!

Thanks for open sourcing this model.

I have added `tokenizer_class` to the configuration so that the model is also compatible with the `pipeline` API. With this change in place you can use the model like so:

```python
from transformers.pipelines import pipeline
import torch

messages = [
    {"role": "user", "content": "Who are you?"},
]

pipe = pipeline(
    "text-generation",
    model="XiaomiMiMo/MiMo-7B-RL",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(pipe(messages))
```
bwshen-mi changed pull request title from Adding tokenizer class for the `pipeline` API to use the tokenizer to [Add] `tokenizer_class` in config to make it usable by the `pipeline` API
bwshen-mi changed pull request status to merged
Xiaomi MiMo org

@ariG23498 Thank you for your efforts. I have a small question here.

In `tokenizer_config.json`, the value of `tokenizer_class` is a `str` (`"Qwen2Tokenizer"`), but here in `config.json` it's a `list[str]`. Do we need to use the same value type for `tokenizer_class`?

I don't think that's necessary. But if you run into any issue, do let me know and I will investigate further.
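For illustration, the two shapes being compared would look something like the fragments below. The exact contents of the list in `config.json` are an assumption on my part (the slow/fast pair is a common convention in `transformers`); please check the actual repo files.

```
# tokenizer_config.json — a plain string
"tokenizer_class": "Qwen2Tokenizer"

# config.json — a list of strings (contents here are hypothetical)
"tokenizer_class": ["Qwen2Tokenizer", "Qwen2TokenizerFast"]
```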
