Special tokens in output generation

#2
by Matthieu - opened

Hello,

Thanks for sharing this model!

When generating output, even with "skip_special_tokens=True", there are two special tokens at the beginning ( ) and end (\n) of the output, in addition to special whitespace tokens.
Is there any way to remove them and use a regular space token instead of the special whitespace tokens?

Thanks a lot for trying the model! Can you try using T5Tokenizer instead of AutoTokenizer, and pass spaces_between_special_tokens=False when decoding?
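For reference, the suggestion above could look roughly like the sketch below. It assumes the Hugging Face transformers library with sentencepiece installed. The helper names load_slow_t5_tokenizer and decode_generation and the model id in the usage comment are hypothetical, chosen only for illustration.

```python
def load_slow_t5_tokenizer(model_id: str):
    """Load the slow (sentencepiece-based) T5 tokenizer instead of AutoTokenizer."""
    # Imported lazily so decode_generation can be used without loading transformers.
    from transformers import T5Tokenizer

    return T5Tokenizer.from_pretrained(model_id)


def decode_generation(tokenizer, token_ids) -> str:
    """Decode generated ids with the two options discussed in this thread."""
    return tokenizer.decode(
        token_ids,
        skip_special_tokens=True,             # drop special tokens such as padding / EOS
        spaces_between_special_tokens=False,  # do not insert spaces around added tokens
    )


# Usage sketch (the model id is a placeholder, not taken from this thread):
# tokenizer = load_slow_t5_tokenizer("your-org/your-t5-model")
# output_ids = model.generate(**inputs)
# text = decode_generation(tokenizer, output_ids[0])
```

Whether this removes every unwanted token depends on how the model's special and whitespace tokens were registered in the tokenizer, so treat it as a starting point.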

Thanks for your feedback! I have applied all your recommendations, but I still get a newline character (\n) at the end of the generated output.

Any idea?
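Until the cause is found, one possible stopgap is to post-process the decoded string. The sketch below only treats the symptom: it assumes the extra characters are plain leading or trailing whitespace in the decoded text, and the helper name clean_decoded_text is hypothetical.

```python
def clean_decoded_text(text: str) -> str:
    """Remove leading and trailing whitespace, including a final newline."""
    # str.strip() with no arguments removes spaces, tabs and newlines at both ends.
    return text.strip()


print(repr(clean_decoded_text(" Hello world\n")))
```

This leaves whitespace inside the text untouched, so it does not change how the special whitespace tokens themselves are decoded.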

Hi,
Can you take a screenshot of the problem (input, tokenized input, decoded output, etc.) so that we can walk through it? BTW, here is a question we got on GitHub that seems pretty similar: https://github.com/lm-sys/FastChat/issues/1022. Maybe you can take a look at it too?
