codecho committed on
Commit b2322d7 · verified · 1 Parent(s): a574f67

Fix incorrect unk_id assignment

`self.unk_id` is currently assigned the ID of `pad_token`, which is incorrect. Judging from the original comment, the intent was to look up `self.unk_token` (the second-to-last special token) to get its ID; this PR makes that correction.
However, I think this fix might affect model performance without retraining, because `self.unk_id` now resolves to a different token ID than before.
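
To make the mix-up concrete, here is a minimal standalone sketch of the lookup before and after the change. The token names, their order, the base vocabulary size, and the assumption that the pad token is the last special token are all hypothetical and chosen only for illustration; the actual values in `tokenization_kimia.py` may differ.

```python
# Hypothetical special-token list; the real names and order may differ.
special_tokens = ["[BOS]", "[EOS]", "[UNK]", "[PAD]"]
num_base_tokens = 100  # assumed size of the base vocabulary

# Map each special token to an ID placed after the base vocabulary.
special_token_ids = {
    tok: num_base_tokens + i for i, tok in enumerate(special_tokens)
}

pad_token = special_tokens[-1]  # assumption: pad is the last special token
unk_token = special_tokens[-2]  # second-to-last, as in the patched file

pad_id = special_token_ids[pad_token]
unk_id_before = special_token_ids[pad_token]  # old behavior: looks up pad_token
unk_id_after = special_token_ids[unk_token]   # new behavior: looks up unk_token

print(pad_id, unk_id_before, unk_id_after)  # 103 103 102
```

In this sketch the old lookup makes `unk_id` indistinguishable from `pad_id`, while the new lookup gives it its own ID. Any code path that consumed `unk_id` would therefore see a different value after the fix, which is the source of the retraining concern.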

Files changed (1)
  1. tokenization_kimia.py +1 -1
tokenization_kimia.py CHANGED
@@ -127,7 +127,7 @@ class TikTokenTokenizer(PreTrainedTokenizer):
         self.pad_id: int = self.special_tokens[self.pad_token]
 
         self.unk_token: str = special_tokens[-2]
-        self.unk_id: int = self.special_tokens[self.pad_token]
+        self.unk_id: int = self.special_tokens[self.unk_token]
 
         self.stop_tokens = {
             self.special_tokens["[EOS]"],