Tokens #1
opened by Recognizeme
Can you tell me if you are still developing the model?
Are you looking to increase the number of tokens?
I am no longer actively developing the model, but it would be pretty straightforward to train it on more tokens! See: https://github.com/JohnGiorgi/DeCLUTR. Based on the results in the paper, I would expect increasing the size of the training set to have a large positive effect on performance.
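For reference, here is a minimal sketch of how continued training on a larger corpus might be kicked off through AllenNLP's Python API, which the DeCLUTR repo builds on (assuming the repo is installed so the `declutr` package is importable). The config path, the `train_data_path` override, and the corpus path are illustrative; check the repo's README for the exact config files and expected data format.

```python
from allennlp.commands.train import train_model_from_file
from allennlp.common.util import import_module_and_submodules

# Register DeCLUTR's custom components (dataset reader, model) with AllenNLP.
import_module_and_submodules("declutr")

# Kick off training on a larger corpus. The config filename and the
# train_data_path override below are illustrative placeholders.
train_model_from_file(
    parameter_filename="training_config/declutr.jsonnet",
    serialization_dir="output",
    overrides='{"train_data_path": "path/to/your/larger/corpus.txt"}',
)
```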
That's a real shame. Your model is one of the best for getting embeddings from scientific text!