How to reproduce?
Hi again, I'm interested in experimenting with the VNTL method; I'm fully convinced that causal LMs are the way to go for feed-forward contextual translation. Unfortunately, 27B is a bit much for my 12 GB 4070, and the 8B model (at Q8_0) keeps returning blank lines, so I want to try training more QLoRAs myself. Would you be able to provide the code you're currently using as a starting point?
That’s odd... the 8B model shouldn’t return blank lines. How are you using it?
In any case, I’m more than happy to share my code! It’s great to see more people interested in pushing this forward.
You can find the code I used for the 8B model here: https://gist.github.com/lmg-anon/f616e9406587f633295fa48d41d4ecb5
The code for the 27B model is pretty much the same, without the hardcoded `jp_token`, `en_token`, `human_token` and `llm_token` variables and some other details, but I can't find it right now.
Thank you so much! Will experiment with this over the weekend.
That’s odd... the 8B model shouldn’t return blank lines. How are you using it?
I'm using the Q8_0 quant through llama.cpp's `llama-cli` and `llama-server` binaries, running on CUDA with full offloading. I have a program that feeds the Japanese into the model iteratively, using a sliding-window approach that tries to maximize context usage. The metadata is characters only; it's determined by filtering a hardcoded database against every speaker currently in context, plus the next speaker and any names present in the current line (in theory it should pick them up from previous lines too, but I got lazy). When a translation comes back from the model, excess whitespace is trimmed before it's added to the context.
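If it helps to see it concretely, the core of that program is roughly the following (a simplified Python sketch rather than my actual code: the tag strings, the character-based budget, and the server URL are placeholders):

```python
import requests

SERVER = "http://localhost:8080"  # placeholder: local llama-server with the Q8_0 model loaded

def build_prompt(pairs, metadata, next_jp, next_speaker, budget_chars=6000):
    """Sliding window: keep as many recent JP/EN pairs as fit a rough budget
    (character count is a stand-in here for however the budget is actually measured)."""
    window = []
    budget = budget_chars - len(metadata) - len(next_jp)
    for jp, en in reversed(pairs):
        block = f"<<JAPANESE>>\n{jp}\n<<ENGLISH>>\n{en}\n"
        if len(block) > budget:
            break
        window.insert(0, block)
        budget -= len(block)
    # Note the trailing space after the speaker prefix; this turns out to matter (see below).
    return metadata + "".join(window) + f"<<JAPANESE>>\n{next_jp}\n<<ENGLISH>>\n[{next_speaker}]: "

def translate(prompt):
    # llama-server's /completion endpoint; stop at the end of the translated line.
    resp = requests.post(f"{SERVER}/completion", json={
        "prompt": prompt,
        "n_predict": 128,
        "stop": ["\n"],
    })
    # Trim excess whitespace before the pair goes back into the context.
    return resp.json()["content"].strip()
```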
Here's the problematic prompt: https://gist.github.com/robbie01/790e2cf98e927f13a24b676cccd88f00
Note well that the prompt should end with a space, but not a newline. If sampled with temperature 0, it only yields a single additional whitespace character. This is also true for fp16.
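For what it's worth, the tokenization side of this is easy to poke at through llama-server's /tokenize endpoint; here's a quick sketch (the speaker name and URL are just examples):

```python
import requests

SERVER = "http://localhost:8080"  # same llama-server instance (placeholder URL)

def tokens(text):
    # POST /tokenize returns the token ids llama.cpp would use for exactly this text.
    return requests.post(f"{SERVER}/tokenize", json={"content": text}).json()["tokens"]

# Many tokens include their leading space, so pre-inserting the space after "]:"
# changes which tokens the model can naturally continue with.
print(tokens("[Hinata]:"))
print(tokens("[Hinata]: "))
print(tokens("[Hinata]: 「おはよう」"))
```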
Wow, I’ve been doing it wrong this whole time. Now the reason why I have to trim in the first place makes sense; with the space at the end of the prompt, it would always output `「` at the beginning.
If you don’t mind, what program are you using to view that?
That's mikupad: https://github.com/lmg-anon/mikupad, with the "Monospace Dark" theme.
Thanks! By the way, I adjusted my approach in light of this: the prompt includes everything up until the `<<ENGLISH>>\n`, as that is all guaranteed to tokenize in a specific way, and the `[Speaker]: ` prefix is forced through a GBNF grammar like `root ::= "[Speaker]: " [^\n]+` to let the model output tokens as naturally as possible.
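In practice each line ends up as a single request along these lines (a minimal Python sketch against llama-server's /completion endpoint; the function name, URL, and generation parameters are placeholders):

```python
import requests

SERVER = "http://localhost:8080"  # placeholder URL for the llama-server instance

def translate_line(prompt, speaker):
    # `prompt` is everything up to and including "<<ENGLISH>>\n"; the "[Speaker]: "
    # prefix is forced by the grammar instead, so the model keeps its natural tokenization.
    grammar = f'root ::= "[{speaker}]: " [^\\n]+'
    resp = requests.post(f"{SERVER}/completion", json={
        "prompt": prompt,
        "grammar": grammar,
        "n_predict": 128,
        "stop": ["\n"],
    })
    return resp.json()["content"]
```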