Perfect upload. Stops the conversation when finished.
These updated files work fine.
I use Meta-Llama-3-8B-Instruct.Q8_0.gguf and Meta-Llama-3-8B-Instruct.Q6_K.gguf, and both stop the conversation properly when finished.
Many thanks. :)
@0-hero Could you tell us how you made the current GGUFs? They work well, and the models stop their turn as they should - but when I tried to reproduce the conversion with convert-hf-to-gguf.py from current llama.cpp (b2709), which in theory supports Llama 3, I got GGUFs that just don't stop generating. Did you change any of the tokenizer configuration files vs. the original Llama repo, and/or use a specific llama.cpp commit / PR?
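For reference, what I tried was roughly the following (a sketch from my setup; paths and flags may differ from whatever you used):

```sh
# Convert the HF checkpoint to an f16 GGUF with llama.cpp b2709
python convert-hf-to-gguf.py ./Meta-Llama-3-8B-Instruct \
    --outtype f16 --outfile Meta-Llama-3-8B-Instruct-f16.gguf

# Quantize to Q8_0 (same procedure for Q6_K)
./quantize Meta-Llama-3-8B-Instruct-f16.gguf \
    Meta-Llama-3-8B-Instruct.Q8_0.gguf Q8_0
```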
I think the changes are already merged.
@0-hero
I pulled the current configs from the original repo, but GGUFs made with llama.cpp b2709 still didn't stop the generation. So I experimented a bit and changed `"eos_token": "<|end_of_text|>"` to `"eos_token": "<|eot_id|>"` in tokenizer_config.json, and that finally made the generation stop in Ollama after the model's turn (like in those GGUFs of yours). But I'm not sure if that's the best / proper way, or whether it will have side effects in other apps. Always something with the tokenizer and/or the chat template, sigh...
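For anyone else hitting this, the change as a diff (only this one line of tokenizer_config.json is touched):

```diff
--- tokenizer_config.json (original)
+++ tokenizer_config.json (modified)
-  "eos_token": "<|end_of_text|>",
+  "eos_token": "<|eot_id|>",
```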