Questions about how to run
Are there special instructions on how to run this model? It gave less accurate answers with ChatML than with the Mistral [INST] [/INST] template, but I needed ChatML to get correctly formatted responses (rough sketch of the two formats at the end of this post).
Also, when I gave it prompts above a certain length, it would just respond with random numbers and letters. Switching from llama.cpp to llamacpp_HF fixed that, but the switch also decreased the response quality.
I read in another discussion that "Add the bos_token to the beginning of prompts" is supposed to be unchecked for this model, but that option isn't available when using llama.cpp with oobabooga. With llamacpp_HF I do have it, but responses actually seemed better when it was checked...
I used https://huggingface.co/bartowski/Tess-3-Mistral-Large-2-123B-GGUF
Hey! Umm, not too sure about quants. I tested the model with the sample script provided and it worked fine.
Is there a way to run the model with Python, loaded in 4-bit?
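If it helps, this is roughly what I have in mind: a minimal sketch with transformers + bitsandbytes, assuming the base (non-GGUF) repo name, and untested as written. Note that a 123B model still needs on the order of 70 GB of VRAM even in 4-bit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base repo name assumed (the GGUF linked above was quantized from it).
model_id = "migtissera/Tess-3-Mistral-Large-2-123B"

# 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Use whatever chat template ships with the tokenizer.
messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```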
I kinda need to run it in oobabooga at a certain quant for eval. I'm wondering if the model is still more comfortable with its original prompt template, even though this finetune seems to be meant for ChatML.