Questions about how to run
Are there special instructions on how to run this model? It gave less accurate answers with ChatML than with the Mistral [INST] [/INST] template, but I needed ChatML to get correctly formatted responses (rough sketch of the two formats at the end of this post).
Also, when I gave it prompts above a certain length, it would just respond with random numbers and letters. Switching from llama.cpp to llamacpp_HF fixed that, but the switch also decreased the response quality.
I read in another discussion that "Add the bos_token to the beginning of prompts" is supposed to be unchecked for this model, but that option isn't available when using llama.cpp with oobabooga. With llamacpp_HF I do have it, but responses actually seemed better when it was checked...
I used https://huggingface.co/bartowski/Tess-3-Mistral-Large-2-123B-GGUF
Hey! Umm, not too sure about quants. I tested the model with the sample script provided and it worked fine.
Is there a way to run the model with Python, loaded in 4-bit?
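If it helps, this is roughly what I have in mind: a minimal sketch with transformers + bitsandbytes, assuming the base (non-GGUF) repo name, and untested as written. Note that a 123B model still needs on the order of 70 GB of VRAM even in 4-bit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Base repo name assumed (the GGUF linked above was quantized from it).
model_id = "migtissera/Tess-3-Mistral-Large-2-123B"

# 4-bit NF4 quantization via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Use whatever chat template ships with the tokenizer.
messages = [{"role": "user", "content": "Hello, who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```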
I kinda need to run it in oobabooga at a certain quant for eval. I'm wondering if the model is still more comfortable with its original prompt template, even though this finetune seems to be meant for ChatML.