Questions about how to run

#2 by DontPlanToEnd - opened

Are there special instructions for running this model? It gave less accurate answers with ChatML than with the Mistral [INST] [/INST] template, but I needed ChatML to get correctly formatted responses.
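For reference, the two formats being compared look roughly like this (a sketch based on the common ChatML and Mistral conventions; the exact tokens emitted by this model's chat template may differ):

```python
# ChatML-style prompt (the format this finetune reportedly targets)
chatml_prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nHello!<|im_end|>\n"
    "<|im_start|>assistant\n"
)

# Mistral [INST]-style prompt (the base model's template)
mistral_prompt = "<s>[INST] Hello! [/INST]"
```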

Also, when I gave it prompts above a certain length, it would just respond with random numbers and letters. Switching from the llama.cpp loader to llamacpp_HF fixed this, but also decreased response quality.
I read in another discussion that "Add the bos_token to the beginning of prompts" is supposed to be unchecked for this model, but that option isn't available when using the llama.cpp loader in oobabooga. With llamacpp_HF the option is there, but responses actually seemed better with it checked...
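One way to check whether that setting duplicates the BOS token (a common cause of degraded output when the chat template already prepends one) is to inspect the token ids directly. A minimal sketch with transformers; the repo id below is a guess, adjust it to the actual unquantized model:

```python
from transformers import AutoTokenizer

model_id = "migtissera/Tess-3-Mistral-Large-2-123B"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
messages = [{"role": "user", "content": "Hello!"}]

# apply_chat_template with the default tokenize=True returns token ids
ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# If the template already emits BOS, letting the loader prepend another one
# means the model sees two BOS tokens, which can hurt output quality.
print("template starts with BOS:", ids[0] == tokenizer.bos_token_id)
print("BOS count in prompt:", ids.count(tokenizer.bos_token_id))
```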

I used https://huggingface.co/bartowski/Tess-3-Mistral-Large-2-123B-GGUF

Hey! Umm, I'm not too sure about quants. I tested the model with the sample script provided and it worked fine.

Is there a way to run the model from Python, loaded in 4-bit?
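In case it helps anyone searching: with a GGUF quant like the repo linked above, a 4-bit file (e.g. Q4_K_M) can be run from Python via llama-cpp-python. A minimal sketch, assuming the local file path below (the filename is hypothetical, and large quants are often split into multiple files, in which case you point at the first part):

```python
from llama_cpp import Llama

# Path to a 4-bit quant downloaded from the GGUF repo (hypothetical filename)
llm = Llama(
    model_path="Tess-3-Mistral-Large-2-123B-Q4_K_M.gguf",
    n_ctx=8192,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if it fits
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```

Alternatively, the unquantized weights could be loaded in 4-bit with transformers and bitsandbytes (load_in_4bit), though at 123B parameters that still needs a lot of VRAM.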

I kinda need to run it in oobabooga at a specific quant for an eval. I'm wondering if the model is still more comfortable with its original prompt template, even though this finetune seems to be meant for ChatML.
