Most of the time, this model doesn't work for me. Please help!

#3 opened by MrDevolver

Hello,

I have issues using this model in inference UI apps like Faraday and LM Studio; most of the time the model just doesn't work. Initially it looks promising: it loads into RAM and VRAM and the CPU starts processing, but after a moment everything stops, the model unloads from both RAM and VRAM, and nothing is generated. Yesterday I was briefly able to load and even use the model in LM Studio for a while, but after some time it stopped working again and kept giving me errors about the model failing.
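If it helps with debugging, this is the kind of minimal CPU-only load test I could try outside the UI apps. It's just a sketch, assuming a pre-GGUF llama-cpp-python release (e.g. 0.1.78) that can still read GGML files, and assuming TheBloke's Q4_K_M file name:

```python
# Minimal CPU-only load test, bypassing Faraday/LM Studio entirely.
# Assumes a GGML-era llama-cpp-python (pip install llama-cpp-python==0.1.78);
# newer releases only read GGUF files.
from llama_cpp import Llama

llm = Llama(
    model_path="nous-hermes-llama2-13b.ggmlv3.q4_K_M.bin",  # assumed file name
    n_ctx=2048,      # small context, enough for a smoke test
    n_gpu_layers=0,  # force pure CPU to rule out the AMD GPU backend
)

out = llm("### Instruction:\nSay hello.\n\n### Response:\n", max_tokens=32)
print(out["choices"][0]["text"])
```

If this generates fine on CPU alone, the file itself is probably good and the problem sits in the apps' GPU offload or memory handling; a 13B Q4_K_M file is roughly 8 GB, so 16 GB of system RAM doesn't leave much headroom once an app keeps the model in RAM and VRAM at the same time.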

I've read there were some problems with the quantization process for this model, as discussed here: https://huggingface.co/NousResearch/Nous-Hermes-Llama2-13b/discussions/1. I don't know whether my problem is related to that, but I'd really like to have a working version of this model.
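In case the download itself got corrupted, I suppose I could also hash the file and compare it by eye with the checksum Hugging Face shows on the file page; a quick sketch (the path is just where my copy happens to live):

```python
# Compute the SHA-256 of the downloaded GGML file and compare it with the
# checksum listed on the repo's file page on Hugging Face.
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

print(sha256sum("nous-hermes-llama2-13b.ggmlv3.q4_K_M.bin"))
```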

I discussed this issue with another user who tested the model for me on his own hardware. His specs are different (Intel + Nvidia; my own specs are at the bottom of this post), and he was able to use the model. I'm starting to feel like I'm the only one having this issue, which feels a bit ridiculous: I normally use all kinds of 13B GGML models without trouble, but I just can't get this one to work for some reason. Is there anything I can do to fix the problem, please? Any help would be appreciated, thanks!

My specs:
OS: Windows 10 64bit
CPU: AMD Ryzen 7 2700X
RAM: 16 GB
GPU: AMD Radeon RX Vega 56
VRAM: 8 GB

Model:
Nous-Hermes-Llama2-13b-GGML (Q4_K_M version)

Same for me, doesn't work at all. Gives incoherent answers most of the time.

My issue is different: the model unloads after a while and doesn't even begin to generate anything. The source of your issue is most likely something else entirely. Check your settings and system prompt to make sure you're using the format suggested on the model card.
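For reference, the Nous-Hermes-Llama2-13b model card suggests the Alpaca instruction format. A tiny illustrative helper (the function is mine, the template text follows the card):

```python
# Builds the Alpaca-style prompt the Nous-Hermes-Llama2 card recommends;
# whatever UI you use should be configured to produce the same layout.
def hermes_prompt(instruction: str) -> str:
    return (
        "### Instruction:\n"
        f"{instruction}\n\n"
        "### Response:\n"
    )

print(hermes_prompt("List three uses for a brick."))
```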
