Try Yi-6B with llama.cpp
You can now try this at:
https://huggingface.co/spaces/ztime/Yi-6B-GGUF_llama_cpp_python
Nice try! GGUF seems to add a BOS token in front of the prompt by default (which is not used by Yi base models). Does this app deal with that?
Indeed. I've updated the app and removed the BOS token.
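For anyone driving the model through llama-cpp-python themselves, one way to skip the BOS token is to tokenize the prompt explicitly with `add_bos=False` and pass the token list to the completion call. A minimal sketch (the model path and prompt are placeholders, not from this thread):

```python
from llama_cpp import Llama

# Placeholder path -- point this at your local Yi-6B GGUF file.
llm = Llama(model_path="./yi-6b.Q4_K_M.gguf", n_ctx=2048)

prompt = "There's a place where time stands still."

# Tokenize without the leading BOS token, then feed the raw token list
# instead of a string so no BOS is prepended automatically.
tokens = llm.tokenize(prompt.encode("utf-8"), add_bos=False)
output = llm.create_completion(prompt=tokens, max_tokens=64)

print(output["choices"][0]["text"])
```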
@ztime Can you share the Yi-6B GGUF file with us?
My T4 can't successfully run the Yi-34B model, even with 2-bit quantization.
You can find the 6B GGUF files here: https://huggingface.co/SamPurkis/Yi-6B-GGUF
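A quick way to fetch and load one of those files with llama-cpp-python; the filename is an assumption, so check the repo's file list for the exact quantization you want:

```python
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one quant from the repo linked above.
# The filename below is a guess -- adjust it to a file that actually
# exists in SamPurkis/Yi-6B-GGUF.
model_path = hf_hub_download(
    repo_id="SamPurkis/Yi-6B-GGUF",
    filename="yi-6b.Q4_K_M.gguf",
)

llm = Llama(model_path=model_path, n_ctx=2048)
print(llm("Once upon a time", max_tokens=32)["choices"][0]["text"])
```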
A Mac with an M1 chip and 32 GB of RAM can run inference on the Yi-34B GGUF with llama.cpp.
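If you go through the llama-cpp-python wrapper rather than the llama.cpp CLI, a sketch for Apple Silicon looks like this (the model path is a placeholder, and this assumes a Metal-enabled build):

```python
from llama_cpp import Llama

# Placeholder path for a local Yi-34B GGUF quant; a 4-bit quant should
# fit in 32 GB of unified memory.
# n_gpu_layers=-1 offloads all layers to Metal on Apple Silicon builds.
llm = Llama(
    model_path="./yi-34b.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,
)

print(llm("Q: Why is the sky blue? A:", max_tokens=64)["choices"][0]["text"])
```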