Run Locally With LM Studio
Hey, I made a repo that makes it easy to run this locally with LM Studio using less than 3GB.
Can you provide the Colab link for your code?
Excellent work @isaiahbjork, it works directly with llama.cpp as well:
llama-server -m S:\orpheus-3b-0.1-ft-q4_k_m.gguf -c 8192 -ngl 29 --host 0.0.0.0 --port 1234 --cache-type-k q8_0 --cache-type-v q8_0 -fa --mlock
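If the server starts cleanly, you can sanity-check it before wiring up any client. A minimal sketch, assuming the default OpenAI-compatible endpoints that llama-server (and LM Studio) expose on the port from the command above:

import requests  # pip install requests

BASE_URL = "http://127.0.0.1:1234"  # host/port from the llama-server command above

# llama-server and LM Studio both expose an OpenAI-compatible API;
# /v1/models should list the loaded Orpheus GGUF if the server came up correctly.
print(requests.get(f"{BASE_URL}/v1/models", timeout=10).json())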
Thanks!! This looks great - I'll include it in the readme of the main repo!
This runs for me but I don't understand how to get speech output. Thanks.
Run the gguf_orpheus.py python script in the repo and it will generate a .wav file output.
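For anyone curious what the script is doing: as I understand it, it sends your text to the local server's OpenAI-compatible completions endpoint, collects the audio tokens Orpheus emits, and decodes them to a .wav with the SNAC codec. Below is a minimal sketch of just the server round trip; the prompt format and model name are assumptions on my part (check gguf_orpheus.py for the exact ones), and the returned tokens still need SNAC decoding before you get audio:

import requests

API_URL = "http://127.0.0.1:1234/v1/completions"  # LM Studio / llama-server port from above

# NOTE: the prompt format and model name are placeholders, not the script's exact values.
payload = {
    "model": "orpheus-3b-0.1-ft-q4_k_m",      # whatever name your server reports at /v1/models
    "prompt": "tara: Hey, this is a test.",   # hypothetical voice prefix; see gguf_orpheus.py
    "max_tokens": 1200,
    "temperature": 0.6,
}

resp = requests.post(API_URL, json=payload, timeout=300)
resp.raise_for_status()
# This prints raw Orpheus audio tokens as text; the script decodes them with SNAC
# into PCM samples and writes the .wav, a step this sketch stops before.
print(resp.json()["choices"][0]["text"][:200])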
Thanks. What do the cache-type-k and cache-type-v flags do?
Here is a helpful link that explains this for Ollama, which is a wrapper around llama.cpp: https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/
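In llama.cpp terms, those two flags store the attention K/V cache as 8-bit q8_0 instead of f16, which roughly halves the KV-cache memory for the 8192-token context at a small quality cost; -fa (flash attention) needs to be on for the quantized V cache. If you end up on Ollama instead, the post above describes the equivalent setting via environment variables; a sketch, assuming a recent Ollama version (double-check the variable names against your install):

# assumed Ollama equivalents per the linked post; verify against your Ollama version
OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q8_0 ollama serve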
How do I run the gguf_orpheus.py script in the repo to generate a .wav output file? What are the instructions, please?
Hello. I need some help, if possible, configuring Orpheus-TTS-FastAPI (installed in Pinokio) to use Ollama instead of LM Studio. My processor does not support Intel® AVX2, so LM Studio fails to load any model. But I have Ollama installed and I can load the legraphista/Orpheus model. I tried editing the .env and .env.temp to point to Ollama, but it did not work. Any thoughts? Thanks
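In case it helps: Ollama exposes an OpenAI-compatible endpoint at http://localhost:11434/v1, so one approach is to point whatever base-URL key the .env uses at that address instead of LM Studio's http://127.0.0.1:1234, and set the model name to the tag Ollama lists for legraphista/Orpheus. A rough sketch with hypothetical variable names; the actual keys Orpheus-TTS-FastAPI reads may differ, so match them to what is in .env.temp:

# hypothetical .env keys; rename to match whatever Orpheus-TTS-FastAPI actually reads
ORPHEUS_API_URL=http://localhost:11434/v1/completions
ORPHEUS_MODEL=legraphista/Orpheus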