lex-au committed · Commit d46025e · verified · 1 Parent(s): eb278e1

Update README.md

Files changed (1)
  1. README.md +154 -3
README.md CHANGED

---
language: en
tags:
- text-to-speech
- tts
- audio
- speech-synthesis
- orpheus
- gguf
license: apache-2.0
datasets:
- internal
---

# Orpheus-3b-FT-Q2_K

This is a quantised version of [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft).

Orpheus is a high-performance Text-to-Speech model fine-tuned for natural, emotional speech synthesis. This repository hosts a 2-bit (Q2_K) quantised version of the 3B parameter model, optimised for efficiency while maintaining high-quality output.

## Model Description

**Orpheus-3b-FT-Q2_K** is a 3 billion parameter Text-to-Speech model that converts text inputs into natural-sounding speech, with support for multiple voices and emotional expressions. The model has been quantised to 2-bit (GGUF Q2_K) format for efficient inference, making it accessible on consumer hardware.

Key features:
- 8 distinct voice options with different characteristics
- Support for emotion tags like laughter, sighs, etc.
- Optimised for CUDA acceleration on RTX GPUs
- Produces high-quality 24kHz mono audio
- Fine-tuned for conversational naturalness

## How to Use

This model is designed to be used with an LLM inference server that connects to the [Orpheus-FastAPI](https://github.com/Lex-au/Orpheus-FastAPI) frontend, which provides both a web UI and OpenAI-compatible API endpoints.

### Compatible Inference Servers

This quantised model can be loaded into any of these LLM inference servers:

- [GPUStack](https://github.com/gpustack/gpustack) - GPU-optimised LLM inference server (my pick) - supports LAN/WAN tensor-split parallelisation
- [LM Studio](https://lmstudio.ai/) - Load the GGUF model and start the local server
- [llama.cpp server](https://github.com/ggerganov/llama.cpp) - Run with the appropriate model parameters (see the example command below)
- Any other OpenAI-compatible inference server

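As a concrete illustration for the llama.cpp route, here is a minimal sketch of serving this GGUF with `llama-server`. The filename, port, GPU-layer count and context size are assumptions to adjust for your download and hardware; older llama.cpp builds name the binary `server` instead of `llama-server`.

```bash
# Minimal llama.cpp server launch (sketch - filename, port and settings are assumptions)
./llama-server \
  -m ./Orpheus-3b-FT-Q2_K.gguf \
  --port 8080 \
  -ngl 99 \
  -c 8192
# -m   : path to the downloaded GGUF file
# -ngl : number of layers to offload to the GPU (99 = effectively all of them)
# -c   : context length available for the token-to-audio sequences
```

Whatever host and port you choose here is what the Orpheus-FastAPI frontend's `ORPHEUS_API_URL` (see Quick Start below) needs to point at.
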
### Quick Start

1. Download this quantised model from [lex-au's Orpheus-FASTAPI collection](https://huggingface.co/collections/lex-au/orpheus-fastapi-67e125ae03fc96dae0517707)

2. Load the model in your preferred inference server and start the server.

3. Clone the Orpheus-FastAPI repository:
```bash
git clone https://github.com/Lex-au/Orpheus-FastAPI.git
cd Orpheus-FastAPI
```

4. Configure the FastAPI server to connect to your inference server by setting the `ORPHEUS_API_URL` environment variable.

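For example (a sketch only - the host, port and path are assumptions that must match the inference server you started in step 2; the exact URL format is documented in the Orpheus-FastAPI README):

```bash
# Sketch only - host, port and path are assumptions; see the Orpheus-FastAPI README
# for the exact URL format your inference server needs.
export ORPHEUS_API_URL="http://127.0.0.1:8080/v1/completions"
```
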
5. Follow the complete installation and setup instructions in the [repository README](https://github.com/Lex-au/Orpheus-FastAPI).

### Audio Samples

Listen to the model in action with different voices and emotions:

#### Default Voice Sample
<audio controls>
<source src="https://lex-au.github.io/Orpheus-FastAPI/DefaultTest.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

#### Leah (Happy)
<audio controls>
<source src="https://lex-au.github.io/Orpheus-FastAPI/LeahHappy.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

#### Tara (Sad)
<audio controls>
<source src="https://lex-au.github.io/Orpheus-FastAPI/TaraSad.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

#### Zac (Contemplative)
<audio controls>
<source src="https://lex-au.github.io/Orpheus-FastAPI/ZacContemplative.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>

### Available Voices

The model supports 8 different voices:
- `tara`: Female, conversational, clear
- `leah`: Female, warm, gentle
- `jess`: Female, energetic, youthful
- `leo`: Male, authoritative, deep
- `dan`: Male, friendly, casual
- `mia`: Female, professional, articulate
- `zac`: Male, enthusiastic, dynamic
- `zoe`: Female, calm, soothing

### Emotion Tags

You can add expressiveness to speech by inserting tags into the input text (see the example request below):
- `<laugh>`, `<chuckle>`: For laughter sounds
- `<sigh>`: For sighing sounds
- `<cough>`, `<sniffle>`: For subtle interruptions
- `<groan>`, `<yawn>`, `<gasp>`: For additional emotional expression

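To show how voice selection and emotion tags fit together, here is a hedged sketch of a request to the Orpheus-FastAPI frontend, assuming it is running locally on port 5005 and exposes an OpenAI-style `/v1/audio/speech` endpoint; the port, endpoint path, model name and output filename are assumptions to check against the Orpheus-FastAPI README.

```bash
# Sketch only - port, endpoint path and field names are assumptions; adjust to your setup.
curl http://127.0.0.1:5005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
        "model": "orpheus",
        "voice": "tara",
        "input": "Well <sigh>, I did not expect that to work <laugh>."
      }' \
  --output speech.wav
```

The tags are written inline in the input text, and the server returns the 24kHz WAV audio described under Technical Specifications below.
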
## Technical Specifications

- **Architecture**: Specialised token-to-audio sequence model
- **Parameters**: ~3 billion
- **Quantisation**: 2-bit (GGUF Q2_K format)
- **Audio Sample Rate**: 24kHz
- **Input**: Text with optional voice selection and emotion tags
- **Output**: High-quality WAV audio
- **Language**: English
- **Hardware Requirements**: CUDA-compatible GPU (recommended: RTX series)
- **Integration Method**: External LLM inference server + Orpheus-FastAPI frontend

## Limitations

- Currently supports English text only
- Best performance achieved on CUDA-compatible GPUs
- Generation speed depends on GPU capability

## License

This model is available under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Citation & Attribution

The original Orpheus model was created by Canopy Labs. This repository contains a quantised version optimised for use with the Orpheus-FastAPI server.

If you use this quantised model in your research or applications, please cite:

```bibtex
@misc{orpheus-tts-2025,
  author = {Canopy Labs},
  title = {Orpheus-3b-0.1-ft: Text-to-Speech Model},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/canopylabs/orpheus-3b-0.1-ft}}
}

@misc{orpheus-quantised-2025,
  author = {Lex-au},
  title = {Orpheus-3b-FT-Q2_K: Quantised TTS Model with FastAPI Server},
  note = {GGUF quantisation of canopylabs/orpheus-3b-0.1-ft},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/lex-au/Orpheus-3b-FT-Q4_K_M.gguf}}
}
```