Malaysian canopylabs/orpheus-3b-0.1-ft
A finetune of canopylabs/orpheus-3b-0.1-ft on standard Malay with a small amount of Mandarin.
Training session
Finetuned on Mesolitica/TTS so that the model can generate Malay speech, with a small amount of Mandarin.
How we train
- LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"].
- Rank 128 with alpha 256, i.e. a scaling factor of 2.0 (alpha / rank); during merging we use a ratio of 1.5 instead.
- Multipacking with proper SDPA causal masking to prevent cross-document contamination and to keep position ids correct for each packed document.
- Chunked CE (cross-entropy) loss to reduce memory usage.
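To make the multipacking bullet concrete, here is a minimal pure-Python sketch of what a packed row needs: position ids and a block-diagonal causal mask. It is an illustration only, not the training code from the repository. The function names are hypothetical, and it assumes that "proper position ids" means restarting the count at 0 for every packed document.

```python
def packed_position_ids(doc_lengths):
    """Position ids restart at 0 for every document in the packed row (assumption)."""
    position_ids = []
    for length in doc_lengths:
        position_ids.extend(range(length))
    return position_ids


def packed_causal_mask(doc_lengths):
    """Block-diagonal causal mask: True where query i may attend to key j."""
    # Label every token with the index of the document it belongs to.
    doc_ids = []
    for doc_index, length in enumerate(doc_lengths):
        doc_ids.extend([doc_index] * length)
    total = len(doc_ids)
    # A token sees only earlier (or same) positions inside its own document.
    return [
        [doc_ids[i] == doc_ids[j] and j <= i for j in range(total)]
        for i in range(total)
    ]


lengths = [3, 2]  # two hypothetical documents packed into one row of 5 tokens
print(packed_position_ids(lengths))  # [0, 1, 2, 0, 1]
for mask_row in packed_causal_mask(lengths):
    print("".join("x" if allowed else "." for allowed in mask_row))
# x....
# xx...
# xxx..
# ...x.
# ...xx
```

The fourth token starts the second document, so it attends only to itself. A boolean mask of this shape matches the convention of `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention.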
Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/orpheus
Wandb at https://wandb.ai/huseinzol05/malay-orpheus-3b-0.1-ft-lora-128/workspace?nw=nwuserhuseinzol05
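The chunked cross-entropy loss mentioned in the training notes can be illustrated with a small pure-Python sketch. The idea, as generally understood, is to project only a slice of the hidden states to vocabulary logits at a time, so that the full tokens-by-vocabulary logits matrix is never held in memory at once. The function names, shapes, and toy values below are hypothetical; the actual implementation in the linked source code works on tensors and may differ in detail.

```python
import math


def chunk_logits(hidden_chunk, lm_head):
    """Logits for one chunk only: a (chunk_size x vocab) matrix."""
    return [
        [sum(h * w for h, w in zip(hidden, vocab_row)) for vocab_row in lm_head]
        for hidden in hidden_chunk
    ]


def cross_entropy_sum(logits, targets):
    """Sum of token-level cross-entropy, using a stable log-sum-exp."""
    total = 0.0
    for row, target in zip(logits, targets):
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_z - row[target]
    return total


def chunked_ce_loss(hidden_states, lm_head, targets, chunk_size):
    """Mean cross-entropy, materialising logits for `chunk_size` tokens at a time."""
    total = 0.0
    for start in range(0, len(hidden_states), chunk_size):
        end = start + chunk_size
        logits = chunk_logits(hidden_states[start:end], lm_head)
        total += cross_entropy_sum(logits, targets[start:end])
    return total / len(targets)


# Toy data (hypothetical): 4 tokens, hidden size 2, vocabulary of 3 entries.
hidden_states = [[0.1, 0.2], [0.3, -0.1], [-0.5, 0.4], [0.0, 0.9]]
lm_head = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
targets = [0, 2, 1, 1]

full = chunked_ce_loss(hidden_states, lm_head, targets, chunk_size=4)
chunked = chunked_ce_loss(hidden_states, lm_head, targets, chunk_size=2)
print(abs(full - chunked) < 1e-9)
```

Because the per-token losses are simply summed, the chunked result equals the unchunked one up to floating-point rounding; only the peak size of the logits buffer changes.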
Example
Load the model and generate audio:
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer
from snac import SNAC
import torch
import IPython.display as ipd
def redistribute_codes(row):
    # Keep only whole 7-token frames; a trailing partial frame is dropped.
    row_length = row.size(0)
    new_length = (row_length // 7) * 7
    trimmed_row = row[:new_length]
    # Subtract the audio token id offset.
    code_list = [t - 128266 for t in trimmed_row]
    layer_1 = []
    layer_2 = []
    layer_3 = []
    # Each frame carries 1 + 2 + 4 codes for the three SNAC layers;
    # position k within a frame is additionally offset by k * 4096.
    for i in range(len(code_list) // 7):
        layer_1.append(code_list[7*i][None])
        layer_2.append(code_list[7*i+1][None]-4096)
        layer_3.append(code_list[7*i+2][None]-(2*4096))
        layer_3.append(code_list[7*i+3][None]-(3*4096))
        layer_2.append(code_list[7*i+4][None]-(4*4096))
        layer_3.append(code_list[7*i+5][None]-(5*4096))
        layer_3.append(code_list[7*i+6][None]-(6*4096))
    with torch.no_grad():
        codes = [torch.concat(layer_1),
                 torch.concat(layer_2),
                 torch.concat(layer_3)]
        for i in range(len(codes)):
            # Clamp negative codes to 0 and add a batch dimension.
            codes[i][codes[i] < 0] = 0
            codes[i] = codes[i][None]
        audio_hat = snac_model.decode(codes)
        return audio_hat.cpu()[0, 0]
snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
snac_model = snac_model.to("cuda")
tokenizer = AutoTokenizer.from_pretrained('mesolitica/Malaysian-orpheus-3b-0.1-ft')
model = AutoModelForCausalLM.from_pretrained(
'mesolitica/Malaysian-orpheus-3b-0.1-ft', torch_dtype = torch.bfloat16
).cuda()
speaker = 'Husein'
text = 'Nama saya Husein, saya tak suka nasi ayam dan tak suka mandi, Xiàn zài wǒ yǒu bing chilling Wǒ hěn xǐ huān bing chilling.'
prompt = f'<custom_token_3><|begin_of_text|>{speaker}: {text}<|eot_id|><custom_token_4><custom_token_5><custom_token_1>'
input_ids = tokenizer(prompt,add_special_tokens = False, return_tensors = 'pt').to('cuda')
with torch.no_grad():
    # Use the model loaded above; generation stops at eos_token_id.
    generated_ids = model.generate(
        **input_ids,
        max_new_tokens=1200,
        do_sample=True,
        temperature=0.9,
        top_p=0.8,
        repetition_penalty=1.1,
        num_return_sequences=1,
        eos_token_id=128258,
    )
row = generated_ids[0, input_ids['input_ids'].shape[1]:]
y_ = redistribute_codes(row)
ipd.Audio(y_, rate = 24000)
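The frame layout that `redistribute_codes` relies on can be checked without a GPU or the SNAC weights. The sketch below mirrors only its index arithmetic in pure Python: every 7 generated tokens form one frame that contributes 1 code to the first layer, 2 to the second, and 4 to the third. The helper `split_frames` and the toy codes are hypothetical; the two constants are taken from the example above.

```python
AUDIO_TOKEN_OFFSET = 128266  # offset subtracted in redistribute_codes
CODEBOOK_SIZE = 4096         # per-position offset used in redistribute_codes


def split_frames(token_ids):
    """Pure-Python mirror of the index arithmetic in redistribute_codes."""
    usable = (len(token_ids) // 7) * 7  # drop a trailing partial frame
    codes = [t - AUDIO_TOKEN_OFFSET for t in token_ids[:usable]]
    layer_1, layer_2, layer_3 = [], [], []
    for start in range(0, usable, 7):
        # Position k in the frame is shifted by k * CODEBOOK_SIZE;
        # negative results are clamped to 0, as in the original function.
        frame = [max(codes[start + k] - k * CODEBOOK_SIZE, 0) for k in range(7)]
        layer_1.append(frame[0])
        layer_2.extend([frame[1], frame[4]])
        layer_3.extend([frame[2], frame[3], frame[5], frame[6]])
    return layer_1, layer_2, layer_3


# Build one full frame from known toy codes, plus one stray token that is trimmed.
toy_codes = [10, 20, 30, 40, 50, 60, 70]
frame_tokens = [
    AUDIO_TOKEN_OFFSET + k * CODEBOOK_SIZE + code for k, code in enumerate(toy_codes)
]
frame_tokens.append(AUDIO_TOKEN_OFFSET + 5)  # incomplete second frame

print(split_frames(frame_tokens))  # ([10], [20, 50], [30, 40, 60, 70])
```

A generation that stops mid-frame therefore loses at most 6 tokens of audio codes rather than causing a decoding error.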
Model tree for mesolitica/Malaysian-orpheus-3b-0.1-ft
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Finetuned: canopylabs/orpheus-3b-0.1-pretrained
- Finetuned: canopylabs/orpheus-3b-0.1-ft