Malaysian canopylabs/orpheus-3b-0.1-ft

A fine-tune of canopylabs/orpheus-3b-0.1-ft on standard Malay with minimal Mandarin.

Training session

Fine-tuned on Mesolitica/TTS so the model can generate Malay speech with minimal Mandarin.

How we train

  1. LoRA on ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"].
  2. Rank 128 with alpha 256, i.e. an alpha-to-rank ratio of 2.0 during training; during merging we use a ratio of 1.5 instead (see the first sketch after this list).
  3. Multipacking with proper SDPA causal masking to prevent cross-document contamination, together with position ids that restart for every packed document (see the second sketch below).
  4. Chunked cross-entropy loss to reduce memory usage (see the third sketch below).
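
A minimal sketch of this LoRA setup using Hugging Face's peft library (the real training scripts are in the source repo linked below). Overriding the internal scaling dict to get the 1.5 merge ratio is an assumption about peft internals and may vary across versions.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    'canopylabs/orpheus-3b-0.1-ft', torch_dtype = torch.bfloat16
)
config = LoraConfig(
    r = 128,
    lora_alpha = 256,  # alpha / rank = 2.0 during training
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
)
model = get_peft_model(base, config)

# ... train ...

# Merge with a 1.5 scaling instead of alpha / rank = 2.0 by rewriting
# each LoRA layer's scaling factor before merging (peft-internal detail).
for module in model.modules():
    if hasattr(module, 'scaling'):
        for adapter_name in module.scaling:
            module.scaling[adapter_name] = 1.5
merged = model.merge_and_unload()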
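
A minimal sketch of the multipacking logic, not the training code itself: several documents are packed into one sequence, the additive SDPA attention mask is block-diagonal causal so tokens cannot attend across document boundaries, and position ids restart at zero for every document. The helper name and document lengths are illustrative.

import torch

def packed_mask_and_positions(doc_lengths, dtype = torch.bfloat16):
    total = sum(doc_lengths)
    allowed = torch.zeros(total, total, dtype = torch.bool)
    position_ids = torch.empty(total, dtype = torch.long)
    start = 0
    for n in doc_lengths:
        # Causal attention only within this document's block.
        allowed[start:start + n, start:start + n] = torch.tril(
            torch.ones(n, n, dtype = torch.bool))
        # Position ids restart at 0 for each packed document.
        position_ids[start:start + n] = torch.arange(n)
        start += n
    # Additive mask for torch scaled_dot_product_attention:
    # 0 where attention is allowed, -inf where it is masked.
    mask = torch.zeros(total, total, dtype = dtype)
    mask.masked_fill_(~allowed, float('-inf'))
    return mask, position_ids

mask, position_ids = packed_mask_and_positions([3, 5, 2])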
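
And a minimal sketch of a chunked cross-entropy loss, assuming the final hidden states and the lm_head weight are available separately: logits are computed and consumed one chunk at a time, so the full [seq_len, vocab] logits tensor is never materialized.

import torch
import torch.nn.functional as F

def chunked_ce_loss(hidden, lm_head_weight, labels, chunk_size = 1024):
    # hidden: [seq_len, d_model], lm_head_weight: [vocab, d_model],
    # labels: [seq_len] with -100 for masked positions.
    total_loss = hidden.new_zeros(())
    n_valid = 0
    for i in range(0, hidden.size(0), chunk_size):
        h = hidden[i:i + chunk_size]
        y = labels[i:i + chunk_size]
        logits = h @ lm_head_weight.T  # only [chunk, vocab] in memory
        total_loss = total_loss + F.cross_entropy(
            logits.float(), y, ignore_index = -100, reduction = 'sum')
        n_valid += (y != -100).sum().item()
    return total_loss / max(n_valid, 1)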

Source code at https://github.com/mesolitica/malaya-speech/tree/master/session/orpheus

Wandb at https://wandb.ai/huseinzol05/malay-orpheus-3b-0.1-ft-lora-128/workspace?nw=nwuserhuseinzol05

Example

Load the model and generate speech,

from transformers import AutoTokenizer, AutoModelForCausalLM
from snac import SNAC
import torch
import IPython.display as ipd

def redistribute_codes(row):
    # Each SNAC frame is encoded as 7 consecutive tokens; trim to a
    # multiple of 7.
    row_length = row.size(0)
    new_length = (row_length // 7) * 7
    trimmed_row = row[:new_length]
    # Shift token ids down so audio codes start at 0
    # (128266 is the first audio token id in the vocabulary).
    code_list = [t - 128266 for t in trimmed_row]
    layer_1 = []
    layer_2 = []
    layer_3 = []
    # De-interleave each 7-token frame into the 3 SNAC codebook layers;
    # position i within a frame carries an extra offset of i * 4096.
    for i in range(len(code_list) // 7):
        layer_1.append(code_list[7*i][None])
        layer_2.append(code_list[7*i+1][None]-4096)
        layer_3.append(code_list[7*i+2][None]-(2*4096))
        layer_3.append(code_list[7*i+3][None]-(3*4096))
        layer_2.append(code_list[7*i+4][None]-(4*4096))
        layer_3.append(code_list[7*i+5][None]-(5*4096))
        layer_3.append(code_list[7*i+6][None]-(6*4096))

    with torch.no_grad():
        codes = [torch.concat(layer_1),
            torch.concat(layer_2),
            torch.concat(layer_3)]

        # Clamp stray negative codes and add the batch dimension.
        for i in range(len(codes)):
            codes[i][codes[i] < 0] = 0
            codes[i] = codes[i][None]

        audio_hat = snac_model.decode(codes)
        return audio_hat.cpu()[0, 0]

snac_model = SNAC.from_pretrained("hubertsiuzdak/snac_24khz")
snac_model = snac_model.to("cuda")

tokenizer = AutoTokenizer.from_pretrained('mesolitica/Malaysian-orpheus-3b-0.1-ft')
model = AutoModelForCausalLM.from_pretrained(
    'mesolitica/Malaysian-orpheus-3b-0.1-ft', torch_dtype = torch.bfloat16
).cuda()

speaker = 'Husein'
text = 'Nama saya Husein, saya tak suka nasi ayam dan tak suka mandi, Xiàn zài wǒ yǒu bing chilling Wǒ hěn xǐ huān bing chilling.'
prompt = f'<custom_token_3><|begin_of_text|>{speaker}: {text}<|eot_id|><custom_token_4><custom_token_5><custom_token_1>'
input_ids = tokenizer(prompt, add_special_tokens = False, return_tensors = 'pt').to('cuda')

with torch.no_grad():
    generated_ids = model.generate(
      **input_ids,
      max_new_tokens=1200,
      do_sample=True,
      temperature=0.9,
      top_p=0.8,
      repetition_penalty=1.1,
      num_return_sequences=1,
      eos_token_id=128258,
    )

row = generated_ids[0, input_ids['input_ids'].shape[1]:]
y_ = redistribute_codes(row)
ipd.Audio(y_.numpy(), rate = 24000)
