Table of Contents

  1. Model Summary
  2. Use
  3. Training

Model Summary

We present KARINA, finetuned from BLOOMZ (bigscience/bloomz-3b), a family of models capable of following human instructions in dozens of languages zero-shot. BLOOMZ itself was produced by finetuning the pretrained multilingual BLOOM models on the crosslingual task mixture xP3, which makes the resulting models capable of crosslingual generalization to unseen tasks and languages.

Use

Intended use

We recommend using the model to perform tasks expressed in natural language. For example, given the prompt "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n" (the Indonesian question means "who are you?"), the model will most likely answer "Saya Karina. Ada yang bisa saya bantu?" ("I am Karina. Is there anything I can help you with?").
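For reference, the template can be built programmatically. This is a minimal sketch; build_prompt is a hypothetical helper, not part of the model repository:

def build_prompt(question):
    # Hypothetical helper: wraps a question in the prompt template shown above.
    # The doubled braces escape literal { } inside the f-string.
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

print(build_prompt("siapa kamu?"))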

How to use

CPU

# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "yodi/karina"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Plain string, so single braces are written literally (no f-string escaping needed).
inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=64)  # the default generation length is very short
print(tokenizer.decode(outputs[0]))

GPU in 4-bit

# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

MODEL_NAME = "yodi/karina"

# load_in_4bit requires the bitsandbytes package and a CUDA GPU.
# device_map="cuda:1" pins the model to the second GPU; use "auto" if unsure.
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

# Plain string; single braces appear literally in the prompt.
prompt = "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)
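Setting do_sample=False gives deterministic greedy decoding, which suits short factual answers like the example above. Note that newer transformers releases deprecate the bare load_in_4bit flag in favor of an explicit BitsAndBytesConfig; a sketch matching this model's training-time quantization settings is given under Training procedure below.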

GPU in 8-bit

# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

MODEL_NAME = "yodi/karina"

# 8-bit (LLM.int8) loading also requires bitsandbytes and a CUDA GPU.
model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = "Given the question:\n{ siapa kamu? }\n---\nAnswer:\n"

generator = pipeline('text-generation',
                     model=model_8bit,
                     tokenizer=tokenizer,
                     do_sample=False)

result = generator(prompt, max_length=256)
print(result)

Expected output:

[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]

(The generated answer translates to: "I am Karina, a virtual assistant ready to help with price estimates or other questions.")
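To keep only the answer text, split the generation on the template's Answer marker, the same approach the Gradio example below uses:

# result[0]['generated_text'] contains the prompt followed by the answer.
answer = result[0]['generated_text'].split('\n---\nAnswer:\n')[-1]
print(answer)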

Local inference with Gradio

# pip install -q transformers accelerate bitsandbytes gradio
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline
import re

import gradio as gr

MODEL_NAME = "yodi/karina"

model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

def preprocess(text):
    # Wrap the user's question in the prompt template the model was trained on.
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"

def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # Keep only the text after the "Answer:" marker.
    output = re.split(r'\n---\nAnswer:\n', result[0]['generated_text'])[1]

    return output

with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

demo.queue()  # launch(enable_queue=...) was removed in newer Gradio releases
demo.launch(debug=True)

Then open the Gradio URL printed in the console in your browser.

Training procedure

The following bitsandbytes quantization config was used during training (a Python equivalent is sketched after the list):

  • load_in_8bit: False
  • load_in_4bit: True
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: nf4
  • bnb_4bit_use_double_quant: True
  • bnb_4bit_compute_dtype: float16
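For illustration, the list above corresponds to the following BitsAndBytesConfig; this is a minimal sketch, with the int8-specific flags left at the defaults listed above:

from transformers import BitsAndBytesConfig
import torch

# Mirrors the training-time quantization settings listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    llm_int8_threshold=6.0,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)

The config can be passed to AutoModelForCausalLM.from_pretrained via the quantization_config argument, as an alternative to the load_in_4bit=True flag used in the examples above.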

Framework versions

  • PEFT 0.5.0.dev0

Limitations

Prompt engineering: performance may vary depending on the prompt; the model follows the prompting conventions of the BLOOMZ models it was finetuned from, so the template shown in the Use section is recommended.

Training

Model

  • Architecture: same as BLOOM; see also the config.json file (a quick way to inspect it is sketched below)
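To inspect the architecture hyperparameters without downloading the weights, one can load only the config; a minimal sketch:

from transformers import AutoConfig

# Fetches just config.json from the Hub, not the model weights.
config = AutoConfig.from_pretrained("yodi/karina")
print(config.model_type)  # expected: "bloom"
print(config)             # full hyperparameter listing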