Authors: NECTEC

Description

This model, nectec/thai-research-gemma-3-27b-it, also known as Pathumma LLM AI, is a fine-tuned version of Google's Gemma 3 27B model, specialized for the Thai language. It was developed by the National Electronics and Computer Technology Center (NECTEC) of Thailand.

Starting with the google/gemma-3-27b-pt checkpoint, the model underwent continued pre-training on a diverse corpus of approximately 8 billion Thai tokens. Following this, it was instruction fine-tuned on 3,052,736 high-quality Thai question-answer pairs to enhance its ability to follow instructions and engage in conversation.

Inputs and outputs

  • Input:

    • Text string, such as a question, a prompt, or a document to be summarized, primarily in Thai.
    • Total input context of 128K tokens
  • Output:

    • Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document, primarily in Thai.
    • Total output context of 8192 tokens
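
Both limits can be checked against the published configuration once the transformers library (installed in the Usage section below) is available. The snippet below is a minimal sketch, assuming the checkpoint nests its language-model settings under text_config as the Gemma 3 architecture does; the 8192-token output cap is applied at generation time via max_new_tokens rather than stored in the config.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("nectec/thai-research-gemma-3-27b-it")
text_config = getattr(config, "text_config", config)  # Gemma 3 keeps the language-model settings in a nested text_config
print("Input context window:", text_config.max_position_embeddings)  # expected: 131072 (128K tokens)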

Usage

Below are some code snippets to help you get started quickly with running the model. First, install the necessary libraries.

Running with transformers on a GPU

$ pip install -U transformers accelerate torch

You can use the model with the transformers library as follows. The EOSLogitsBiasProcessor can be helpful if the model has trouble generating the end-of-sequence token.

from transformers import AutoTokenizer, LogitsProcessor, LogitsProcessorList, AutoModelForImageTextToText
import torch

model_id = "nectec/thai-research-gemma-3-27b-it"

model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()

tokenizer = AutoTokenizer.from_pretrained(model_id)

class EOSLogitsBiasProcessor(LogitsProcessor):
    """Encourages the model to stop by adding a positive bias to the
    end-of-sequence logit (plus a small bias to token id 107) at every step."""

    def __init__(self, eos_token_id, bias_value=5.0, extra_token_id=107, extra_bias=0.3):
        self.eos_token_id = eos_token_id
        self.bias_value = bias_value
        self.extra_token_id = extra_token_id
        self.extra_bias = extra_bias

    def __call__(self, input_ids, scores):
        scores[:, self.eos_token_id] += self.bias_value
        scores[:, self.extra_token_id] += self.extra_bias
        return scores


logits_processor = LogitsProcessorList([
    EOSLogitsBiasProcessor(eos_token_id=tokenizer.eos_token_id, bias_value=10.0)
])

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful Thai assistant. You are Pathumma LLM AI that build by National Electronics and Computer Technology Center (NECTEC)."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "ขอสูตรส้มตำหน่อย"}
        ]
    },
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)

input_len = inputs["input_ids"].shape[-1]

with torch.inference_mode():
    generation = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        num_beams=2,
        repetition_penalty=1.1,
        temperature=0.4,
        logits_processor=logits_processor,  # include this if the model struggles to emit the end-of-sequence token
    )

generated_text = tokenizer.decode(generation[0][input_len:], skip_special_tokens=True)
print(generated_text)
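
If tokens should be printed as they are produced instead of after generation finishes, the TextStreamer utility from transformers can be passed to generate(). This is a minimal sketch reusing the model, tokenizer, inputs, and logits_processor defined above; note that streaming only works with single-beam decoding, so num_beams is left at its default.

from transformers import TextStreamer

# Print decoded tokens to stdout as soon as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.4,
        repetition_penalty=1.1,
        streamer=streamer,                  # tokens are streamed to stdout as they arrive
        logits_processor=logits_processor,
    )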

Running with llama.cpp

You can also run a quantized version of the model using llama-cpp-python for efficient inference on CPU or GPU.

%pip install llama-cpp-python transformers
import transformers
from llama_cpp import Llama

# Download the GGUF model file
!wget -O './thai-research-gemma-3-27b-it-Q4_K_M.gguf' "https://huggingface.co/nectec/thai-research-gemma-3-27b-it/resolve/main/thai-research-gemma-3-27b-it-Q4_K_M.gguf?download=true"
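
# Alternative sketch: the same file can also be fetched with huggingface_hub,
# which caches it locally and returns the local path (uncomment to use):
# from huggingface_hub import hf_hub_download
# model_path = hf_hub_download(
#     repo_id="nectec/thai-research-gemma-3-27b-it",
#     filename="thai-research-gemma-3-27b-it-Q4_K_M.gguf",
# )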

PROMPT = "You are a helpful Thai assistant. You are Pathumma LLM AI that build by National Electronics and Computer Technology Center (NECTEC)."

llm = Llama(model_path='thai-research-gemma-3-27b-it-Q4_K_M.gguf', n_gpu_layers=-1, n_ctx=4096, verbose=False)  # n_gpu_layers=-1 offloads every layer to the GPU
tokenizer = transformers.AutoTokenizer.from_pretrained("nectec/thai-research-gemma-3-27b-it")

memory = [{'content': PROMPT, 'role': 'system'}]

def generate(instruction, memory=memory):
    memory.append({'content': instruction, 'role': 'user'})
    p = tokenizer.apply_chat_template(
        memory,
        tokenize=False,
        add_generation_prompt=True
    )
    response = llm(
        p,
        max_tokens=4096,
        temperature=0.4, # lower values stay close to the question; higher values are more creative but may be wrong or off-topic
        repeat_penalty=1.1, # discourages repeated phrases
        stop=["<end_of_turn>"]
    )
    output = response['choices'][0]['text']
    memory.append({'content': output, 'role': 'assistant'})
    return output

print(generate("ขอสูตรทำส้มตำ", memory=memory))  # "Give me a som tam recipe"
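
If you prefer not to depend on the transformers tokenizer, llama-cpp-python can also format the conversation itself via create_chat_completion(), assuming the GGUF file carries the Gemma chat template in its metadata. A minimal sketch reusing llm and PROMPT from above:

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": "ขอสูตรทำส้มตำ"},  # "Give me a som tam recipe"
    ],
    max_tokens=4096,
    temperature=0.4,
    repeat_penalty=1.1,
)
print(response["choices"][0]["message"]["content"])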

Citation

If you use this model, please cite the base Gemma model and acknowledge NECTEC's contribution.

@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}