Authors: NECTEC
Description
This model, nectec/thai-research-gemma-3-27b-it, also known as Pathumma LLM AI, is a fine-tuned version of Google's Gemma 3 27B model, specialized for the Thai language. It was developed by the National Electronics and Computer Technology Center (NECTEC) of Thailand.
Starting from the google/gemma-3-27b-pt checkpoint, the model underwent continued pre-training on a diverse corpus of approximately 8 billion Thai tokens. It was then instruction fine-tuned on 3,052,736 high-quality Thai question-answer pairs to enhance its ability to follow instructions and engage in conversation.
Inputs and outputs
Input:
- Text string, such as a question, a prompt, or a document to be summarized, primarily in Thai.
- Total input context of 128K tokens
Output:
- Generated text in response to the input, such as an answer to a question, analysis of image content, or a summary of a document, primarily in Thai.
- Total output context of 8192 tokens (see the token-count sketch below)
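Before sending a long document to the model, it can help to verify that the prompt fits within the stated 128K-token input window. A minimal sketch, assuming only that the model's tokenizer is available from the Hub; the prompt text, file name, and limit constant are illustrative:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nectec/thai-research-gemma-3-27b-it")

MAX_INPUT_TOKENS = 128_000  # stated 128K input context (illustrative constant)

# "Please summarize the following document" + the document text (hypothetical file)
prompt = "ช่วยสรุปเอกสารต่อไปนี้ให้หน่อย\n" + open("document.txt", encoding="utf-8").read()

n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"Prompt uses {n_tokens} of {MAX_INPUT_TOKENS} input tokens")
if n_tokens > MAX_INPUT_TOKENS:
    print("Prompt exceeds the input window; consider splitting or shortening the document.")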
Usage
Below are some code snippets showing how to quickly get started with running the model. First, install the necessary libraries.
Running with transformers on a GPU
$ pip install -U transformers accelerate torch
You can use the model with the transformers library as follows. The EOSLogitsBiasProcessor can be helpful if the model has trouble generating the end-of-sequence token.
from transformers import AutoTokenizer, LogitsProcessor, LogitsProcessorList, AutoModelForImageTextToText
import torch
model_id = "nectec/thai-research-gemma-3-27b-it"
model = AutoModelForImageTextToText.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16
).eval()
tokenizer = AutoTokenizer.from_pretrained(model_id)


class EOSLogitsBiasProcessor(LogitsProcessor):
    """Adds a positive bias to the end-of-sequence logit so generation terminates reliably."""

    def __init__(self, tokenizer, eos_token_id, bias_value=5.0, space_bias=-0.3):
        self.eos_token_id = eos_token_id
        self.bias_value = bias_value
        self.space_bias = space_bias

    def __call__(self, input_ids, scores):
        # Boost the EOS token and, to a lesser degree, token id 107
        # (<end_of_turn> in the Gemma tokenizer).
        scores[:, self.eos_token_id] += self.bias_value
        scores[:, 107] += 0.3
        return scores


logits_processor = LogitsProcessorList([
    EOSLogitsBiasProcessor(tokenizer, eos_token_id=tokenizer.eos_token_id, bias_value=10.0)
])
messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful Thai assistant. You are Pathumma LLM AI built by the National Electronics and Computer Technology Center (NECTEC)."}]
    },
    {
        "role": "user",
        "content": [
            # "Please give me a som tam (papaya salad) recipe"
            {"type": "text", "text": "ขอสูตรส้มตำหน่อย"}
        ]
    },
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt"
).to(model.device)
input_len = inputs["input_ids"].shape[-1]
with torch.inference_mode():
    generation = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        num_beams=2,
        repetition_penalty=1.1,
        temperature=0.4,
        logits_processor=logits_processor,  # include only if the model struggles to emit the end-of-sequence token
    )
generated_text = tokenizer.decode(generation[0][input_len:], skip_special_tokens=True)
print(generated_text)
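If you want to see tokens as they are produced rather than waiting for the full completion, transformers' TextStreamer can be attached to generate. A minimal sketch reusing the model, tokenizer, and inputs from the snippet above; beam search is omitted here because streaming assumes a single beam, and the generation settings are illustrative rather than part of the original example:
from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

with torch.inference_mode():
    model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.4,
        repetition_penalty=1.1,
        streamer=streamer,
    )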
Running with llama.cpp
You can also run a quantized version of the model using llama-cpp-python for efficient inference on CPU or GPU.
%pip install llama-cpp-python transformers
import transformers
from llama_cpp import Llama
# Download the GGUF model file
!wget -O './thai-research-gemma-3-27b-it-Q4_K_M.gguf' "https://huggingface.co/nectec/thai-research-gemma-3-27b-it/resolve/main/thai-research-gemma-3-27b-it-Q4_K_M.gguf?download=true"
PROMPT = "You are a helpful Thai assistant. You are Pathumma LLM AI built by the National Electronics and Computer Technology Center (NECTEC)."

llm = Llama(model_path='thai-research-gemma-3-27b-it-Q4_K_M.gguf', n_gpu_layers=-1, n_ctx=4096, verbose=False)
tokenizer = transformers.AutoTokenizer.from_pretrained("nectec/thai-research-gemma-3-27b-it")

# Conversation memory, seeded with the system prompt.
memory = [{'content': PROMPT, 'role': 'system'},]

def generate(instruction, memory=memory):
    memory.append({'content': instruction, 'role': 'user'})
    p = tokenizer.apply_chat_template(
        memory,
        tokenize=False,
        add_generation_prompt=True
    )
    response = llm(
        p,
        max_tokens=4096,
        temperature=0.4,     # lower values give more literal answers; higher values are more creative but may be wrong or off-topic
        repeat_penalty=1.1,  # discourages repeated tokens
        stop=["<end_of_turn>"]
    )
    output = response['choices'][0]['text']
    memory.append({'content': output, 'role': 'assistant'})
    return output
print(generate("ขอสูตรทำส้มตำ",memory=memory))
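Because generate() appends both the user turn and the model's reply to memory, a follow-up question in the same session keeps the earlier context. A hypothetical second turn (the Thai prompt asks the model to summarize the ingredients as a short list):
# Follow-up turn; memory already holds the first question and answer.
print(generate("ช่วยสรุปวัตถุดิบเป็นรายการสั้น ๆ หน่อย", memory=memory))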
Citation
If you use this model, please cite the base Gemma model and acknowledge NECTEC's contribution.
@article{gemma_2025,
  title={Gemma 3},
  url={https://goo.gle/Gemma3Report},
  publisher={Kaggle},
  author={Gemma Team},
  year={2025}
}