
🧠 Lao Summarization Model (Content Summarization for the Lao Language) - Fine-tuned Gemma 3 4B IT (10,000 Lao Input-Output Pairs)

This is a Lao language summarization model fine-tuned on the Phonepadith/laos_word_dataset, using the base model google/gemma-3-4b-it. The model is designed to generate concise summaries from Lao language text.


🧠 Lao AIDC-10K Fine-tuned Gemma-3-4B-IT

Model ID: Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it
Base Model: google/gemma-3-4b-it
Fine-tuned By: Phonepadith Phoummavong


📌 Model Description

This model is a fine-tuned version of Gemma-3-4B-IT, specifically adapted to understand and generate responses in the Lao language 🇱🇦. It was trained on a curated dataset of 10,000 high-quality Lao input-output pairs, primarily focused on AIDC (Artificial Intelligence and Digital Content) topics.

Key Features:

  • 🗣️ Fine-tuned for Lao language generation
  • 📚 Suitable for summarization, question answering, and general chat
  • 🧠 Based on Google's powerful Gemma 3-4B Instruct model

🧾 Training Details

Detail              Value
------------------  --------------------------------
Base Model          Gemma 3-4B Instruct
Fine-tuning Method  LoRA with PEFT (Unsloth)
Dataset             10,000 Lao supervised samples
Sequence Length     2048
Batch Size          2 (with gradient accumulation)
Optimizer           AdamW
Epochs              3–5 (early stopping enabled)
Format              GGUF (F32, F16, Q8_0 available)
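
The card does not include the training script; the following is a minimal sketch of how a LoRA fine-tune with Unsloth could be set up to match the settings above. The LoRA rank/alpha, target modules, gradient-accumulation steps, and the dataset text column name are assumptions, and exact SFTTrainer arguments vary across trl versions.

from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit for memory-efficient LoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="google/gemma-3-4b-it",
    max_seq_length=2048,  # matches the sequence length in the table
    load_in_4bit=True,
)

# Attach LoRA adapters; rank, alpha, and target modules are illustrative guesses.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

dataset = load_dataset("Phonepadith/laos_word_dataset", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumed column name
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,      # "batch size 2"
        gradient_accumulation_steps=4,      # "with gradient accumulation" (steps assumed)
        num_train_epochs=3,                 # card reports 3–5 with early stopping
        optim="adamw_torch",
        output_dir="outputs",
    ),
)
trainer.train()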

📥 How to Use (LM Studio)

  1. Install LM Studio: https://lmstudio.ai
  2. Import the Model:
    • Via Hugging Face: Search for Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it
    • Or drag the .gguf file into LM Studio
  3. Set the System Prompt: see the example below.
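
The card leaves the system prompt unspecified; one illustrative option for summarization (the exact wording is an assumption):

You are a Lao-language summarization assistant. Read the user's Lao text and reply with a short, concise summary in Lao.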

📊 Metrics

  • Evaluation Metric: BLEU score
    BLEU is used to evaluate the quality of generated summaries against reference summaries in the dataset.
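
For reference, a minimal sketch of computing BLEU with the Hugging Face evaluate library (sacreBLEU backend); the texts are placeholders, and tokenize="char" is an assumption that suits Lao script, which is written without spaces between words:

import evaluate

bleu = evaluate.load("sacrebleu")
predictions = ["<model-generated Lao summary>"]  # placeholder
references = [["<reference Lao summary>"]]       # one list of references per prediction
# Character-level tokenization avoids undercounting n-gram matches
# in unsegmented scripts like Lao.
result = bleu.compute(predictions=predictions, references=references, tokenize="char")
print(result["score"])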

🛠️ How to Use (Transformers)

You can load and run the model with Hugging Face Transformers:


from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "Phonepadith/aidc-llm-laos-10k-gemma-3-4b-it"  # change to your actual model name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

input_text = "ປັດຈຸບັນ ກອງທັບປະຊາຊົນລາວ ມີການປະກອບວັດຖຸເຕັກນິກທັນສະໄໝສົມຄວນ, ສາມາດຕອບສະໜອງ ໃຫ້ແກ່ວຽກງານປ້ອງກັນຊາດ ໃນໄລຍະໃໝ່ ໄດ້ໂດຍພື້ນຖານ; ໄດ້ປະກອບສ່ວນຢ່າງຕັ້ງໜ້າເຂົ້າໃນການປ້ອງກັນ, ຄວບຄຸມໄພພິບັດ ແລະ ຊ່ວຍເຫລືອປະຊາຊົນ ຜູ້ປະສົບໄພພິບັດທຳມະຊາດຕ່າງໆທີ່ເກີດຂຶ້ນໃນຂອບເຂດທົ່ວປະເທດ. ພ້ອມນັ້ນ, ກໍໄດ້ເປັນເຈົ້າການປະກອບສ່ວນປັບປຸງກໍ່ສ້າງພື້ນ ຖານການເມືອງ, ກໍ່ສ້າງທ່າສະໜາມສົງຄາມປະຊາຊົນ 3 ຂັ້ນ ຕິດພັນກັບວຽກງານ 3 ສ້າງ ຢູ່ທ້ອງຖິ່ນຕາມ 4 ເນື້ອໃນ 4 ຄາດໝາຍ ແລະ ສືບທອດມູນເຊື້ອຄວາມສາມັກຄີ ກັບກອງທັບປະເທດເພື່ອນມິດ ສາກົນ, ປະຕິບັດນະໂຍບາຍເພີ່ມມິດຫລຸດຜ່ອນສັດຕູ, ຮັບປະກັນສະຖຽນລະພາບ ຂອງລະບອບການ ເມືອງ, ຮັກສາຄວາມສະຫງົບປອດໄພຕາມຊາຍແດນ"
# Tokenize the Lao passage and generate up to 100 new tokens as the summary.
inputs = tokenizer(input_text, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=100)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
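
Since the base model is instruction-tuned, wrapping the input in Gemma's chat template may produce better summaries. A sketch, where the Lao instruction prefix (meaning "Summarize this content:") is an illustrative assumption:

messages = [{"role": "user", "content": "ສະຫລຸບເນື້ອຫານີ້: " + input_text}]
chat_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
summary_ids = model.generate(chat_ids, max_new_tokens=100)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))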