Model Card for iCIIT/TripleBits-Sinhala-Llama-3.2-1B-CP

Model Details

Model Description

TripleBits/Sinhala-Llama-3.2-1B is a Llama 3.2 1B-based model that has undergone continual pretraining (CPT) on a diverse Sinhala corpus. It is released as a LoRA adapter to be loaded on top of meta-llama/Llama-3.2-1B.

  • Trained by: Team TripleBits for Shared Task 2025

How to Get Started with the Model

Use the code below to get started with the model.

    from transformers import AutoTokenizer, AutoModelForCausalLM
    from peft import PeftModel
    from huggingface_hub import login
    
    login(token="your_hf_token")
    
    # Load base model
    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", device_map="auto")
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
    
    # Load trained LoRA adapter
    model = PeftModel.from_pretrained(base_model, "iCIIT/TripleBits-Sinhala-Llama-3.2-1B-CP")
    
    
    # Question (Sinhala): "What is the capital of Sri Lanka?"
    question = "ශ්‍රී ලංකාවේ අගනුවර කුමක්ද?"

    # Instruction template (Sinhala): "Provide a correct answer to the question below. Depending on the
    # nature of the question, give short answers for simple questions and detailed explanations for
    # complex ones." The markers "### ප්‍රශ්නය:" and "### පිළිතුර:" mean "Question" and "Answer".
    instruction = f"""පහත සදහන් ප්‍රශ්නයට නිවැරදි පිළිතුරක් ලබා දෙන්න. පිළිතුරු ලබා දීමේදී ප්‍රශ්නයේ ස්වභාවය අනුව - සරල ප්‍රශ්න සඳහා කෙටි පිළිතුරු ද, සංකීර්ණ ප්‍රශ්න සඳහා විස්තරාත්මක පැහැදිලි කිරීම් ද ලබා දෙන්න.
    ### ප්‍රශ්නය: {question}
    ### පිළිතුර:"""
    
    inputs = tokenizer(instruction, return_tensors="pt").to(model.device)
    # Deterministic beam-search decoding with repetition controls.
    outputs = model.generate(
                **inputs,
                max_new_tokens=100,
                num_beams=10,
                repetition_penalty=1.2,
                no_repeat_ngram_size=3,
                do_sample=False,
                early_stopping=True,
                eos_token_id=tokenizer.eos_token_id, 
                pad_token_id=tokenizer.eos_token_id  # the Llama tokenizer has no dedicated pad token
                )
    
    # Keep only the newly generated tokens (drop the prompt) and decode them.
    generated_answer_tokens = outputs[0][inputs.input_ids.shape[1]:]
    generated_answer = tokenizer.decode(generated_answer_tokens, skip_special_tokens=True).strip()
    print(generated_answer)
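
Optionally, the LoRA adapter can be merged into the base weights so the model can be used without peft at inference time. The snippet below is a minimal sketch; the output directory name is only an example.

    # Fold the LoRA deltas into the base weights and drop the PEFT wrapper.
    merged_model = model.merge_and_unload()

    # Save a standalone checkpoint (example path).
    merged_model.save_pretrained("TripleBits-Sinhala-Llama-3.2-1B-merged")
    tokenizer.save_pretrained("TripleBits-Sinhala-Llama-3.2-1B-merged")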
    

Training Details

• A detailed report is provided here.
• The GitHub repository can be found here.

Training Data

wikimedia/wikipedia
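
Training used the wikimedia/wikipedia dataset on the Hugging Face Hub; for a Sinhala model this is presumably its Sinhala split. Below is a minimal sketch of loading it with the datasets library; the dump date ("20231101") is an assumption and should be replaced with the snapshot actually used.

    from datasets import load_dataset

    # Sinhala configuration of wikimedia/wikipedia; "20231101.si" is an assumed dump date.
    wiki_si = load_dataset("wikimedia/wikipedia", "20231101.si", split="train")

    print(wiki_si)                    # dataset size and columns (id, url, title, text)
    print(wiki_si[0]["text"][:200])   # first 200 characters of the first article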

Training Hyperparameters

• micro_batch_size = 8
• batch_size = 64
• gradient_accumulation_steps = batch_size // micro_batch_size
• epochs = 5
• learning_rate = 3e-4
• max_seq_len = 512
• lora_r = 4
• lora_alpha = 8
• lora_dropout = 0.1
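
For reference, below is a minimal sketch of how these hyperparameters could be wired into a PEFT LoRA setup with transformers. Only the values listed above come from this card; the target_modules, output path, and the omitted Trainer/data pipeline are assumptions.

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, TrainingArguments

    micro_batch_size = 8
    batch_size = 64
    gradient_accumulation_steps = batch_size // micro_batch_size  # 64 // 8 = 8

    # LoRA configuration matching the values above; target_modules is an assumption.
    lora_config = LoraConfig(
        r=4,
        lora_alpha=8,
        lora_dropout=0.1,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )

    base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()

    training_args = TrainingArguments(
        output_dir="sinhala-llama-cpt",            # example path
        per_device_train_batch_size=micro_batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        num_train_epochs=5,
        learning_rate=3e-4,
    )
    # max_seq_len = 512 would be enforced when tokenizing and packing the corpus.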

Framework versions

• PEFT 0.17.0