TinyLlama-1.1B-Hinglish-LORA-v1.0 model

  • Developed by: Kiran Kunapuli
  • License: apache-2.0
  • Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
  • Model config:
      # LoRA adapter configuration applied with Unsloth
      model = FastLanguageModel.get_peft_model(
          model,
          r = 64,                              # LoRA rank
          target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                            "gate_proj", "up_proj", "down_proj"],
          lora_alpha = 128,                    # LoRA scaling factor
          lora_dropout = 0,
          bias = "none",
          use_gradient_checkpointing = True,
          random_state = 42,
          use_rslora = True,                   # rank-stabilized LoRA
          loftq_config = None,
      )
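
  • Base model loading (illustrative): the card does not show how the base model was loaded
    before get_peft_model. A minimal Unsloth sketch, assuming 4-bit loading and the
    4096-token RoPE-scaled context mentioned in the note below:
      from unsloth import FastLanguageModel

      max_seq_length = 4096  # TinyLlama's native 2048, extended via RoPE scaling (see note below)

      # Assumed loading step; the exact dtype/quantization flags are not stated in the card.
      model, tokenizer = FastLanguageModel.from_pretrained(
          model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
          max_seq_length = max_seq_length,
          dtype = None,          # auto-select bf16 on Ampere GPUs, fp16 otherwise
          load_in_4bit = True,   # assumption: 4-bit base weights
      )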
    
  • Training parameters:
      import torch
      from trl import SFTTrainer
      from transformers import TrainingArguments

      trainer = SFTTrainer(
          model = model,
          tokenizer = tokenizer,
          train_dataset = dataset,
          dataset_text_field = "text",
          max_seq_length = max_seq_length,     # 4096 with RoPE scaling (see note below)
          dataset_num_proc = 2,
          packing = True,                      # pack short sequences for faster training
          args = TrainingArguments(
              per_device_train_batch_size = 12,
              gradient_accumulation_steps = 16,
              warmup_ratio = 0.1,
              num_train_epochs = 1,
              learning_rate = 2e-4,
              fp16 = not torch.cuda.is_bf16_supported(),
              bf16 = torch.cuda.is_bf16_supported(),
              logging_steps = 1,
              optim = "paged_adamw_32bit",
              weight_decay = 0.001,
              lr_scheduler_type = "cosine",
              seed = 42,
              output_dir = "outputs",
              report_to = "wandb",
          ),
      )
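
  • Running the training (illustrative): training is launched with trainer.train(); the
    adapter save path below is an assumption, not taken from the card.
      # Run the single configured epoch and keep the returned stats (loss, runtime, etc.)
      trainer_stats = trainer.train()

      # Save only the LoRA adapter weights and the tokenizer (illustrative path)
      model.save_pretrained("TinyLlama-1.1B-Hinglish-LORA-v1.0")
      tokenizer.save_pretrained("TinyLlama-1.1B-Hinglish-LORA-v1.0")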
    
  • Training details:
    ==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
       \\   /|    Num examples = 15,464 | Num Epochs = 1
    O^O/ \_/ \    Batch size per device = 12 | Gradient Accumulation steps = 16
    \        /    Total batch size = 192 | Total steps = 80
     "-____-"     Number of trainable parameters = 50,462,720
    
    GPU = NVIDIA GeForce RTX 3090. Max memory = 24.0 GB.
    Total time taken for 1 epoch - 2h:35m:28s
    9443.5288 seconds used for training.
    157.39 minutes used for training.
    Peak reserved memory = 17.641 GB.
    Peak reserved memory for training = 15.344 GB.
    Peak reserved memory % of max memory = 73.504 %.
    Peak reserved memory for training % of max memory = 63.933 %.
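
    How the logged numbers fit together (a small sanity-check sketch; the handling of the
    final partial batch is an assumption):
      per_device_bs, grad_accum, num_gpus = 12, 16, 1
      total_batch_size = per_device_bs * grad_accum * num_gpus   # 192, as logged above
      num_examples = 15_464                                       # packed training sequences
      steps_per_epoch = num_examples // total_batch_size          # ~80, matching "Total steps = 80"
      print(total_batch_size, steps_per_epoch)                    # 192 80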
    

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

[NOTE] TinyLlama's native maximum sequence length is 2048; we use RoPE scaling with Unsloth to extend it to 4096!
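
For reference, a minimal inference sketch with Unsloth. The prompt format below (the Zephyr-style chat template used by the base TinyLlama-1.1B-Chat-v1.0) and the 4-bit loading flag are assumptions; adjust them to match how you prompt the model.

    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "kirankunapuli/TinyLlama-1.1B-Hinglish-LORA-v1.0",
        max_seq_length = 4096,
        load_in_4bit = True,   # assumption
    )
    FastLanguageModel.for_inference(model)  # enable Unsloth's faster inference path

    # Hinglish prompt in the base model's chat format (assumed)
    prompt = "<|user|>\nKya haal hai? Mujhe ek chhoti si kahani sunao.</s>\n<|assistant|>\n"
    inputs = tokenizer(prompt, return_tensors = "pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens = 128)
    print(tokenizer.decode(outputs[0], skip_special_tokens = True))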
