Uploaded model

  • Developed by: KasparZ
  • License: apache-2.0
  • Finetuned from model: unsloth/mistral-7b-v0.3-bnb-4bit

  • Model setup (see the loading sketch after this list)
  • max_seq_length = 4096
  • tokenizer.pad_token = tokenizer.eos_token
  • model.config.pad_token_id = tokenizer.pad_token_id
  • new_tokens = ["<|s|>", "<|e|>"]
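
The card lists these settings but not the loading code itself, so the following is a minimal sketch assuming a standard Unsloth 4-bit load; the pad-token assignment and token additions mirror the bullets above.

```python
from unsloth import FastLanguageModel

# Assumed base-model load (model name and max_seq_length from the card).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3-bnb-4bit",
    max_seq_length = 4096,
    load_in_4bit = True,
)

# The pad token reuses the EOS token, as listed above.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Register the two custom markers and grow the embeddings to the new vocab size.
tokenizer.add_tokens(["<|s|>", "<|e|>"], special_tokens = True)
model.resize_token_embeddings(len(tokenizer))
```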

  • LoRA (see the config sketch after this list)
  • r = 128
  • target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]
  • lora_alpha = 32
  • lora_dropout = 0
  • bias = "none"
  • use_gradient_checkpointing = "unsloth"
  • random_state = 3407
  • use_rslora = True
  • loftq_config = None
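
A sketch of how these values plug into Unsloth's FastLanguageModel.get_peft_model; the call shape is standard Unsloth usage rather than code taken from this card.

```python
# Attach LoRA adapters with the hyperparameters listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = True,        # rank-stabilized LoRA scaling
    loftq_config = None,
)
```

embed_tokens and lm_head are presumably included in target_modules so the embeddings for the two new tokens are trainable; the separate embedding_learning_rate below gives them a lower learning rate.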

  • Training (see the trainer sketch after this list)
  • per_device_train_batch_size = 1
  • gradient_accumulation_steps = 8
  • warmup_ratio = 0.1
  • num_train_epochs = 2
  • learning_rate = 1e-4
  • embedding_learning_rate = 5e-5
  • fp16 = True
  • bf16 = False
  • logging_steps = 1
  • optim = "adamw_8bit"
  • weight_decay = 0.01
  • lr_scheduler_type = "cosine"
  • seed = 3407
  • output_dir = "outputs"
  • save_strategy = "steps"
  • save_steps = 50
  • report_to = "none"
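
A sketch of the trainer wiring, assuming Unsloth's UnslothTrainer / UnslothTrainingArguments (used here because embedding_learning_rate is an Unsloth-specific argument); the dataset variable and text field name are placeholders, not details from this card.

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,             # hypothetical dataset variable
    dataset_text_field = "text",         # assumed field name
    max_seq_length = 4096,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 2,
        learning_rate = 1e-4,
        embedding_learning_rate = 5e-5,  # lower rate for embed_tokens / lm_head
        fp16 = True,
        bf16 = False,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
        report_to = "none",
    ),
)
trainer.train()
```

With per_device_train_batch_size = 1 and gradient_accumulation_steps = 8, the effective batch size is 8 sequences per optimizer step.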

This Mistral model was trained 2x faster with Unsloth and Hugging Face's TRL library.
