---
base_model: unsloth/mistral-7b-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
license: apache-2.0
language:
- en
datasets:
- KasparZ/mtext-071024
---

# Uploaded model

- **Developed by:** KasparZ
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-bnb-4bit
- **max_seq_length:** 4096
- **new_tokens:** `["<|s|>", "<|e|>"]`
- **LoRA:** (see the setup sketch below)
  - r = 128
  - target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  - lora_alpha = 32
  - lora_dropout = 0 (supports any value, but 0 is optimized)
  - bias = "none" (supports any value, but "none" is optimized)
  - use_gradient_checkpointing = "unsloth" (True or "unsloth" for very long context)
  - random_state = 3407
  - use_rslora = False (Unsloth also supports rank-stabilized LoRA)
  - loftq_config = None (and LoftQ)
- **Training:** (see the trainer sketch below)
  - per_device_train_batch_size = 1
  - gradient_accumulation_steps = 8
  - warmup_ratio = 0.1
  - num_train_epochs = 1 (run twice sequentially on Google Colab)
  - learning_rate = 5e-5
  - embedding_learning_rate = 5e-6
  - fp16 = not is_bfloat16_supported()
  - bf16 = is_bfloat16_supported()
  - logging_steps = 1
  - optim = "adamw_8bit"
  - weight_decay = 0.00
  - lr_scheduler_type = "cosine"
  - seed = 3407
  - output_dir = "outputs"
  - report_to = "none"

**Note:** the dataset includes an EOS token at the end of each chunk, which might result in strange generation behaviour.

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
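
For reference, a minimal sketch of how the LoRA settings above plug into Unsloth's `FastLanguageModel` API. The card lists only the parameter values, not the code; in particular, the token-registration step (`add_tokens` plus `resize_token_embeddings`) is an assumed way of adding the two new tokens, not something the card specifies.

```python
from unsloth import FastLanguageModel

# Load the 4-bit Mistral base model at the card's context length.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Register the two new special tokens and grow the embedding matrix to
# match (assumed step: the card lists the tokens, not how they were added).
tokenizer.add_tokens(["<|s|>", "<|e|>"], special_tokens=True)
model.resize_token_embeddings(len(tokenizer))

# Attach LoRA adapters with the settings listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=32,
    lora_dropout=0,               # supports any value, but 0 is optimized
    bias="none",                  # supports any value, but "none" is optimized
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
```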
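
The training hyperparameters map onto Unsloth's `UnslothTrainer` / `UnslothTrainingArguments` (which add the separate `embedding_learning_rate`). This is a sketch, not the author's script: the dataset column name (`"text"`) and split (`"train"`) are assumptions, since the card only names the dataset repo.

```python
from datasets import load_dataset
from unsloth import UnslothTrainer, UnslothTrainingArguments, is_bfloat16_supported

# Column name and split are assumptions; the card only names the repo.
dataset = load_dataset("KasparZ/mtext-071024", split="train")

trainer = UnslothTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=4096,
    args=UnslothTrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        warmup_ratio=0.1,
        num_train_epochs=1,            # the card notes this was run twice sequentially on Colab
        learning_rate=5e-5,
        embedding_learning_rate=5e-6,  # smaller, separate LR for the embedding matrices
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.00,
        lr_scheduler_type="cosine",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)

trainer.train()
```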
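
And a possible inference sketch. The repo id is a placeholder for this model's actual Hub path, and wrapping the prompt in `<|s|>` is only a guess at how the custom markers are meant to be used; note the EOS caveat above when judging generation quality.

```python
from unsloth import FastLanguageModel

# "KasparZ/<this-model>" is a placeholder -- substitute the actual repo id.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="KasparZ/<this-model>",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

# Prompt wrapped in the card's custom start marker (assumed usage).
inputs = tokenizer("<|s|>Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```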