---
base_model: unsloth/mistral-7b-v0.3-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- mistral
- trl
- sft
license: apache-2.0
language:
- fr
datasets:
- KasparZ/mtext-071024
---

# Uploaded model

- **Developed by:** KasparZ
- **License:** apache-2.0
- **Finetuned from model:** unsloth/mistral-7b-v0.3-bnb-4bit

- **Setup** (sketched in the code below)
  - max_seq_length = 4096
  - tokenizer.pad_token = tokenizer.eos_token
  - model.config.pad_token_id = tokenizer.pad_token_id
  - new_tokens = ["<|s|>", "<|e|>"]
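
A minimal sketch of the corresponding Unsloth setup, assuming the standard `add_tokens` / `resize_token_embeddings` route for registering the two extra special tokens (the exact training script may differ):

```python
from unsloth import FastLanguageModel

max_seq_length = 4096

# Load the 4-bit quantized base model and its tokenizer through Unsloth.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-v0.3-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# Reuse the EOS token as the padding token, as listed above.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# Register the two extra special tokens and grow the embedding matrix to match
# (assumed approach; Unsloth also ships its own add_new_tokens helper).
new_tokens = ["<|s|>", "<|e|>"]
tokenizer.add_tokens(new_tokens, special_tokens = True)
model.resize_token_embeddings(len(tokenizer))
```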

- **LoRA** (see the sketch after this list)
  - r = 128
  - target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "embed_tokens", "lm_head"]
  - lora_alpha = 32
  - lora_dropout = 0
  - bias = "none"
  - use_gradient_checkpointing = "unsloth"
  - random_state = 3407
  - use_rslora = True
  - loftq_config = None
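
The LoRA settings above map onto Unsloth's `FastLanguageModel.get_peft_model` roughly as follows, continuing from the `model` loaded in the setup sketch:

```python
from unsloth import FastLanguageModel

# Attach rank-stabilized LoRA adapters to the attention and MLP projections,
# plus the embedding and output layers, with the hyperparameters listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",
                      "embed_tokens", "lm_head"],
    lora_alpha = 32,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # offload activations to save VRAM
    random_state = 3407,
    use_rslora = True,                       # rank-stabilized LoRA scaling
    loftq_config = None,
)
```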

- **Training** (see the sketch after this list)
  - per_device_train_batch_size = 1
  - gradient_accumulation_steps = 8
  - warmup_ratio = 0.1
  - num_train_epochs = 2
  - learning_rate = 1e-4
  - embedding_learning_rate = 5e-5
  - fp16 = True
  - bf16 = False
  - logging_steps = 1
  - optim = "adamw_8bit"
  - weight_decay = 0.01
  - lr_scheduler_type = "cosine"
  - seed = 3407
  - output_dir = "outputs"
  - save_strategy = "steps"
  - save_steps = 50
  - report_to = "none"
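
Because a separate `embedding_learning_rate` is listed, the sketch below assumes Unsloth's `UnslothTrainer` / `UnslothTrainingArguments` (which accept a lower learning rate for the embedding and output layers); the `"text"` dataset column name is an assumption, and the argument placement follows the older TRL-style `SFTTrainer` signature:

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments
from datasets import load_dataset

dataset = load_dataset("KasparZ/mtext-071024", split = "train")

trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",   # assumed column name
    max_seq_length = 4096,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 2,
        learning_rate = 1e-4,
        embedding_learning_rate = 5e-5,  # lower LR for embed_tokens / lm_head
        fp16 = True,
        bf16 = False,
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "cosine",
        seed = 3407,
        output_dir = "outputs",
        save_strategy = "steps",
        save_steps = 50,
        report_to = "none",
    ),
)

trainer.train()
```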

This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)