Built with Axolotl

See the axolotl config below (axolotl version: 0.13.0.dev0):

# 1. Base Model & Tokenizer
base_model: mistralai/Mistral-7B-Instruct-v0.3
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
hub_model_id: AiAF/Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora

# 2. LoRA / QLoRA Configuration
load_in_8bit: false
load_in_4bit: true
adapter: qlora
# Matching gemma2 config for potentially better results
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# 3. Dataset Configuration
datasets:
  - path: .
    type: chat_template
    data_files: ./co-sft-dataset.jsonl
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    chat_template: jinja
    chat_template_jinja: |
      {{ bos_token }}

      {# 1) Collect all system messages and join them #}
      {% set sys = (
        messages
        | selectattr('role', 'equalto', 'system')
        | map(attribute='content')
        | list
      ) %}
      {% set sys_joined = sys | reject('equalto', None) | map('trim') | select | join('\n\n') %}

      {# 2) Emit conversation without 'system' role; inject system text into the FIRST human turn #}
      {% set injected = namespace(done=false) %}
      {% for m in messages if m['role'] != 'system' %}
        {% if m['role'] == 'human' %}
          {% if not injected.done and sys_joined %}
            {{ '[INST] ' + (sys_joined ~ '\n\n' ~ (m['content'] | trim)) + ' [/INST]' }}
            {% set injected.done = true %}
          {% else %}
            {{ '[INST] ' + (m['content'] | trim) + ' [/INST]' }}
          {% endif %}
        {% elif m['role'] == 'assistant' %}
          {{ ' ' + (m['content'] | trim) + eos_token }}
        {% endif %}
      {% endfor %}

    roles_to_train: ["assistant"]

# 4. Training Parameters
sequence_len: 2048
sample_packing: true
val_set_size: 0.05
num_epochs: 10
dataset_prepared_path: last_run_prepared
gradient_accumulation_steps: 1 #4
micro_batch_size: 32 #2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
bf16: true
tf32: true
gradient_checkpointing: true

# 5. Saving and Evaluation Strategy
output_dir: ./outputs/sft/Mistral-7B-Instruct-v0.3-co-sft-qlora
logging_steps: 10
evals_per_epoch: 5
saves_per_epoch: 10
save_total_limit: 100

# 6. W&B Logging
wandb_project: "co-sft"
wandb_name: "Mistral-7B-Instruct-v0.3-co-sft-qlora-co-sft"
wandb_log_model: "false"
wandb_run_id: "Mistral-7B-Instruct-v0.3-co-sft-qlora"

# 7. Special Tokens
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
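
The chat template in the config above can be sanity-checked outside Axolotl by rendering it with jinja2 directly. The sketch below is illustrative only: the `config.yml` filename and the sample messages are assumptions, and Axolotl itself handles the role/content mapping (`from` → role, `value` → content) and tokenization during preprocessing.

```python
# Minimal sketch (not part of the training pipeline): render chat_template_jinja
# from the config above to inspect the prompt layout the adapter was trained on.
import yaml
from jinja2 import Template

with open("config.yml") as f:  # assumed filename for the axolotl config above
    dataset_cfg = yaml.safe_load(f)["datasets"][0]

template = Template(dataset_cfg["chat_template_jinja"])

# Illustrative conversation; in the real dataset the turns live under
# `conversations` and are mapped via message_property_mappings.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "human", "content": "What does this adapter do?"},
    {"role": "assistant", "content": "It adds QLoRA weights trained on the co-sft dataset."},
]

rendered = template.render(messages=messages, bos_token="<s>", eos_token="</s>")
print(rendered)
# Roughly (extra whitespace from the block tags aside):
# <s>[INST] You are a helpful assistant.\n\nWhat does this adapter do? [/INST] It adds ...</s>
```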

Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.3 on the local co-sft-dataset.jsonl dataset described in the config above. It achieves the following results on the evaluation set:

  • Loss: 2.9397
  • Memory, max active (GiB): 46.02
  • Memory, max allocated (GiB): 46.02
  • Memory, device reserved (GiB): 53.04
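
To try the adapter, it can be loaded on top of the 4-bit quantized base model with transformers, bitsandbytes, and peft. The snippet below is a minimal sketch rather than an official example from this repo; the prompt and generation settings are illustrative.

```python
# Minimal inference sketch: attach this QLoRA adapter to the 4-bit base model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "AiAF/Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora"

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "[INST] What is QLoRA? [/INST]"  # matches the [INST] layout used in training
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```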

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: 8-bit AdamW (bitsandbytes, `adamw_bnb_8bit`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 41
  • training_steps: 1390
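
For readers more familiar with plain transformers than Axolotl, the values above correspond roughly to the following `TrainingArguments`. This is a hedged reconstruction of the logged hyperparameters, not the argument object Axolotl actually builds internally.

```python
# Rough reconstruction of the hyperparameters above as transformers
# TrainingArguments; treat this as documentation of the values, not the
# exact configuration object used during the run.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="./outputs/sft/Mistral-7B-Instruct-v0.3-co-sft-qlora",
    learning_rate=2e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=1,
    optim="adamw_bnb_8bit",   # 8-bit AdamW, betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="cosine",
    warmup_steps=41,
    max_steps=1390,           # 10 epochs over the packed dataset
    seed=42,
    bf16=True,
    tf32=True,
    gradient_checkpointing=True,
    logging_steps=10,
)
```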

Training results

| Training Loss | Epoch | Step | Validation Loss | Active (GiB) | Allocated (GiB) | Reserved (GiB) |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| No log | 0 | 0 | 4.3069 | 25.33 | 25.33 | 26.92 |
| 3.2562 | 0.2014 | 28 | 2.9209 | 46.02 | 46.02 | 53.04 |
| 2.7575 | 0.4029 | 56 | 2.6649 | 46.02 | 46.02 | 53.04 |
| 2.6222 | 0.6043 | 84 | 2.5501 | 46.02 | 46.02 | 53.04 |
| 2.5228 | 0.8058 | 112 | 2.4979 | 46.02 | 46.02 | 53.04 |
| 2.4345 | 1.0072 | 140 | 2.4642 | 25.68 | 25.68 | 53.04 |
| 2.2688 | 1.2086 | 168 | 2.4165 | 46.02 | 46.02 | 53.04 |
| 2.2699 | 1.4101 | 196 | 2.3789 | 46.02 | 46.02 | 53.04 |
| 2.2362 | 1.6115 | 224 | 2.3462 | 46.02 | 46.02 | 53.04 |
| 2.1887 | 1.8129 | 252 | 2.3109 | 46.02 | 46.02 | 53.04 |
| 2.1015 | 2.0144 | 280 | 2.3218 | 25.68 | 25.68 | 53.04 |
| 1.8292 | 2.2158 | 308 | 2.3086 | 46.02 | 46.02 | 53.04 |
| 1.8575 | 2.4173 | 336 | 2.2847 | 46.02 | 46.02 | 53.04 |
| 1.8605 | 2.6187 | 364 | 2.2635 | 46.02 | 46.02 | 53.04 |
| 1.8404 | 2.8201 | 392 | 2.2348 | 46.02 | 46.02 | 53.04 |
| 1.7624 | 3.0216 | 420 | 2.3304 | 25.68 | 25.68 | 53.04 |
| 1.4865 | 3.2230 | 448 | 2.2942 | 46.02 | 46.02 | 53.04 |
| 1.5052 | 3.4245 | 476 | 2.2719 | 46.02 | 46.02 | 53.04 |
| 1.5553 | 3.6259 | 504 | 2.2517 | 46.02 | 46.02 | 53.04 |
| 1.5416 | 3.8273 | 532 | 2.2364 | 46.02 | 46.02 | 53.04 |
| 1.4235 | 4.0288 | 560 | 2.3074 | 25.68 | 25.68 | 53.04 |
| 1.2006 | 4.2302 | 588 | 2.3393 | 46.02 | 46.02 | 53.04 |
| 1.2303 | 4.4317 | 616 | 2.3343 | 46.02 | 46.02 | 53.04 |
| 1.2493 | 4.6331 | 644 | 2.3029 | 46.02 | 46.02 | 53.04 |
| 1.2642 | 4.8345 | 672 | 2.2887 | 46.02 | 46.02 | 53.04 |
| 1.1358 | 5.0360 | 700 | 2.4287 | 25.68 | 25.68 | 53.04 |
| 0.9762 | 5.2374 | 728 | 2.4294 | 46.02 | 46.02 | 53.04 |
| 0.9885 | 5.4388 | 756 | 2.4367 | 46.02 | 46.02 | 53.04 |
| 0.9915 | 5.6403 | 784 | 2.4242 | 46.02 | 46.02 | 53.04 |
| 1.0018 | 5.8417 | 812 | 2.4102 | 46.02 | 46.02 | 53.04 |
| 0.8874 | 6.0432 | 840 | 2.6173 | 25.68 | 25.68 | 53.04 |
| 0.7875 | 6.2446 | 868 | 2.5984 | 46.02 | 46.02 | 53.04 |
| 0.7987 | 6.4460 | 896 | 2.5939 | 46.02 | 46.02 | 53.04 |
| 0.7918 | 6.6475 | 924 | 2.5724 | 46.02 | 46.02 | 53.04 |
| 0.8138 | 6.8489 | 952 | 2.5781 | 46.02 | 46.02 | 53.04 |
| 0.7181 | 7.0504 | 980 | 2.8220 | 25.68 | 25.68 | 53.04 |
| 0.6593 | 7.2518 | 1008 | 2.7511 | 46.02 | 46.02 | 53.04 |
| 0.6757 | 7.4532 | 1036 | 2.7530 | 46.02 | 46.02 | 53.04 |
| 0.677 | 7.6547 | 1064 | 2.7439 | 46.02 | 46.02 | 53.04 |
| 0.6802 | 7.8561 | 1092 | 2.7484 | 46.02 | 46.02 | 53.04 |
| 0.6139 | 8.0576 | 1120 | 2.9129 | 25.68 | 25.68 | 53.04 |
| 0.607 | 8.2590 | 1148 | 2.8694 | 46.02 | 46.02 | 53.04 |
| 0.6031 | 8.4604 | 1176 | 2.8737 | 46.02 | 46.02 | 53.04 |
| 0.6006 | 8.6619 | 1204 | 2.8768 | 46.02 | 46.02 | 53.04 |
| 0.6115 | 8.8633 | 1232 | 2.8803 | 46.02 | 46.02 | 53.04 |
| 0.569 | 9.0647 | 1260 | 2.9187 | 25.68 | 25.68 | 53.04 |
| 0.5722 | 9.2662 | 1288 | 2.9361 | 46.02 | 46.02 | 53.04 |
| 0.57 | 9.4676 | 1316 | 2.9407 | 46.02 | 46.02 | 53.04 |
| 0.5728 | 9.6691 | 1344 | 2.9397 | 46.02 | 46.02 | 53.04 |
| 0.5732 | 9.8705 | 1372 | 2.9397 | 46.02 | 46.02 | 53.04 |
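
Validation loss bottoms out around 2.2348 at step 392 (epoch ~2.8) and rises steadily afterwards, so an intermediate checkpoint may generalize better than the final one. The sketch below assumes the standard checkpoint-*/trainer_state.json files written by the underlying Trainer under output_dir and scans them for the lowest recorded eval loss.

```python
# Helper sketch: find the lowest eval_loss logged across saved checkpoints.
# Assumes the default checkpoint-*/trainer_state.json layout; adjust the glob
# if your run used a different output_dir.
import glob
import json

best_loss, best_step = None, None
for path in glob.glob(
    "./outputs/sft/Mistral-7B-Instruct-v0.3-co-sft-qlora/checkpoint-*/trainer_state.json"
):
    with open(path) as f:
        history = json.load(f)["log_history"]
    for entry in history:
        if "eval_loss" in entry and (best_loss is None or entry["eval_loss"] < best_loss):
            best_loss, best_step = entry["eval_loss"], entry["step"]

if best_step is not None:
    # With saves_per_epoch: 10 there should be a checkpoint at or near this step.
    print(f"lowest eval_loss {best_loss:.4f} at step {best_step}")
```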

Framework versions

  • PEFT 0.17.1
  • Transformers 4.56.1
  • PyTorch 2.7.1+cu126
  • Datasets 4.0.0
  • Tokenizers 0.22.1