See axolotl config (axolotl version: 0.13.0.dev0):

```yaml
# 1. Base Model & Tokenizer
base_model: mistralai/Mistral-7B-Instruct-v0.3
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
hub_model_id: AiAF/Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora
# 2. LoRA / QLoRA Configuration
load_in_8bit: false
load_in_4bit: true
adapter: qlora
# Matching gemma2 config for potentially better results
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj
# 3. Dataset Configuration
datasets:
  - path: .
    type: chat_template
    data_files: ./co-sft-dataset.jsonl
    field_messages: conversations
    message_property_mappings:
      role: from
      content: value
    chat_template: jinja
    chat_template_jinja: |
      {{ bos_token }}
      {# 1) Collect all system messages and join them #}
      {% set sys = (
      messages
      | selectattr('role', 'equalto', 'system')
      | map(attribute='content')
      | list
      ) %}
      {% set sys_joined = sys | reject('equalto', None) | map('trim') | select | join('\n\n') %}
      {# 2) Emit conversation without 'system' role; inject system text into the FIRST human turn #}
      {% set injected = namespace(done=false) %}
      {% for m in messages if m['role'] != 'system' %}
      {% if m['role'] == 'human' %}
      {% if not injected.done and sys_joined %}
      {{ '[INST] ' + (sys_joined ~ '\n\n' ~ (m['content'] | trim)) + ' [/INST]' }}
      {% set injected.done = true %}
      {% else %}
      {{ '[INST] ' + (m['content'] | trim) + ' [/INST]' }}
      {% endif %}
      {% elif m['role'] == 'assistant' %}
      {{ ' ' + (m['content'] | trim) + eos_token }}
      {% endif %}
      {% endfor %}
    roles_to_train: ["assistant"]
# 4. Training Parameters
sequence_len: 2048
sample_packing: true
val_set_size: 0.05
num_epochs: 10
dataset_prepared_path: last_run_prepared
gradient_accumulation_steps: 1 #4
micro_batch_size: 32 #2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002
bf16: true
tf32: true
gradient_checkpointing: true
# 5. Saving and Evaluation Strategy
output_dir: ./outputs/sft/Mistral-7B-Instruct-v0.3-co-sft-qlora
logging_steps: 10
evals_per_epoch: 5
saves_per_epoch: 10
save_total_limit: 100
# 6. W&B Logging
wandb_project: "co-sft"
wandb_name: "Mistral-7B-Instruct-v0.3-co-sft-qlora-co-sft"
wandb_log_model: "false"
wandb_run_id: "Mistral-7B-Instruct-v0.3-co-sft-qlora"
# 7. Special Tokens
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora
This model is a QLoRA adapter fine-tuned from mistralai/Mistral-7B-Instruct-v0.3 on the local co-sft-dataset.jsonl dataset described in the config above. It achieves the following results on the evaluation set:
- Loss: 2.9397
- Max active memory (GiB): 46.02
- Max allocated memory (GiB): 46.02
- Device reserved memory (GiB): 53.04
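The sketch below shows one way to load the adapter for inference; it is not part of the original card. It assumes the adapter is applied on top of the 4-bit-quantized base model via peft, matching the QLoRA settings in the config; the prompt and generation settings are illustrative.

```python
# Minimal inference sketch (illustrative, not from the original card).
# Assumes the adapter is applied on top of the 4-bit-quantized base model,
# matching load_in_4bit: true and bf16: true from the config above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.3"
adapter_id = "AiAF/Adapter_Mistral-7B-Instruct-v0.3-co-sft-qlora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # matches load_in_4bit: true
    bnb_4bit_compute_dtype=torch.bfloat16,  # matches bf16: true
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb_config, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

# Mistral-style prompt: system text is folded into the first [INST] block,
# mirroring the custom chat template used for training. The tokenizer adds
# the <s> BOS token itself.
prompt = "[INST] You are a helpful assistant.\n\nExplain QLoRA in one sentence. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```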
Model description
A QLoRA adapter (r=64, alpha=128, dropout 0.05) trained on all attention and MLP linear projections (q/k/v/o, gate/up/down) of Mistral-7B-Instruct-v0.3. Training used a custom Mistral-style chat template that folds any system messages into the first [INST] turn, and only assistant turns contribute to the loss (roles_to_train: ["assistant"]).
Intended uses & limitations
More information needed
Training and evaluation data
Training used the local ./co-sft-dataset.jsonl file, read as ShareGPT-style records with a conversations list of from/value messages; 5% of the data was held out for evaluation (val_set_size: 0.05).
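Below is a hedged sketch, not from the original card, of what one JSONL record is assumed to look like and of how the custom Jinja template above renders it. The sample conversation is made up; the field names follow message_property_mappings (role taken from "from", content from "value"), and the role values ("system", "human", "assistant") are the ones the template checks for.

```python
# Illustrative sketch: render the custom chat template from the config on one
# made-up ShareGPT-style record to inspect the training text.
from jinja2 import Environment

# One hypothetical line of ./co-sft-dataset.jsonl
record = {
    "conversations": [
        {"from": "system", "value": "You are a concise assistant."},
        {"from": "human", "value": "What is QLoRA?"},
        {"from": "assistant", "value": "QLoRA fine-tunes LoRA adapters on a 4-bit quantized base model."},
    ]
}

# The chat_template_jinja block from the config (the multi-line set statement is
# condensed here; in practice, paste the template verbatim).
template_str = r"""
{{ bos_token }}
{% set sys = messages | selectattr('role', 'equalto', 'system') | map(attribute='content') | list %}
{% set sys_joined = sys | reject('equalto', None) | map('trim') | select | join('\n\n') %}
{% set injected = namespace(done=false) %}
{% for m in messages if m['role'] != 'system' %}
{% if m['role'] == 'human' %}
{% if not injected.done and sys_joined %}
{{ '[INST] ' + (sys_joined ~ '\n\n' ~ (m['content'] | trim)) + ' [/INST]' }}
{% set injected.done = true %}
{% else %}
{{ '[INST] ' + (m['content'] | trim) + ' [/INST]' }}
{% endif %}
{% elif m['role'] == 'assistant' %}
{{ ' ' + (m['content'] | trim) + eos_token }}
{% endif %}
{% endfor %}
"""

# Map "from"/"value" onto "role"/"content" as message_property_mappings does.
messages = [{"role": m["from"], "content": m["value"]} for m in record["conversations"]]

# trim_blocks/lstrip_blocks mirror how transformers compiles chat templates.
env = Environment(trim_blocks=True, lstrip_blocks=True)
rendered = env.from_string(template_str).render(
    messages=messages, bos_token="<s>", eos_token="</s>"
)
print(rendered)
```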
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (bitsandbytes 8-bit) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 41
- training_steps: 1390
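The train batch size of 32 corresponds to micro_batch_size: 32 with gradient_accumulation_steps: 1 on a single device. The sketch below is an assumption, not from the card: it reproduces the shape of the cosine schedule with 41 warmup steps over 1390 steps using the standard transformers scheduler, with a plain torch AdamW standing in for the 8-bit bitsandbytes optimizer used in training.

```python
# Sketch of the learning-rate schedule (cosine with warmup), using values from
# the hyperparameter list above. torch.optim.AdamW stands in for adamw_bnb_8bit.
import torch
from transformers import get_cosine_schedule_with_warmup

warmup_steps = 41
training_steps = 1390   # 10 epochs at ~139 optimizer steps per epoch
base_lr = 2e-4

# A dummy parameter/optimizer pair just to instantiate the scheduler.
param = torch.nn.Parameter(torch.zeros(1))
optimizer = torch.optim.AdamW([param], lr=base_lr, betas=(0.9, 0.999), eps=1e-8)
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=warmup_steps, num_training_steps=training_steps
)

# Print the learning rate at a few points of the schedule.
for step in range(training_steps):
    if step in (0, warmup_steps, training_steps // 2, training_steps - 1):
        print(step, scheduler.get_last_lr()[0])
    optimizer.step()
    scheduler.step()
```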
Training results
| Training Loss | Epoch | Step | Validation Loss | Max Active (GiB) | Max Allocated (GiB) | Reserved (GiB) |
|---|---|---|---|---|---|---|
| No log | 0 | 0 | 4.3069 | 25.33 | 25.33 | 26.92 |
| 3.2562 | 0.2014 | 28 | 2.9209 | 46.02 | 46.02 | 53.04 |
| 2.7575 | 0.4029 | 56 | 2.6649 | 46.02 | 46.02 | 53.04 |
| 2.6222 | 0.6043 | 84 | 2.5501 | 46.02 | 46.02 | 53.04 |
| 2.5228 | 0.8058 | 112 | 2.4979 | 46.02 | 46.02 | 53.04 |
| 2.4345 | 1.0072 | 140 | 2.4642 | 25.68 | 25.68 | 53.04 |
| 2.2688 | 1.2086 | 168 | 2.4165 | 46.02 | 46.02 | 53.04 |
| 2.2699 | 1.4101 | 196 | 2.3789 | 46.02 | 46.02 | 53.04 |
| 2.2362 | 1.6115 | 224 | 2.3462 | 46.02 | 46.02 | 53.04 |
| 2.1887 | 1.8129 | 252 | 2.3109 | 46.02 | 46.02 | 53.04 |
| 2.1015 | 2.0144 | 280 | 2.3218 | 25.68 | 25.68 | 53.04 |
| 1.8292 | 2.2158 | 308 | 2.3086 | 46.02 | 46.02 | 53.04 |
| 1.8575 | 2.4173 | 336 | 2.2847 | 46.02 | 46.02 | 53.04 |
| 1.8605 | 2.6187 | 364 | 2.2635 | 46.02 | 46.02 | 53.04 |
| 1.8404 | 2.8201 | 392 | 2.2348 | 46.02 | 46.02 | 53.04 |
| 1.7624 | 3.0216 | 420 | 2.3304 | 25.68 | 25.68 | 53.04 |
| 1.4865 | 3.2230 | 448 | 2.2942 | 46.02 | 46.02 | 53.04 |
| 1.5052 | 3.4245 | 476 | 2.2719 | 46.02 | 46.02 | 53.04 |
| 1.5553 | 3.6259 | 504 | 2.2517 | 46.02 | 46.02 | 53.04 |
| 1.5416 | 3.8273 | 532 | 2.2364 | 46.02 | 46.02 | 53.04 |
| 1.4235 | 4.0288 | 560 | 2.3074 | 25.68 | 25.68 | 53.04 |
| 1.2006 | 4.2302 | 588 | 2.3393 | 46.02 | 46.02 | 53.04 |
| 1.2303 | 4.4317 | 616 | 2.3343 | 46.02 | 46.02 | 53.04 |
| 1.2493 | 4.6331 | 644 | 2.3029 | 46.02 | 46.02 | 53.04 |
| 1.2642 | 4.8345 | 672 | 2.2887 | 46.02 | 46.02 | 53.04 |
| 1.1358 | 5.0360 | 700 | 2.4287 | 25.68 | 25.68 | 53.04 |
| 0.9762 | 5.2374 | 728 | 2.4294 | 46.02 | 46.02 | 53.04 |
| 0.9885 | 5.4388 | 756 | 2.4367 | 46.02 | 46.02 | 53.04 |
| 0.9915 | 5.6403 | 784 | 2.4242 | 46.02 | 46.02 | 53.04 |
| 1.0018 | 5.8417 | 812 | 2.4102 | 46.02 | 46.02 | 53.04 |
| 0.8874 | 6.0432 | 840 | 2.6173 | 25.68 | 25.68 | 53.04 |
| 0.7875 | 6.2446 | 868 | 2.5984 | 46.02 | 46.02 | 53.04 |
| 0.7987 | 6.4460 | 896 | 2.5939 | 46.02 | 46.02 | 53.04 |
| 0.7918 | 6.6475 | 924 | 2.5724 | 46.02 | 46.02 | 53.04 |
| 0.8138 | 6.8489 | 952 | 2.5781 | 46.02 | 46.02 | 53.04 |
| 0.7181 | 7.0504 | 980 | 2.8220 | 25.68 | 25.68 | 53.04 |
| 0.6593 | 7.2518 | 1008 | 2.7511 | 46.02 | 46.02 | 53.04 |
| 0.6757 | 7.4532 | 1036 | 2.7530 | 46.02 | 46.02 | 53.04 |
| 0.677 | 7.6547 | 1064 | 2.7439 | 46.02 | 46.02 | 53.04 |
| 0.6802 | 7.8561 | 1092 | 2.7484 | 46.02 | 46.02 | 53.04 |
| 0.6139 | 8.0576 | 1120 | 2.9129 | 25.68 | 25.68 | 53.04 |
| 0.607 | 8.2590 | 1148 | 2.8694 | 46.02 | 46.02 | 53.04 |
| 0.6031 | 8.4604 | 1176 | 2.8737 | 46.02 | 46.02 | 53.04 |
| 0.6006 | 8.6619 | 1204 | 2.8768 | 46.02 | 46.02 | 53.04 |
| 0.6115 | 8.8633 | 1232 | 2.8803 | 46.02 | 46.02 | 53.04 |
| 0.569 | 9.0647 | 1260 | 2.9187 | 25.68 | 25.68 | 53.04 |
| 0.5722 | 9.2662 | 1288 | 2.9361 | 46.02 | 46.02 | 53.04 |
| 0.57 | 9.4676 | 1316 | 2.9407 | 46.02 | 46.02 | 53.04 |
| 0.5728 | 9.6691 | 1344 | 2.9397 | 46.02 | 46.02 | 53.04 |
| 0.5732 | 9.8705 | 1372 | 2.9397 | 46.02 | 46.02 | 53.04 |
Framework versions
- PEFT 0.17.1
- Transformers 4.56.1
- PyTorch 2.7.1+cu126
- Datasets 4.0.0
- Tokenizers 0.22.1