See axolotl config
axolotl version: 0.8.0.dev0
```yaml
base_model: mistralai/Mistral-7B-Instruct-v0.3
# optionally might have model_type or tokenizer_type
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
# Automatically upload checkpoint and final model to HF
hub_model_id: AiAF/Pretrained-QLoRA-r9kilo-V1
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: json
    data_files: ["pretraining.jsonl"]
    type: completion
dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./outputs/qlora-out
save_total_limit: 1000
adapter: qlora
lora_model_dir:
lora_r: 256
lora_alpha: 512
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
- gate_proj
- down_proj
- up_proj
- q_proj
- v_proj
- k_proj
- o_proj
sequence_len: 512
sample_packing: true
pad_to_sequence_len: true
eval_sample_packing: false
wandb_project: "LLM-Pretraining"
wandb_watch: "all"
wandb_name: "QLoRA-9000-LLM_Datasets-V1"
wandb_log_model: "false"
wandb_run_id: "QLoRA-9000-LLM_Datasets-V1"
gradient_accumulation_steps: 4
micro_batch_size: 64
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.00000047 #0.0000033 #0.000005
train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 15
evals_per_epoch: 10
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 5
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
```
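The config above is the full axolotl QLoRA recipe; such a config is typically launched with something like `accelerate launch -m axolotl.cli.train config.yml`. As a rough illustration of what the LoRA and quantization settings amount to outside of axolotl, here is a minimal PEFT/bitsandbytes sketch; the compute dtype and device map are assumptions, not values from the config:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Mirrors load_in_4bit: true (QLoRA); bf16 compute dtype is an assumption here.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: single-node, automatic placement
)

# LoRA hyperparameters copied from the axolotl config above.
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=["gate_proj", "down_proj", "up_proj",
                    "q_proj", "v_proj", "k_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```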
Pretrained-QLoRA-r9kilo-V1
This model is a QLoRA fine-tune of mistralai/Mistral-7B-Instruct-v0.3 on the local pretraining.jsonl dataset (loaded through the json dataset loader). It achieves the following results on the evaluation set:
- Loss: 2.0777
Model description
More information needed
Intended uses & limitations
More information needed
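No usage example is provided in the card; the following is a minimal inference sketch, assuming the QLoRA adapter was pushed to the hub_model_id above (AiAF/Pretrained-QLoRA-r9kilo-V1) and that peft can resolve the base model from the adapter config:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "AiAF/Pretrained-QLoRA-r9kilo-V1"

# Loads the base Mistral-7B-Instruct-v0.3 weights plus the LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 inference
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# The adapter was trained in completion style, so a plain text prompt is used.
inputs = tokenizer("Once upon a time", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```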
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4.7e-07
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: ADAMW_BNB (adamw_bnb_8bit) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 15
- num_epochs: 4.0
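The reported total_train_batch_size follows directly from the axolotl settings, assuming a single training device (the card does not state the GPU count):

```python
micro_batch_size = 64            # per-device batch size
gradient_accumulation_steps = 4
num_devices = 1                  # assumption; not stated in the card

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)    # 256, matching the value reported above
```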
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
2.4054 | 0.0126 | 1 | 2.3746 |
2.3966 | 0.1006 | 8 | 2.3649 |
2.3607 | 0.2013 | 16 | 2.3105 |
2.2822 | 0.3019 | 24 | 2.2564 |
2.2658 | 0.4025 | 32 | 2.2115 |
2.2261 | 0.5031 | 40 | 2.1804 |
2.1511 | 0.6038 | 48 | 2.1578 |
2.1752 | 0.7044 | 56 | 2.1431 |
2.1718 | 0.8050 | 64 | 2.1331 |
2.1669 | 0.9057 | 72 | 2.1237 |
2.1408 | 1.0 | 80 | 2.1135 |
2.1057 | 1.1006 | 88 | 2.1085 |
2.1289 | 1.2013 | 96 | 2.1038 |
2.0875 | 1.3019 | 104 | 2.0994 |
2.1468 | 1.4025 | 112 | 2.0960 |
2.1295 | 1.5031 | 120 | 2.0933 |
2.1162 | 1.6038 | 128 | 2.0910 |
2.1073 | 1.7044 | 136 | 2.0891 |
2.1002 | 1.8050 | 144 | 2.0875 |
2.1017 | 1.9057 | 152 | 2.0860 |
2.0871 | 2.0 | 160 | 2.0849 |
2.0889 | 2.1006 | 168 | 2.0838 |
2.1011 | 2.2013 | 176 | 2.0828 |
2.1061 | 2.3019 | 184 | 2.0820 |
2.1024 | 2.4025 | 192 | 2.0812 |
2.1313 | 2.5031 | 200 | 2.0807 |
2.1128 | 2.6038 | 208 | 2.0801 |
2.0528 | 2.7044 | 216 | 2.0796 |
2.1116 | 2.8050 | 224 | 2.0792 |
2.1395 | 2.9057 | 232 | 2.0789 |
2.1217 | 3.0 | 240 | 2.0786 |
2.1046 | 3.1006 | 248 | 2.0784 |
2.1093 | 3.2013 | 256 | 2.0781 |
2.1218 | 3.3019 | 264 | 2.0780 |
2.1058 | 3.4025 | 272 | 2.0779 |
2.1027 | 3.5031 | 280 | 2.0778 |
2.0832 | 3.6038 | 288 | 2.0778 |
2.1026 | 3.7044 | 296 | 2.0777 |
2.1173 | 3.8050 | 304 | 2.0777 |
2.1293 | 3.9057 | 312 | 2.0777 |
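The evaluation cadence in the table matches the config: with roughly 80 optimizer steps per epoch (epoch 1.0 is reached at step 80) and evals_per_epoch: 10, an evaluation lands every 8 steps. A small sanity-check sketch:

```python
steps_per_epoch = 80   # from the table: epoch 1.0 is reached at step 80
evals_per_epoch = 10   # from the axolotl config above

eval_every = steps_per_epoch // evals_per_epoch
print(eval_every)      # 8, matching the eval rows at steps 8, 16, ...
```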
Framework versions
- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0