---
library_name: peft
license: apache-2.0
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
tags:
  - generated_from_trainer
datasets:
  - combined_dataset.jsonl
model-index:
  - name: combined_model-finetune
    results: []
---

Built with Axolotl

See axolotl config

axolotl version: `0.10.0.dev0`

```yaml
# ===================================================================
# CONFIG: For a single, combined "Conversion & Debug" Model
# Using the stable 'alpaca' format.
# ===================================================================

# --- Core Model Configuration (Kept as requested) ---
base_model: mistralai/Mistral-Small-3.1-24B-Instruct-2503
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

# --- Performance, Quality, and Memory Optimization ---
flash_attention: true
load_in_4bit: true
load_in_8bit: false
adapter: lora

# --- Dataset Configuration (KEY CHANGE) ---
# Reverted to the stable 'alpaca' type.
# Axolotl will automatically look for 'instruction', 'input', 'output' fields.
datasets:
  - path: combined_dataset.jsonl # This is your new, flattened dataset
    type: alpaca

# --- Output Directory ---
output_dir: ./combined_model-finetune

# --- Training Hyperparameters ---
sequence_len: 2048
micro_batch_size: 1
gradient_accumulation_steps: 4
num_epochs: 3
learning_rate: 3e-5

# --- LoRA Configuration ---
lora_r: 16
lora_alpha: 32
lora_dropout: 0.15
lora_target_modules:
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# --- Logging, Evaluation, and Saving (Kept as requested) ---
logging_steps: 2
evaluation_strategy: "steps"
eval_steps: 2
save_strategy: "steps"
save_steps: 9999
val_set_size: 0.05

special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
```
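
Because the `alpaca` dataset type expects each JSONL row to carry `instruction`, `input`, and `output` fields, a quick local check of `combined_dataset.jsonl` can catch formatting problems before a run. This is a hypothetical pre-flight helper, not part of the Axolotl pipeline; it assumes `input` may be empty but `instruction` and `output` must be present.

```python
# Hypothetical pre-flight check (not part of the Axolotl run): verify that every
# row of the flattened dataset carries the alpaca-style fields the config expects.
import json

required = {"instruction", "output"}  # "input" is allowed to be empty

with open("combined_dataset.jsonl", "r", encoding="utf-8") as f:
    for line_no, line in enumerate(f, start=1):
        row = json.loads(line)
        missing = required - row.keys()
        if missing:
            raise ValueError(f"line {line_no} is missing fields: {sorted(missing)}")

print("all rows look like valid alpaca records")
```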

# combined_model-finetune

This model is a fine-tuned version of mistralai/Mistral-Small-3.1-24B-Instruct-2503 on the combined_dataset.jsonl dataset. It achieves the following results on the evaluation set:

- Loss: 0.1924
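
To try the adapter, one minimal sketch is to load the base model with Transformers and attach the LoRA weights with PEFT. The adapter id below is a placeholder for this repository (or a local copy of `combined_model-finetune`), and the alpaca-style prompt mirrors the training format; adjust both as needed.

```python
# Minimal usage sketch: base model + LoRA adapter via PEFT.
# "your-username/combined_model-finetune" is a placeholder adapter id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
adapter_id = "your-username/combined_model-finetune"  # placeholder; use this repo id or a local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)

# Alpaca-style prompt, matching the dataset format used for fine-tuning.
prompt = "### Instruction:\nConvert this snippet and explain any bugs you fix.\n\n### Input:\n<code here>\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```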

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 3e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 4
- optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- training_steps: 21
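
For readers more familiar with the raw Transformers `Trainer`, the list above maps roughly onto the following `TrainingArguments`. The actual run was driven by Axolotl, so treat this as an illustrative equivalent rather than the exact launch configuration.

```python
# Illustrative TrainingArguments roughly matching the hyperparameters above.
# The real run used Axolotl's trainer; this is a reference mapping only.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./combined_model-finetune",
    per_device_train_batch_size=1,   # train_batch_size
    per_device_eval_batch_size=1,    # eval_batch_size
    gradient_accumulation_steps=4,   # effective batch size: 1 * 4 = 4
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    optim="adamw_torch_fused",       # AdamW, betas=(0.9, 0.999), eps=1e-08
    num_train_epochs=3,
    eval_strategy="steps",
    eval_steps=2,
    logging_steps=2,
    save_steps=9999,
    seed=42,
)
```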

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.1481 | 1    | 0.2836          |
| 0.2664        | 0.2963 | 2    | 0.2538          |
| 0.4832        | 0.5926 | 4    | 0.2200          |
| 0.3229        | 0.8889 | 6    | 0.2090          |
| 0.1517        | 1.1481 | 8    | 0.2022          |
| 0.3353        | 1.4444 | 10   | 0.2009          |
| 0.2418        | 1.7407 | 12   | 0.1958          |
| 0.0811        | 2.0    | 14   | 0.1942          |
| 0.0496        | 2.2963 | 16   | 0.1933          |
| 0.1906        | 2.5926 | 18   | 0.1927          |
| 0.3171        | 2.8889 | 20   | 0.1924          |

### Framework versions

- PEFT 0.15.2
- Transformers 4.52.3
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1
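
To compare a local environment against the versions above, a small check such as the following can help (it assumes the standard PyPI package names):

```python
# Print locally installed versions to compare against the list above.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("peft", "transformers", "torch", "datasets", "tokenizers"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```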