| I'll explain more about this model once I've found the optimal checkpoint for its use case. | |
| For now: it has been fully fine-tuned on [Sandevistan](https://huggingface.co/datasets/Replete-AI/Sandevistan). | |
| Here is my Axolotl config (thanks to fizz and empti): | |
| ``` | |
| base_model: meta-llama/Meta-Llama-3-8B | |
| load_in_8bit: false | |
| load_in_4bit: false | |
| strict: false | |
| datasets: | |
|   - path: Kquant03/Sandevistan_Reformat | |
|     type: customllama3_stan | |
| dataset_prepared_path: last_run_prepared | |
| val_set_size: 0.05 | |
| output_dir: ./outputs/out | |
| max_steps: 80000 | |
| fix_untrained_tokens: true | |
| sequence_len: 4096 | |
| sample_packing: true | |
| pad_to_sequence_len: true | |
| wandb_project: Pneuma | |
| wandb_entity: | |
| wandb_watch: | |
| wandb_name: | |
| wandb_log_model: | |
| gradient_accumulation_steps: 16 | |
| micro_batch_size: 8 | |
| num_epochs: 1 | |
| optimizer: paged_adamw_8bit | |
| lr_scheduler: cosine | |
| learning_rate: 0.00001 | |
| max_grad_norm: 1 | |
| train_on_inputs: false | |
| group_by_length: false | |
| bf16: auto | |
| fp16: | |
| tf32: false | |
| gradient_checkpointing: unsloth | |
| early_stopping_patience: | |
| resume_from_checkpoint: | |
| logging_steps: 1 | |
| xformers_attention: | |
| flash_attention: true | |
| eval_sample_packing: false | |
| plugins: | |
|   - axolotl.integrations.liger.LigerPlugin | |
| liger_rope: true | |
| liger_rms_norm: true | |
| liger_swiglu: true | |
| liger_fused_linear_cross_entropy: true | |
| hub_model_id: Replete-AI/L3-Pneuma-8B | |
| hub_strategy: every_save | |
| warmup_steps: 10 | |
| evals_per_epoch: 3 | |
| eval_table_size: | |
| saves_per_epoch: 3 | |
| debug: | |
| deepspeed: | |
| weight_decay: 0.1 | |
| fsdp: | |
| fsdp_config: | |
| special_tokens: | |
|   bos_token: "<|begin_of_text|>" | |
|   eos_token: "<|end_of_text|>" | |
|   pad_token: "<|end_of_text|>" | |
| tokens: | |
| ``` | |
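To make the batch and schedule settings above easier to read, here is a rough Python sketch of the arithmetic they imply. The constants are copied from the config; the GPU count and the total step count are not stated anywhere in this card, so `num_gpus` and `total_steps` are hypothetical parameters. The schedule function assumes the usual shape of a cosine scheduler with warmup (linear ramp, then cosine decay to zero), which may not match the trainer's exact implementation.

```python
import math

# Values copied from the Axolotl config above.
MICRO_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 16
SEQUENCE_LEN = 4096
LEARNING_RATE = 1e-5
WARMUP_STEPS = 10


def sequences_per_step(num_gpus: int = 1) -> int:
    """Packed sequences consumed per optimizer step.

    num_gpus is a hypothetical parameter: the card does not say how many
    devices were used.
    """
    return MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


def max_tokens_per_step(num_gpus: int = 1) -> int:
    """Upper bound on tokens per optimizer step.

    With sample_packing and pad_to_sequence_len enabled, every sequence is
    SEQUENCE_LEN tokens long, padding included.
    """
    return sequences_per_step(num_gpus) * SEQUENCE_LEN


def cosine_lr(step: int, total_steps: int) -> float:
    """Learning rate at a given step, assuming linear warmup followed by
    cosine decay to zero. total_steps is hypothetical (not given in the card).
    """
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("sequences per step (1 GPU):", sequences_per_step())
    print("max tokens per step (1 GPU):", max_tokens_per_step())
    print("lr at end of warmup:", cosine_lr(WARMUP_STEPS, total_steps=1000))
```

On a single device this works out to 128 packed sequences per optimizer step; multiply by the real device count to get the global figure.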
| This is the WandB loss for this section of the fine-tune: | |
|  | |