MLX Full Fine-Tune (not LoRA) without freezing weights
I have only found sources on how to do fine-tuning by creating adapters. My question is: is there any source of info on full fine-tuning?
Hi @Goblinztech! Full fine-tuning has been implemented, but you do it through the mlx_lm.lora script, as indicated here.
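For example, a minimal sketch of such a run (the flags are the ones shown in the log later in this thread; the model id is only a placeholder, and it's worth double-checking `mlx_lm.lora --help` for your installed version):

```sh
# Full fine-tune through the LoRA script by setting the fine-tune type.
# "mlx-community/Mistral-7B-v0.1" is just a placeholder model id.
mlx_lm.lora \
  --model mlx-community/Mistral-7B-v0.1 \
  --train \
  --data ./data \
  --fine-tune-type full \
  --iters 1000
```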
Does this mean that OpenELM, Apple's little model, is not supported?
You can use the mlx-lm package to fine-tune an LLM with low rank adaptation (LoRA) for a target task.[^lora] The example also supports quantized LoRA (QLoRA);[^qlora] a rough QLoRA-style invocation is sketched after the list below. LoRA fine-tuning works with the following model families:
- Mistral
- Llama
- Phi2
- Mixtral
- Qwen2
- Gemma
- OLMo
- MiniCPM
- InternLM2
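As far as I understand the example, QLoRA here simply means pointing the same LoRA training at an already-quantized checkpoint. A hedged sketch (the 4-bit model id below is my assumption, not something from this thread):

```sh
# Plain LoRA training; using a quantized (4-bit) checkpoint as --model
# gives the QLoRA-style setup mentioned above.
# "mlx-community/Mistral-7B-Instruct-v0.2-4bit" is an assumed example id.
mlx_lm.lora \
  --model mlx-community/Mistral-7B-Instruct-v0.2-4bit \
  --train \
  --data ./data \
  --iters 600
```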
I was trying to full fine-tune OpenELM 270M on a MacBook Air M1 with 16 GB RAM and I got NaNs. The JSONL was a book converted from TXT, so rather simple and straightforward data.
skriatok on Darkstar.local in ~/Projects/MLX-FINETUNE
$ mlx_lm.lora \
--model mlx-community/OpenELM-270M \
--train \
--data data \
--fine-tune-type full \
--iters 100
Loading pretrained model
Fetching 10 files: 100%|████████████████████████████████████████| 10/10 [00:00<00:00, 114912.44it/s]
Loading datasets
Training
Trainable parameters: 84.914% (230.566M/271.527M)
Starting training..., iters: 100
Iter 1: Val loss 3.977, Val took 0.170s
Iter 10: Train loss nan, Learning Rate 1.000e-05, It/sec 2.403, Tokens/sec 423.447, Trained Tokens 1762, Peak mem 3.009 GB
Iter 20: Train loss nan, Learning Rate 1.000e-05, It/sec 2.803, Tokens/sec 392.374, Trained Tokens 3162, Peak mem 3.009 GB
Iter 30: Train loss nan, Learning Rate 1.000e-05, It/sec 3.316, Tokens/sec 322.023, Trained Tokens 4133, Peak mem 3.009 GB
Iter 40: Train loss nan, Learning Rate 1.000e-05, It/sec 3.913, Tokens/sec 342.363, Trained Tokens 5008, Peak mem 3.009 GB
Iter 50: Train loss nan, Learning Rate 1.000e-05, It/sec 2.844, Tokens/sec 333.562, Trained Tokens 6181, Peak mem 3.009 GB
Iter 60: Train loss nan, Learning Rate 1.000e-05, It/sec 4.958, Tokens/sec 264.242, Trained Tokens 6714, Peak mem 3.009 GB
Iter 70: Train loss nan, Learning Rate 1.000e-05, It/sec 3.105, Tokens/sec 366.405, Trained Tokens 7894, Peak mem 3.009 GB
Iter 80: Train loss nan, Learning Rate 1.000e-05, It/sec 2.367, Tokens/sec 372.816, Trained Tokens 9469, Peak mem 3.037 GB
Iter 90: Train loss nan, Learning Rate 1.000e-05, It/sec 3.013, Tokens/sec 338.324, Trained Tokens 10592, Peak mem 3.037 GB
Iter 100: Val loss nan, Val took 0.160s
Iter 100: Train loss nan, Learning Rate 1.000e-05, It/sec 2.744, Tokens/sec 341.899, Trained Tokens 11838, Peak mem 3.037 GB
Iter 100: Saved adapter weights to adapters/adapters.safetensors and adapters/0000100_adapters.safetensors.
Saved final weights to adapters/adapters.safetensors.
My RAM usage was a healthy 12.4 GB, far from the 16 GB ceiling. What could be wrong? I have no clue.
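A couple of hedged sanity checks that might help narrow it down (nothing here is confirmed as the cause; the data/train.jsonl path assumes the usual train/valid split the script looks for, and the smaller learning rate is just a guess):

```sh
# Look for empty or blank lines in the training data, a common source of NaN losses.
# Assumes the standard data/train.jsonl layout.
wc -l data/train.jsonl
grep -c '^[[:space:]]*$' data/train.jsonl

# Retry the full fine-tune with a smaller learning rate.
# --learning-rate should be available in mlx_lm.lora, but verify with --help.
mlx_lm.lora \
  --model mlx-community/OpenELM-270M \
  --train \
  --data data \
  --fine-tune-type full \
  --learning-rate 1e-6 \
  --iters 100
```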