Model Card for Model ID
Llama 3 8B, fine-tuned with supervised fine-tuning (SFT) using the LLaMA-Adapter PEFT method.
Model Details
adapter_layers: 30
adapter_len: 10
gamma: 0.85
batch_size_training: 4
gradient_accumulation_steps: 4
lr: 0.0001
num_epochs: 3
num_freeze_layers: 1
optimizer: "AdamW"
peft_method: "llama_adapter"

trainable params: 1,228,830 || all params: 8,031,490,078 || trainable%: 0.0153
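The reported parameter counts can be sanity-checked with a little arithmetic. The sketch below is not the training code; it assumes a hidden size of 4096 for Llama 3 8B and assumes that each adapted layer trains one prompt of shape (adapter_len, hidden_size) plus a single scalar gate. Neither assumption is stated in this card, but under them the count matches the reported figure.

```python
# Values reported in this model card.
ADAPTER_LAYERS = 30
ADAPTER_LEN = 10
BATCH_SIZE_TRAINING = 4
GRADIENT_ACCUMULATION_STEPS = 4
REPORTED_TRAINABLE = 1_228_830
REPORTED_ALL = 8_031_490_078

# Assumption (not in the card): hidden size of Llama 3 8B.
HIDDEN_SIZE = 4096

# Assumption: per adapted layer, one learned prompt of
# adapter_len x hidden_size values plus one scalar gate.
params_per_layer = ADAPTER_LEN * HIDDEN_SIZE + 1
trainable = ADAPTER_LAYERS * params_per_layer

# Share of trainable parameters, in percent.
trainable_pct = 100 * REPORTED_TRAINABLE / REPORTED_ALL

# Samples contributing to each optimizer step on one device.
samples_per_step = BATCH_SIZE_TRAINING * GRADIENT_ACCUMULATION_STEPS

print(f"trainable params (derived): {trainable:,}")
print(f"trainable%: {trainable_pct:.4f}")
print(f"samples per optimizer step (per device): {samples_per_step}")
```

If training ran on more than one device, the effective batch size would be this per-device figure multiplied by the device count, which the card does not report.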
Model Description
Average epoch time: 967s
Train loss: 0.3901134133338928
Eval loss: 1.466189980506897
Eval perplexity: 4.332696437835693
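The eval perplexity is consistent with the exponential of the eval loss, which is the usual definition when the loss is mean token-level cross-entropy in nats. A minimal check, assuming that definition applies here (the card does not say how perplexity was computed):

```python
import math

# Values reported in this model card.
EVAL_LOSS = 1.466189980506897
REPORTED_PERPLEXITY = 4.332696437835693
AVG_EPOCH_SECONDS = 967
NUM_EPOCHS = 3

# Perplexity as exp(mean cross-entropy loss); an assumed definition.
perplexity = math.exp(EVAL_LOSS)

# Approximate total training time from the average epoch time.
total_seconds = AVG_EPOCH_SECONDS * NUM_EPOCHS

print(f"exp(eval loss) = {perplexity:.4f}")
print(f"reported perplexity = {REPORTED_PERPLEXITY:.4f}")
print(f"approx. total training time = {total_seconds} s")
```

Small differences in the trailing digits are expected if the original values were computed in 32-bit floating point.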
Max CUDA memory allocated: 49 GB
Max CUDA memory reserved: 55 GB
Peak active CUDA memory: 49 GB
Peak total CPU memory consumed during training (max): 4 GB