---
library_name: transformers
license: apache-2.0
base_model: Qwen/Qwen2.5-14B
model-index:
- name: LLaMutation-Qwen2.5-14B-SFFT-v0.0
  results: []
---

# LLaMutation-Qwen2.5-14B-SFFT-v0.0

![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp)

This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit).

## Model description

A code translation and completion model trained on Qwen2.5-14B, as there is not yet a Qwen2.5-Coder-14B model. This is very much an alpha completion model, so expect some quirks in its usage parameters. I will refine the model further for completion and also create an instruct/chat variant.

## Intended uses & limitations

Use differing system prompts for code translation, or use it as a tab-autocomplete model with [continue.dev](https://www.continue.dev/); a serving sketch is given at the end of this card.

## Chat template and sampling parameters

The chat template is ChatML. The sampling parameters used for generation and for the hackathon demo are shown here (an example inference call is sketched below, after the training details):

![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png)

### SYSTEM PROMPT MUST BE USED FOR THIS MODEL

`You are an Al assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!`

## Training procedure

Spectrum FFT/SFFT, i.e. full fine-tuning restricted to the layers selected by Spectrum (top 50% per `spectrum_top_fraction: 0.5` in the config below).

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 50
- num_epochs: 1

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3948        | 0.0237 | 1    | 0.3920          |
| 0.2392        | 0.4970 | 21   | 0.2500          |
| 0.2606        | 0.9941 | 42   | 0.2621          |

### Framework versions

- Transformers 4.45.2
- Pytorch 2.3.1+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1
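## Example usage (sketch)

A minimal sketch of loading the model with 🤗 Transformers, applying the ChatML chat template together with the required system prompt, and generating a translation. The repository path and the sampling values (`temperature`, `top_p`, `max_new_tokens`) are placeholders, not the exact settings from the screenshot above; adjust them to your own setup.

```python
# Minimal inference sketch. The model path and sampling values are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/LLaMutation-Qwen2.5-14B-SFFT-v0.0"  # placeholder repo id or local path

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# The system prompt from the "SYSTEM PROMPT MUST BE USED FOR THIS MODEL" section above.
SYSTEM_PROMPT = (
    "You are an Al assistant that is an expert at converting code from any language to another "
    "within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. "
    "Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. "
    "ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!"
)

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Convert this Python function to Rust:\ndef add(a, b):\n    return a + b"},
]

# apply_chat_template renders the ChatML format the model was trained with.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,  # placeholder sampling values; see the screenshot above
    top_p=0.9,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```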
[Built with Axolotl](https://github.com/axolotl-ai-cloud/axolotl)

<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`

```yaml
base_model: Qwen/Qwen2.5-14B
load_in_8bit: false
load_in_4bit: false
strict: false

plugins:
  - axolotl.integrations.liger.LigerPlugin
liger_rope: true
liger_rms_norm: true
liger_swiglu: true
liger_fused_linear_cross_entropy: true

plugins:
  - axolotl.integrations.spectrum.SpectrumPlugin
spectrum_top_fraction: 0.5
# Optional if using a pre-scanned model as your base_model. Useful if using a model mirror
spectrum_model_name: Qwen/Qwen2.5-14B

datasets:
  - path: datasets/LLaMutation.jsonl
    type: sharegpt
  - path: datasets/LLaMutationMAX_Train.json
    type: sharegpt
chat_template: chatml
shuffle_merged_datasets: true
val_set_size: 0.1
output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0

sequence_len: 8192
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

# adapter: qlora
# lora_model_dir:
# lora_r: 32
# lora_alpha: 16
# lora_dropout: 0.05
# lora_target_linear: true
# peft_use_dora: true

wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0
wandb_entity:
wandb_watch:
wandb_name: Unit-00
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_torch
lr_scheduler: linear
learning_rate: 0.0005
max_grad_norm: 3

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 50
evals_per_epoch: 2
saves_per_epoch: 2
save_safetensors: true
hub_model_id:
hub_strategy:
debug:
deepspeed: deepspeed_configs/zero3_bf16.json
weight_decay: 0.1
# fsdp:
#   - full_shard
#   - auto_wrap
# fsdp_config:
#   fsdp_limit_all_gathers: true
#   fsdp_sync_module_states: true
#   fsdp_offload_params: false # Changed from true
#   fsdp_use_orig_params: true # Changed from false
#   fsdp_cpu_ram_efficient_loading: true
#   fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
#   fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer
#   fsdp_activation_checkpointing: true
#   fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT
#   fsdp_sharding_strategy: FULL_SHARD
#   fsdp_forward_prefetch: true # Added
#   fsdp_backward_prefetch: "BACKWARD_POST" # Added
#   fsdp_backward_prefetch_limit: 1 # Added
#   fsdp_mixed_precision: BF16 # Added
```

</details>
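## Serving for tab autocomplete (sketch)

As noted under "Intended uses & limitations", one way to wire the model into an editor integration such as [continue.dev](https://www.continue.dev/) is to serve it behind an OpenAI-compatible endpoint (for example with vLLM or a similar server) and point the integration at that endpoint. The sketch below only shows the raw API call with the `openai` Python client; the base URL, API key, and served model name are placeholders for your own deployment, and the exact Continue configuration fields should be taken from the Continue documentation rather than from this card.

```python
# Sketch of querying the model through an OpenAI-compatible server (e.g. vLLM).
# base_url, api_key, and the served model name are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# The required system prompt from the card.
SYSTEM_PROMPT = (
    "You are an Al assistant that is an expert at converting code from any language to another "
    "within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. "
    "Keep non code text to the a minimum possible. DO NOT REPEAT ANY NON CODE TEXT. "
    "ONLY PRINT OUT CODE ONCE DO NOT ITTERATE!"
)

response = client.chat.completions.create(
    model="LLaMutation-Qwen2.5-14B-SFFT-v0.0",  # whatever name the server exposes
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Convert this JavaScript to TypeScript:\nfunction greet(name) { return 'Hello ' + name; }"},
    ],
    temperature=0.7,  # placeholder; tune against the demo settings shown above
    max_tokens=512,
)
print(response.choices[0].message.content)
```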