|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
base_model: Qwen/Qwen2.5-14B |
|
model-index: |
|
- name: LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
results: [] |
|
--- |
|
|
|
# LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
|
|
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp) |
|
|
|
This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit). |
|
|
|
## Model description |
|
|
|
A code translation and completion model trained on Qwen2.5-14B, since there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
|
|
|
I will refine the model further for completion and also create an instruct/chat variant.
|
|
|
## Intended uses & limitations |
|
|
|
Use differing system prompts for code translation, or use it as a tab-autocomplete model with [continue.dev](https://www.continue.dev/).
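For the autocomplete use case, here is a minimal sketch of a continue.dev `config.json` entry, assuming the model is served behind an OpenAI-compatible endpoint (the `apiBase` URL and model id below are placeholders, not published endpoints):

```json
{
  "tabAutocompleteModel": {
    "title": "LLaMutation",
    "provider": "openai",
    "model": "LLaMutation-Qwen2.5-14B-SFFT-v0.0",
    "apiBase": "http://localhost:8000/v1"
  }
}
```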
|
|
|
## Chat template and sampling parameters
|
|
|
The chat template is ChatML.
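For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers, so a rendered prompt for this model looks like:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```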
|
|
|
The sampling parameters used for generation and the hackathon demo are shown below:
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png) |
|
|
|
### A SYSTEM PROMPT MUST BE USED WITH THIS MODEL
|
|
|
`You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the minimum possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
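As a minimal usage sketch with `transformers` (the repo id and sampling values below are illustrative assumptions, not the exact hackathon settings pictured above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the actual Hub repo id or a local path.
model_id = "LLaMutation-Qwen2.5-14B-SFFT-v0.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": (
        "You are an AI assistant that is an expert at converting code from any "
        "language to another within properly formatted code blocks. DON'T SAY "
        "ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the minimum "
        "possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE, "
        "DO NOT ITERATE!"
    )},
    {"role": "user", "content": "Convert to Rust:\n```python\ndef add(a, b):\n    return a + b\n```"},
]

# apply_chat_template renders the ChatML turns shown above.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.2, top_p=0.9)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```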
|
|
|
## Training procedure |
|
|
|
Spectrum FFT/SFFT (selective full fine-tuning): Spectrum scans the base model, ranks layers by signal-to-noise ratio, and trains only the top fraction at full precision while the rest stay frozen.
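Conceptually, the freezing step looks like the following hand-wavy sketch, assuming a hypothetical `snr_by_module` mapping produced by Spectrum's scan (the ranking itself is not computed here):

```python
import torch.nn as nn

def apply_spectrum_freeze(model: nn.Module, snr_by_module: dict[str, float], top_fraction: float = 0.5) -> None:
    """Freeze all parameters, then unfreeze the highest-SNR modules for full fine-tuning."""
    for param in model.parameters():
        param.requires_grad = False

    # Keep the top fraction of modules by signal-to-noise ratio.
    keep = int(len(snr_by_module) * top_fraction)
    top_modules = sorted(snr_by_module, key=snr_by_module.get, reverse=True)[:keep]

    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in top_modules):
            param.requires_grad = True
```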
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0005 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 32 |
|
- total_eval_batch_size: 8 |
|
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 50 |
|
- num_epochs: 1 |
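(For reference: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 1 × 8 × 4 = 32, and total_eval_batch_size = eval_batch_size × num_devices = 1 × 8 = 8.)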
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:----:|:---------------:| |
|
| 0.3948 | 0.0237 | 1 | 0.3920 | |
|
| 0.2392 | 0.4970 | 21 | 0.2500 | |
|
| 0.2606 | 0.9941 | 42 | 0.2621 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.45.2 |
|
- Pytorch 2.3.1+cu121 |
|
- Datasets 3.0.1 |
|
- Tokenizers 0.20.1 |
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.1` |
|
```yaml |
|
base_model: Qwen/Qwen2.5-14B |
|
|
|
load_in_8bit: false |
|
load_in_4bit: false |
|
strict: false |
|
|
|
plugins:

- axolotl.integrations.liger.LigerPlugin

- axolotl.integrations.spectrum.SpectrumPlugin



liger_rope: true

liger_rms_norm: true

liger_swiglu: true

liger_fused_linear_cross_entropy: true
|
|
|
spectrum_top_fraction: 0.5 |
|
# Optional if using a pre-scanned model as your base_model. Useful if using a model mirror |
|
spectrum_model_name: Qwen/Qwen2.5-14B |
|
|
|
datasets: |
|
- path: datasets/LLaMutation.jsonl |
|
type: sharegpt |
|
- path: datasets/LLaMutationMAX_Train.json |
|
type: sharegpt |
|
|
|
chat_template: chatml |
|
shuffle_merged_datasets: true |
|
val_set_size: 0.1 |
|
output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
|
|
sequence_len: 8192 |
|
sample_packing: true |
|
eval_sample_packing: true |
|
pad_to_sequence_len: true |
|
|
|
# adapter: qlora |
|
# lora_model_dir: |
|
# lora_r: 32 |
|
# lora_alpha: 16 |
|
# lora_dropout: 0.05 |
|
# lora_target_linear: true |
|
# peft_use_dora: true |
|
|
|
wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: Unit-00 |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 4 |
|
micro_batch_size: 1 |
|
num_epochs: 1 |
|
optimizer: adamw_torch |
|
lr_scheduler: linear |
|
learning_rate: 0.0005 |
|
max_grad_norm: 3 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: auto |
|
fp16: |
|
tf32: true |
|
|
|
gradient_checkpointing: true |
|
gradient_checkpointing_kwargs: |
|
use_reentrant: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 50 |
|
evals_per_epoch: 2 |
|
saves_per_epoch: 2 |
|
save_safetensors: true |
|
hub_model_id: |
|
hub_strategy: |
|
debug: |
|
deepspeed: deepspeed_configs/zero3_bf16.json |
|
weight_decay: 0.1 |
|
# fsdp: |
|
# - full_shard |
|
# - auto_wrap |
|
# fsdp_config: |
|
# fsdp_limit_all_gathers: true |
|
# fsdp_sync_module_states: true |
|
# fsdp_offload_params: false # Changed from true |
|
# fsdp_use_orig_params: true # Changed from false |
|
# fsdp_cpu_ram_efficient_loading: true |
|
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP |
|
# fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer |
|
# fsdp_activation_checkpointing: true |
|
# fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT |
|
# fsdp_sharding_strategy: FULL_SHARD |
|
# fsdp_forward_prefetch: true # Added |
|
# fsdp_backward_prefetch: "BACKWARD_POST" # Added |
|
# fsdp_backward_prefetch_limit: 1 # Added |
|
# fsdp_mixed_precision: BF16 # Added |
|
``` |
|
|
|
</details><br> |