|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
base_model: Qwen/Qwen2.5-14B |
|
model-index: |
|
- name: LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
results: [] |
|
--- |
|
|
|
# LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
|
|
![image/webp](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/IFK02cTih72zfZfT5UY4f.webp) |
|
|
|
This model is a [Spectrum](https://github.com/axolotl-ai-cloud/axolotl/blob/67f744dc8c9564ef7a42d5df780ae53e319dca61/src/axolotl/integrations/spectrum/README.md) FFT of [Qwen/Qwen2.5-14B](https://huggingface.co/Qwen/Qwen2.5-14B) on a code translation dataset evolved with [EvolKit](https://github.com/arcee-ai/EvolKit). |
|
|
|
## Model description |
|
|
|
A code translation and completion model trained on Qwen2.5-14B, since there is not yet a Qwen2.5-Coder-14B model. This is 100% an alpha completion model, so there will be quirks in its usage parameters.
|
|
|
I will refine the model further for completion and also create an instruct/chat variant.
|
|
|
## Intended uses & limitations |
|
|
|
Use differing system prompts for code translation, or use it as a tab-autocomplete model with [continue.dev](https://www.continue.dev/).
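For the autocomplete use case, here is a minimal sketch of a continue.dev `config.json` entry, assuming the model is served behind an OpenAI-compatible endpoint (the `apiBase` URL and model id below are placeholders, not published endpoints):

```json
{
  "tabAutocompleteModel": {
    "title": "LLaMutation",
    "provider": "openai",
    "model": "LLaMutation-Qwen2.5-14B-SFFT-v0.0",
    "apiBase": "http://localhost:8000/v1"
  }
}
```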
|
|
|
## Chat template and sampling parameters
|
|
|
The chat template is ChatML.
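For reference, ChatML wraps each turn in `<|im_start|>`/`<|im_end|>` markers, so a rendered prompt for this model looks like:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
```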
|
|
|
The sampling parameters used for generation and the hackathon demo are shown below:
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/655dc641accde1bbc8b41aec/YzQ8nqu83lEhl3Kg4u0PC.png) |
|
|
|
### A SYSTEM PROMPT MUST BE USED WITH THIS MODEL
|
|
|
`You are an AI assistant that is an expert at converting code from any language to another within properly formatted code blocks. DON'T SAY ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the minimum possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE, DO NOT ITERATE!`
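As a minimal usage sketch with `transformers` (the repo id and sampling values below are illustrative assumptions, not the exact hackathon settings pictured above):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace with the actual Hub repo id or a local path.
model_id = "LLaMutation-Qwen2.5-14B-SFFT-v0.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "system", "content": (
        "You are an AI assistant that is an expert at converting code from any "
        "language to another within properly formatted code blocks. DON'T SAY "
        "ANYTHING ABOUT NOT SEEING CODE. Keep non code text to the minimum "
        "possible. DO NOT REPEAT ANY NON CODE TEXT. ONLY PRINT OUT CODE ONCE, "
        "DO NOT ITERATE!"
    )},
    {"role": "user", "content": "Convert to Rust:\n```python\ndef add(a, b):\n    return a + b\n```"},
]

# apply_chat_template renders the ChatML turns shown above.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.2, top_p=0.9)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```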
|
|
|
## Training procedure |
|
|
|
Spectrum FFT/SFFT (selective full fine-tuning): Spectrum scans the base model, ranks layers by signal-to-noise ratio, and trains only the top fraction at full precision while the rest stay frozen.
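Conceptually, the freezing step looks like the following hand-wavy sketch, assuming a hypothetical `snr_by_module` mapping produced by Spectrum's scan (the ranking itself is not computed here):

```python
import torch.nn as nn

def apply_spectrum_freeze(model: nn.Module, snr_by_module: dict[str, float], top_fraction: float = 0.5) -> None:
    """Freeze all parameters, then unfreeze the highest-SNR modules for full fine-tuning."""
    for param in model.parameters():
        param.requires_grad = False

    # Keep the top fraction of modules by signal-to-noise ratio.
    keep = int(len(snr_by_module) * top_fraction)
    top_modules = sorted(snr_by_module, key=snr_by_module.get, reverse=True)[:keep]

    for name, param in model.named_parameters():
        if any(name.startswith(prefix) for prefix in top_modules):
            param.requires_grad = True
```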
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 0.0005 |
|
- train_batch_size: 1 |
|
- eval_batch_size: 1 |
|
- seed: 42 |
|
- distributed_type: multi-GPU |
|
- num_devices: 8 |
|
- gradient_accumulation_steps: 4 |
|
- total_train_batch_size: 32 |
|
- total_eval_batch_size: 8 |
|
- optimizer: AdamW with betas=(0.9,0.999) and epsilon=1e-08
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_steps: 50 |
|
- num_epochs: 1 |
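(For reference: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 1 × 8 × 4 = 32, and total_eval_batch_size = eval_batch_size × num_devices = 1 × 8 = 8.)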
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:------:|:----:|:---------------:| |
|
| 0.3948 | 0.0237 | 1 | 0.3920 | |
|
| 0.2392 | 0.4970 | 21 | 0.2500 | |
|
| 0.2606 | 0.9941 | 42 | 0.2621 | |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.45.2 |
|
- Pytorch 2.3.1+cu121 |
|
- Datasets 3.0.1 |
|
- Tokenizers 0.20.1 |
|
|
|
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl) |
|
<details><summary>See axolotl config</summary> |
|
|
|
axolotl version: `0.4.1` |
|
```yaml |
|
base_model: Qwen/Qwen2.5-14B |
|
|
|
load_in_8bit: false |
|
load_in_4bit: false |
|
strict: false |
|
|
|
plugins:

- axolotl.integrations.liger.LigerPlugin

- axolotl.integrations.spectrum.SpectrumPlugin



liger_rope: true

liger_rms_norm: true

liger_swiglu: true

liger_fused_linear_cross_entropy: true
|
|
|
spectrum_top_fraction: 0.5 |
|
# Optional if using a pre-scanned model as your base_model. Useful if using a model mirror |
|
spectrum_model_name: Qwen/Qwen2.5-14B |
|
|
|
datasets: |
|
- path: datasets/LLaMutation.jsonl |
|
type: sharegpt |
|
- path: datasets/LLaMutationMAX_Train.json |
|
type: sharegpt |
|
|
|
chat_template: chatml |
|
shuffle_merged_datasets: true |
|
val_set_size: 0.1 |
|
output_dir: ./LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
|
|
sequence_len: 8192 |
|
sample_packing: true |
|
eval_sample_packing: true |
|
pad_to_sequence_len: true |
|
|
|
# adapter: qlora |
|
# lora_model_dir: |
|
# lora_r: 32 |
|
# lora_alpha: 16 |
|
# lora_dropout: 0.05 |
|
# lora_target_linear: true |
|
# peft_use_dora: true |
|
|
|
wandb_project: LLaMutation-Qwen2.5-14B-SFFT-v0.0 |
|
wandb_entity: |
|
wandb_watch: |
|
wandb_name: Unit-00 |
|
wandb_log_model: |
|
|
|
gradient_accumulation_steps: 4 |
|
micro_batch_size: 1 |
|
num_epochs: 1 |
|
optimizer: adamw_torch |
|
lr_scheduler: linear |
|
learning_rate: 0.0005 |
|
max_grad_norm: 3 |
|
|
|
train_on_inputs: false |
|
group_by_length: false |
|
bf16: auto |
|
fp16: |
|
tf32: true |
|
|
|
gradient_checkpointing: true |
|
gradient_checkpointing_kwargs: |
|
use_reentrant: true |
|
early_stopping_patience: |
|
resume_from_checkpoint: |
|
local_rank: |
|
logging_steps: 1 |
|
xformers_attention: |
|
flash_attention: true |
|
|
|
warmup_steps: 50 |
|
evals_per_epoch: 2 |
|
saves_per_epoch: 2 |
|
save_safetensors: true |
|
hub_model_id: |
|
hub_strategy: |
|
debug: |
|
deepspeed: deepspeed_configs/zero3_bf16.json |
|
weight_decay: 0.1 |
|
# fsdp: |
|
# - full_shard |
|
# - auto_wrap |
|
# fsdp_config: |
|
# fsdp_limit_all_gathers: true |
|
# fsdp_sync_module_states: true |
|
# fsdp_offload_params: false # Changed from true |
|
# fsdp_use_orig_params: true # Changed from false |
|
# fsdp_cpu_ram_efficient_loading: true |
|
# fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP |
|
# fsdp_transformer_layer_cls_to_wrap: Qwen2DecoderLayer |
|
# fsdp_activation_checkpointing: true |
|
# fsdp_state_dict_type: SHARDED_STATE_DICT # Changed from FULL_STATE_DICT |
|
# fsdp_sharding_strategy: FULL_SHARD |
|
# fsdp_forward_prefetch: true # Added |
|
# fsdp_backward_prefetch: "BACKWARD_POST" # Added |
|
# fsdp_backward_prefetch_limit: 1 # Added |
|
# fsdp_mixed_precision: BF16 # Added |
|
``` |
|
|
|
</details><br> |