qwq-32b-lora-creed

QwQ-32B LoRA fine-tuned on phxdev/creed dataset

---
library_name: peft
license: apache-2.0
base_model: Qwen/QwQ-32B-Preview
tags:
- generated_from_trainer
datasets:
- phxdev/creed
model-index:
- name: outputs/heisenberg-crystal
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.8.0.dev0`
```yaml
adapter: lora
base_model: Qwen/QwQ-32B-Preview
trust_remote_code: true
bf16: true
dataset_processes: 64
datasets:
- path: phxdev/creed
  type: completion
  field: text
  trust_remote_code: false
  streaming: true
gradient_accumulation_steps: 1
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
learning_rate: 0.001
lisa_layers_attribute: model.layers
lisa_enabled: true
lisa_layers_fraction: 0.25
load_best_model_at_end: true
load_in_4bit: false
load_in_8bit: true
lora_alpha: 128
lora_dropout: 0.15
lora_r: 64
lora_target_modules:
- q_proj
- v_proj
- k_proj
- o_proj
- gate_proj
- down_proj
- up_proj
lora_fan_in_fan_out: false
modules_to_save:
- embed_tokens
- lm_head
loraplus_lr_embedding: 1.0e-06
loraplus_lr_ratio: 16
lr_scheduler: cosine_with_min_lr
lr_scheduler_kwargs:
  min_lr: 0.00001
max_prompt_len: 1024
mean_resizing_embeddings: false
micro_batch_size: 1
num_epochs: 3.0
optimizer: adamw_torch
# optim_args:
#   weight_decay: 0.05
#   betas: [0.9, 0.95]
#   eps: 1.0e-8
output_dir: ./outputs/heisenberg-crystal
pretrain_multipack_attn: true
pretrain_multipack_buffer_size: 20000
qlora_sharded_model_loading: false
ray_num_workers: 1
resources_per_worker:
  GPU: 1
resume_from_checkpoint: null
sample_packing: false
sample_packing_bin_size: 200
sample_packing_group_size: 100000
sample_packing_seq_len_multiplier: 1.0
save_only_model: true
save_safetensors: true
save_strategy: steps
save_steps: 100
save_total_limit: 3
eval_strategy: steps
eval_steps: 100
metric_for_best_model: loss
greater_is_better: false
sequence_len: 512
shuffle_merged_datasets: true
skip_prepare_dataset: false
strict: false
train_on_inputs: false
neftune_noise_alpha: 5.0
model_config:
  rope_scaling:
    type: linear
    factor: 1.5
dataloader_prefetch_factor: 4
dataloader_num_workers: 8
dataloader_pin_memory: true
dataloader_persistent_workers: true
max_grad_norm: 1.0
adam_beta2_schedule: cosine
torch_compile: true
torch_compile_backend: inductor
trl:
  log_completions: true
  ref_model_mixup_alpha: 0.9
  ref_model_sync_steps: 64
  sync_ref_model: false
  use_vllm: false
  vllm_device: auto
  vllm_dtype: auto
  vllm_gpu_memory_utilization: 0.9
use_ray: false
val_set_size: 0.05
warmup_steps: 100
warmup_ratio: 0.0
weight_decay: 0.05
flash_attention: true
flash_attn_cross_entropy: true
flash_attn_rms_norm: true
flash_attn_fuse_qkv: false
flash_attn_fuse_mlp: false
ddp_backend: nccl
ddp_broadcast_buffers: false
ddp_find_unused_parameters: false
tf32: true
bf16_full_eval: false
fp16: false
# unfrozen_parameters:
# - lm_head.*
# - embed_tokens.*
# - norm.*
xformers_attention: false
s2_attention: false
sdp_attention: false
pad_to_sequence_len: true
peft_use_dora: false
peft_lora_modules_to_save: null
special_tokens:
  pad_token: <|endoftext|>
deepspeed: null
fsdp: null
fsdp_config: null
# wandb_project: heisenberg-qwen
# wandb_entity: null
# wandb_name: blue-crystal-run
# wandb_log_model: checkpoint
hub_model_id: null
hub_strategy: null
report_to: []
logging_strategy: steps
logging_steps: 10
logging_first_step: true
```

</details><br>
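
The LoRA settings above (`lora_r: 64`, `lora_alpha: 128`, `lora_dropout: 0.15`) interact inside each adapted projection roughly as in the pure-Python sketch below. It assumes PEFT's standard (non-rsLoRA) formulation. In that formulation the low-rank update `B·A·x` is scaled by `lora_alpha / lora_r`, which is 2.0 for this run, and dropout is applied only to the adapter's input. `lora_linear` and `matvec` are hypothetical helpers that work on toy dimensions. They are not the PEFT implementation.

```python
import random
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]

# Values copied from the axolotl config above.
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.15


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_linear(
    x: Vector,
    W: Matrix,  # frozen base weight, shape (d_out, d_in)
    A: Matrix,  # trainable down-projection, shape (r, d_in)
    B: Matrix,  # trainable up-projection, shape (d_out, r)
    alpha: float,
    dropout: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Hypothetical helper: y = W x + (alpha / r) * B A dropout(x).

    Dropout is applied only when an rng is passed (training mode).
    """
    r = len(A)
    scaling = alpha / r
    if rng is not None and dropout > 0.0:
        keep = 1.0 - dropout
        # Inverted dropout: zero each input with probability `dropout`
        # and rescale the survivors by 1 / (1 - dropout).
        x_drop = [xi / keep if rng.random() < keep else 0.0 for xi in x]
    else:
        x_drop = x  # evaluation mode: dropout is the identity
    base = matvec(W, x)
    update = matvec(B, matvec(A, x_drop))
    return [b + scaling * u for b, u in zip(base, update)]


print(LORA_ALPHA / LORA_R)  # scaling factor for this run: 2.0
```

PEFT initializes `B` to zeros by default, so an adapted layer starts out identical to the frozen base layer. Training then changes the model only through the `A`/`B` factors and the fully trained copies of `embed_tokens` and `lm_head` listed under `modules_to_save`.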

# outputs/heisenberg-crystal

This model is a fine-tuned version of [Qwen/QwQ-32B-Preview](https://huggingface.co/Qwen/QwQ-32B-Preview) on the phxdev/creed dataset.
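It achieves the following results on the evaluation set:
- Loss: nan (the validation loss was `nan` at every evaluation step, so no usable validation metric is available for this run)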

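## Model description

A LoRA adapter for Qwen/QwQ-32B-Preview, trained with Axolotl on the phxdev/creed dataset. The adapter targets the attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) and the MLP projections (`gate_proj`, `up_proj`, `down_proj`) with rank 64 and alpha 128. It also saves fully trained copies of `embed_tokens` and `lm_head`. The base model was loaded in 8-bit during training (`load_in_8bit: true`).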

## Intended uses & limitations

More information needed

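## Training and evaluation data

The adapter was trained on the `text` field of the phxdev/creed dataset as plain completion data (`type: completion`). 5% of the data was held out as the evaluation set (`val_set_size: 0.05`), and the sequence length was 512 tokens (`sequence_len: 512`).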

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine_with_min_lr (min_lr: 1e-05)
- lr_scheduler_warmup_steps: 100
- num_epochs: 3.0
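
The warmup and `cosine_with_min_lr` settings above can be sketched as a per-step multiplier on the peak rate of 1e-3, decaying to `min_lr: 0.00001`. This is a minimal re-implementation that, as we understand it, mirrors the warmup-plus-cosine formula of Transformers' `get_cosine_with_min_lr_schedule_with_warmup`. It is not the library code. The total step count is an assumption estimated from the Epoch column of the training-results table: about 794 steps per epoch, or roughly 2,382 steps over three epochs.

The sketch also shows the peak rate for the LoRA `B` matrices under `loraplus_lr_ratio: 16`. This part assumes that LoRA+ multiplies that parameter group's learning rate by the ratio, and that the scheduler applies the same multiplier to every group.

```python
import math

# Values taken from the axolotl config and the hyperparameter list above.
PEAK_LR = 1e-3
MIN_LR = 1e-5
WARMUP_STEPS = 100
LORAPLUS_LR_RATIO = 16  # assumed to mean lr(B) / lr(A) under LoRA+

# Assumption: ~794 optimizer steps per epoch, estimated from the
# training-results table (e.g. step 800 is logged at epoch 1.0076).
STEPS_PER_EPOCH = 794
TOTAL_STEPS = 3 * STEPS_PER_EPOCH


def lr_multiplier(step: int, warmup: int, total: int, min_lr_rate: float) -> float:
    """Linear warmup from 0 to 1.0, then half-cosine decay down to min_lr_rate."""
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return max(0.0, cosine * (1.0 - min_lr_rate) + min_lr_rate)


def learning_rate(step: int, peak: float = PEAK_LR) -> float:
    """Learning rate at `step` for a parameter group whose initial rate is `peak`."""
    return peak * lr_multiplier(step, WARMUP_STEPS, TOTAL_STEPS, MIN_LR / PEAK_LR)


for step in (0, 50, 100, 1000, TOTAL_STEPS):
    lr_a = learning_rate(step)
    lr_b = learning_rate(step, PEAK_LR * LORAPLUS_LR_RATIO)
    print(f"step {step:>4}: lr(A) = {lr_a:.2e}, lr(B) = {lr_b:.2e}")
```

Because the multiplier is shared, the `B` group in this sketch decays to 16 × 1e-5 rather than to 1e-5.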

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| No log        | 0.0013 | 1    | nan             |
| 7.8286        | 0.1259 | 100  | nan             |
| 7.2486        | 0.2519 | 200  | nan             |
| 7.2601        | 0.3778 | 300  | nan             |
| 8.2142        | 0.5038 | 400  | nan             |
| 7.1902        | 0.6297 | 500  | nan             |
| 6.3799        | 0.7557 | 600  | nan             |
| 6.7115        | 0.8816 | 700  | nan             |
| 6.0414        | 1.0076 | 800  | nan             |
| 6.428         | 1.1335 | 900  | nan             |
| 6.3167        | 1.2594 | 1000 | nan             |
| 6.0359        | 1.3854 | 1100 | nan             |
| 6.3701        | 1.5113 | 1200 | nan             |
| 6.9225        | 1.6373 | 1300 | nan             |
| 6.5807        | 1.7632 | 1400 | nan             |
| 6.8649        | 1.8892 | 1500 | nan             |
| 6.1397        | 2.0151 | 1600 | nan             |
| 5.7675        | 2.1411 | 1700 | nan             |
| 6.2605        | 2.2670 | 1800 | nan             |
| 5.8788        | 2.3929 | 1900 | nan             |
| 6.0279        | 2.5189 | 2000 | nan             |
| 6.3911        | 2.6448 | 2100 | nan             |
| 6.0412        | 2.7708 | 2200 | nan             |
| 6.0862        | 2.8967 | 2300 | nan             |
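
The Epoch and Step columns also pin down the length of the run. The minimal sketch below uses a few rows copied from the table above. Dividing each step by its epoch gives roughly 794 optimizer steps per epoch. Assuming a single GPU with `micro_batch_size: 1` and `gradient_accumulation_steps: 1`, that corresponds to about 794 training examples per epoch. The helpers `steps_per_epoch` and `all_eval_nan` are hypothetical names used only for illustration.

```python
import math

# (step, epoch, training loss, validation loss) copied from rows of the table
# above; float("nan") stands in for the reported "nan" validation loss.
ROWS = [
    (100, 0.1259, 7.8286, float("nan")),
    (800, 1.0076, 6.0414, float("nan")),
    (1600, 2.0151, 6.1397, float("nan")),
    (2300, 2.8967, 6.0862, float("nan")),
]


def steps_per_epoch(rows):
    """Estimate optimizer steps per epoch from each logged (step, epoch) pair."""
    return [step / epoch for step, epoch, _, _ in rows]


def all_eval_nan(rows):
    """True when every logged validation loss is NaN."""
    return all(math.isnan(val_loss) for _, _, _, val_loss in rows)


print([round(e) for e in steps_per_epoch(ROWS)])  # [794, 794, 794, 794]
print(all_eval_nan(ROWS))  # True
```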


### Framework versions

- PEFT 0.14.0
- Transformers 4.49.0
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.21.0
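
Before loading the adapter, you can compare a local environment against the versions listed above with a small standard-library check. This sketch is not part of the original training setup. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`. It compares version strings exactly, so a `torch` build without the `+cu124` local suffix is reported as a mismatch. `version_mismatches` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions" in this card.
EXPECTED = {
    "peft": "0.14.0",
    "transformers": "4.49.0",
    "torch": "2.5.1+cu124",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}


def version_mismatches(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for every package that differs."""
    mismatches: Dict[str, Tuple[str, Optional[str]]] = {}
    for name, want in expected.items():
        try:
            have: Optional[str] = version(name)
        except PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    for name, (want, have) in version_mismatches(EXPECTED).items():
        print(f"{name}: card lists {want}, installed {have or 'not installed'}")
```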