---
library_name: peft
license: other
base_model: Qwen/Qwen3-32B
tags:
- llama-factory
- lora
- generated_from_trainer
model-index:
- name: Qwen3-32B-alpaca-th-52k-dolly-th-15k-wangchan-instruct
  results: []
---


# Qwen3-32B-alpaca-th-52k-dolly-th-15k-wangchan-instruct

This model is a fine-tuned version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) on the alpaca-th-52k, dolly-th-15k, and wangchan-instruct datasets.
It achieves the following result on the evaluation set:
- Loss: 0.6417
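
Assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language-model training in Transformers), it can also be read as a perplexity by exponentiating it. This is an illustrative conversion only; the card does not report perplexity itself, and the function name is hypothetical.

```python
import math

EVAL_LOSS = 0.6417  # final validation loss reported above


def perplexity(mean_token_nll: float) -> float:
    # Assumption: the loss is the mean per-token negative log-likelihood
    # in nats, computed over the tokens that contribute to the loss.
    return math.exp(mean_token_nll)


print(f"{perplexity(EVAL_LOSS):.2f}")  # prints 1.90
```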

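## Model description

This repository contains a LoRA adapter (loaded with PEFT) for [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B), trained with LLaMA-Factory. It is not a standalone checkpoint: the adapter weights are applied on top of the base model at load time.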

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2 (per device)
- eval_batch_size: 2 (per device)
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- gradient_accumulation_steps: 8
- total_train_batch_size: 512
- total_eval_batch_size: 64
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3.0
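
The batch-size totals and the learning-rate schedule follow directly from the values in this list. The sketch below reproduces that arithmetic in plain Python, using the cosine-with-linear-warmup formula that Transformers applies for `lr_scheduler_type: cosine`. The total number of optimizer steps (3 × 174 = 522) is an assumption inferred from the Epoch and Step columns of the results table, not a value stated in this card, and the function names are illustrative.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 2e-4
TRAIN_BATCH_SIZE = 2  # per device
EVAL_BATCH_SIZE = 2  # per device
NUM_DEVICES = 32
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.1

# Assumption: about 174 optimizer steps per epoch for 3 epochs, inferred from
# the Epoch/Step columns of the results table (not stated in the card).
TOTAL_STEPS = 3 * 174


def effective_train_batch_size() -> int:
    # Samples consumed per optimizer step across all devices.
    return TRAIN_BATCH_SIZE * NUM_DEVICES * GRADIENT_ACCUMULATION_STEPS


def effective_eval_batch_size() -> int:
    # Evaluation does not accumulate gradients, so only devices multiply.
    return EVAL_BATCH_SIZE * NUM_DEVICES


def warmup_steps(total_steps: int = TOTAL_STEPS, ratio: float = WARMUP_RATIO) -> int:
    # Assumed to round up, as Transformers does when deriving warmup from a ratio.
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS, base_lr: float = LEARNING_RATE) -> float:
    """Linear warmup to base_lr, then a half-cosine decay to 0."""
    n_warm = warmup_steps(total_steps)
    if step < n_warm:
        return base_lr * step / max(1, n_warm)
    progress = (step - n_warm) / max(1, total_steps - n_warm)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))
```

Under these assumptions, `effective_train_batch_size()` gives 512 and `effective_eval_batch_size()` gives 64, matching the totals listed above; the learning rate peaks at 2e-4 after 53 warmup steps and decays to 0 by step 522.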

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9564 | 0.0575 | 10 | 1.0507 |
| 0.806 | 0.1149 | 20 | 0.8268 |
| 0.7551 | 0.1724 | 30 | 0.7598 |
| 0.7158 | 0.2299 | 40 | 0.7396 |
| 0.7217 | 0.2874 | 50 | 0.7252 |
| 0.7078 | 0.3448 | 60 | 0.7130 |
| 0.6719 | 0.4023 | 70 | 0.7029 |
| 0.6855 | 0.4598 | 80 | 0.6964 |
| 0.7328 | 0.5172 | 90 | 0.6907 |
| 0.6663 | 0.5747 | 100 | 0.6848 |
| 0.7049 | 0.6322 | 110 | 0.6792 |
| 0.6772 | 0.6897 | 120 | 0.6751 |
| 0.687 | 0.7471 | 130 | 0.6721 |
| 0.6786 | 0.8046 | 140 | 0.6700 |
| 0.6389 | 0.8621 | 150 | 0.6672 |
| 0.6673 | 0.9195 | 160 | 0.6649 |
| 0.6711 | 0.9770 | 170 | 0.6633 |
| 0.6614 | 1.0345 | 180 | 0.6615 |
| 0.6219 | 1.0920 | 190 | 0.6602 |
| 0.6542 | 1.1494 | 200 | 0.6587 |
| 0.6596 | 1.2069 | 210 | 0.6572 |
| 0.6526 | 1.2644 | 220 | 0.6567 |
| 0.657 | 1.3218 | 230 | 0.6551 |
| 0.6124 | 1.3793 | 240 | 0.6537 |
| 0.6489 | 1.4368 | 250 | 0.6526 |
| 0.614 | 1.4943 | 260 | 0.6515 |
| 0.656 | 1.5517 | 270 | 0.6504 |
| 0.6255 | 1.6092 | 280 | 0.6492 |
| 0.6419 | 1.6667 | 290 | 0.6486 |
| 0.6275 | 1.7241 | 300 | 0.6473 |
| 0.6324 | 1.7816 | 310 | 0.6466 |
| 0.6334 | 1.8391 | 320 | 0.6461 |
| 0.6213 | 1.8966 | 330 | 0.6452 |
| 0.6269 | 1.9540 | 340 | 0.6443 |
| 0.6408 | 2.0115 | 350 | 0.6437 |
| 0.6213 | 2.0690 | 360 | 0.6441 |
| 0.6146 | 2.1264 | 370 | 0.6440 |
| 0.6572 | 2.1839 | 380 | 0.6438 |
| 0.6264 | 2.2414 | 390 | 0.6435 |
| 0.6051 | 2.2989 | 400 | 0.6434 |
| 0.5983 | 2.3563 | 410 | 0.6429 |
| 0.6388 | 2.4138 | 420 | 0.6425 |
| 0.6227 | 2.4713 | 430 | 0.6425 |
| 0.6335 | 2.5287 | 440 | 0.6421 |
| 0.6247 | 2.5862 | 450 | 0.6420 |
| 0.6404 | 2.6437 | 460 | 0.6418 |
| 0.6218 | 2.7011 | 470 | 0.6418 |
| 0.6368 | 2.7586 | 480 | 0.6417 |
| 0.6191 | 2.8161 | 490 | 0.6417 |
| 0.6234 | 2.8736 | 500 | 0.6417 |
| 0.6079 | 2.9310 | 510 | 0.6417 |
| 0.6243 | 2.9885 | 520 | 0.6417 |
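
The Epoch column is logged as a fractional epoch, so the table itself implies the epoch length: dividing Step by Epoch gives roughly 174 optimizer steps per epoch. This is an inference from the logged values, not a figure stated in the card. The rows in the sketch are copied from the table; the helper names are illustrative.

```python
# A few (epoch, step, validation_loss) rows copied from the table above.
ROWS = [
    (0.0575, 10, 1.0507),
    (1.0345, 180, 0.6615),
    (2.0115, 350, 0.6437),
    (2.9885, 520, 0.6417),
]


def steps_per_epoch(rows) -> float:
    # The logged epoch is approximately step / steps_per_epoch,
    # so invert it for each row and average the estimates.
    ratios = [step / epoch for epoch, step, _ in rows]
    return sum(ratios) / len(ratios)


def relative_drop(before: float, after: float) -> float:
    # Fractional reduction in validation loss between two checkpoints.
    return (before - after) / before
```

On these rows, the validation loss falls by about 37% from step 10 to step 180 (1.0507 to 0.6615) but by only about 0.3% from step 350 to step 520 (0.6437 to 0.6417), consistent with the flat tail of the table.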


### Framework versions

- PEFT 0.15.2
- Transformers 4.52.3
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1
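
To compare a local environment with the versions listed above, the standard library's `importlib.metadata` is sufficient. The helper `version_mismatches` is hypothetical, and exact string comparison is a deliberate simplification: a `torch` build whose version string lacks the `+cu126` local suffix is flagged even if it is otherwise the same release.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions"; keys are PyPI distribution names.
PINNED = {
    "peft": "0.15.2",
    "transformers": "4.52.3",
    "torch": "2.7.0+cu126",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(pinned=PINNED, lookup=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in version_mismatches().items():
        print(f"{name}: trained with {expected}, found {installed}")
```

The `lookup` parameter defaults to `importlib.metadata.version`; it is exposed only so the check can be exercised without the real packages installed.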
