qwen3-0.6b-vericava-posts-v1

This is a model trained from scratch on a dataset of my posts on the Internet, using the architecture parameters (model configuration) of Qwen/Qwen3-0.6B-FP8.

The validation loss at each logged step is listed in the training results table below; at the final logged step (900) it is 6.8017.

Model description

It generates text resembling what I post on the Internet.

CAUTION: It may produce something I'd never say.

I do not impose any restrictions on the use of this model.

More information about the training and evaluation data is needed.

The following hyperparameters were used during training:

learning_rate: 0.0005
train_batch_size: 128
eval_batch_size: 128
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 1024
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 1000
num_epochs: 100
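The hyperparameters above interact in two ways worth checking: the effective batch size, and the learning-rate schedule. The following is a minimal pure-Python sketch, not the training code. It assumes a single device (the device count is not stated), that training ran for 900 optimizer steps (the last step in the results table), and that the schedule follows the linear-warmup cosine formula used by transformers' `get_cosine_schedule_with_warmup`. The function names `effective_batch_size` and `cosine_with_warmup` are hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
PEAK_LR = 0.0005
TRAIN_BATCH_SIZE = 128
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 1000
# Assumption: 900 optimizer steps in total (last step in the results table).
TOTAL_STEPS = 900
# Assumption: a single device; the card does not state the device count.
NUM_DEVICES = 1


def effective_batch_size(per_device: int, accum: int, devices: int = 1) -> int:
    """Number of examples that contribute to one optimizer step."""
    return per_device * accum * devices


def cosine_with_warmup(step: int, warmup: int, total: int, peak: float) -> float:
    """Learning rate at `step`: linear warmup to `peak`, then half-cosine decay to zero."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 1024
for step in (0, 100, 500, 900):
    print(step, cosine_with_warmup(step, WARMUP_STEPS, TOTAL_STEPS, PEAK_LR))
```

Under these assumptions the learning rate at step 900 is 900 / 1000 × 0.0005 = 0.00045, so the 1000-step warmup would not have finished within 900 steps and the cosine decay phase would never have started.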

Training results

Training Loss	Epoch	Step	Validation Loss
2.4845	11.1231	100	7.7641
1.7	22.2462	200	6.2579
1.4179	33.3692	300	5.6225
1.2521	44.4923	400	5.4497
1.0905	55.6154	500	5.5389
0.8382	66.7385	600	5.9830
0.5511	77.8615	700	6.3376
0.3364	88.9846	800	6.5791
0.2083	100.0	900	6.8017
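The logged values can also be checked programmatically. This short sketch copies the rows of the table verbatim and finds the step with the lowest validation loss, then lists the steps where validation loss rose while training loss kept falling, a pattern consistent with overfitting. The helper names are illustrative, and the sketch says nothing about which checkpoint the published weights come from.

```python
# Rows of the training results table: (training loss, epoch, step, validation loss).
RESULTS = [
    (2.4845, 11.1231, 100, 7.7641),
    (1.7, 22.2462, 200, 6.2579),
    (1.4179, 33.3692, 300, 5.6225),
    (1.2521, 44.4923, 400, 5.4497),
    (1.0905, 55.6154, 500, 5.5389),
    (0.8382, 66.7385, 600, 5.9830),
    (0.5511, 77.8615, 700, 6.3376),
    (0.3364, 88.9846, 800, 6.5791),
    (0.2083, 100.0, 900, 6.8017),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def diverging_steps(rows):
    """Steps where validation loss rose while training loss fell versus the previous row."""
    return [
        cur[2]
        for prev, cur in zip(rows, rows[1:])
        if cur[3] > prev[3] and cur[0] < prev[0]
    ]


best = best_checkpoint(RESULTS)
print(f"lowest validation loss {best[3]} at step {best[2]} (epoch {best[1]})")
# prints "lowest validation loss 5.4497 at step 400 (epoch 44.4923)"
print(diverging_steps(RESULTS))  # [500, 600, 700, 800, 900]
```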