Uploaded model
- Developed by: deshanksuman
- License: apache-2.0
- Finetuned from model: Qwen/Qwen3-4B
Dataset
FEWS training data arranged in Instruction, Input and Output format, with advanced reasoning for sense identification. Data generation was semi-automated using Arcee models. The data was validated by a human for its structure and correctness.
The data source can be accessed here: deshanksuman/Reasoning_WSD_dataset
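As an illustration of the Instruction, Input and Output arrangement described above, the sketch below renders one record as a single training string. The lowercase field names, the prompt template and the example record are all assumptions made for illustration. They are not taken from the dataset, so check the dataset page for the actual column names and wording.

```python
# Hypothetical template and field names: the card only states that each record
# has an Instruction, an Input and an Output.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Output:\n{output}"
)


def format_record(record: dict) -> str:
    """Render one Instruction/Input/Output record as a single training string."""
    return PROMPT_TEMPLATE.format(
        instruction=record["instruction"].strip(),
        input=record["input"].strip(),
        output=record["output"].strip(),
    )


# Made-up record for illustration only; it does not come from the dataset.
example = {
    "instruction": "Identify the sense of the target word in the sentence.",
    "input": "Sentence: She sat on the bank of the river. Target word: bank",
    "output": (
        "Reasoning: the sentence mentions a river, so the word refers to "
        "the land beside the water.\nSense: river bank"
    ),
}

text = format_record(example)
print(text)
```

The output field carries the reasoning before the final sense, which is one plausible way to store "reasoning for sense identification" in a single column.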
Hyperparameters for Training
- per_device_train_batch_size=4,
- gradient_accumulation_steps=8,
- warmup_steps=50,
- num_train_epochs=2,
- learning_rate=2e-4,
- fp16=not torch.cuda.is_bf16_supported(),
- bf16=torch.cuda.is_bf16_supported(),
- logging_steps=10,
- optim="adamw_torch",
- weight_decay=0.01,
- lr_scheduler_type="linear",
- seed=3407
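To make two consequences of these settings concrete, the sketch below computes the effective batch size and the learning rate at a given optimizer step under a linear schedule with warmup. It is a plain-Python approximation of the usual linear warmup and decay rule, not the training code used for this model. The number of devices and the total step count are not stated in this card, so `num_devices=1` and `total_steps=1000` are assumptions.

```python
def effective_batch_size(per_device: int = 4, grad_accum: int = 8,
                         num_devices: int = 1) -> int:
    """Samples contributing to one optimizer step (num_devices is an assumption)."""
    return per_device * grad_accum * num_devices


def linear_lr(step: int, total_steps: int, base_lr: float = 2e-4,
              warmup_steps: int = 50) -> float:
    """Learning rate at `step`: linear warmup to base_lr, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


# total_steps is hypothetical; the real value depends on dataset size and epochs.
total_steps = 1000
batch = effective_batch_size()                 # 4 * 8 * 1 = 32
lr_mid_warmup = linear_lr(25, total_steps)     # halfway through warmup
lr_peak = linear_lr(50, total_steps)           # warmup finished, peak rate
lr_end = linear_lr(total_steps, total_steps)   # fully decayed

print(batch, lr_mid_warmup, lr_peak, lr_end)
```

With a single device, each optimizer step therefore sees 32 examples, and the learning rate reaches 2e-4 after 50 steps before decaying linearly.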
This model was developed by Deshan Sumanathilaka: https://sumanathilaka.github.io
Acknowledgement
We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.