Uploaded model

  • Developed by: deshanksuman
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-4B

Dataset

The FEWS training data is arranged in an Instruction, Input and Output format, with advanced reasoning for sense identification. Data generation was semi-automated using the Arcee models. The data was validated by a human for its structure and correctness.

The data source can be accessed here: deshanksuman/Reasoning_WSD_dataset
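
The Instruction, Input and Output layout described above could be rendered into a single training string roughly as follows. This is a minimal sketch: the Alpaca-style template wording, the lowercase field names (`instruction`, `input`, `output`) and the sample record are all assumptions made for illustration, not taken from the dataset or the training script.

```python
# Hypothetical Alpaca-style template; the card does not state the exact wording.
PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Output:\n{output}"
)


def format_example(record: dict) -> str:
    """Render one Instruction/Input/Output record as a single string."""
    return PROMPT_TEMPLATE.format(
        instruction=record["instruction"].strip(),
        input=record["input"].strip(),
        output=record["output"].strip(),
    )


# Made-up record for illustration only; it is not a row from the dataset.
example = {
    "instruction": "Identify the sense of the target word in the sentence.",
    "input": "Sentence: She sat on the bank of the river. Target word: bank",
    "output": (
        "Reasoning: 'river' points to a landform. "
        "Sense: sloping land beside a body of water."
    ),
}

print(format_example(example))
```

Keeping the reasoning inside the output field means the model would learn to produce its justification before the final sense; the actual column names and layout should be checked against the dataset itself.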

Hyperparameters for Training

  • per_device_train_batch_size=4,
  • gradient_accumulation_steps=8,
  • warmup_steps=50,
  • num_train_epochs=2,
  • learning_rate=2e-4,
  • fp16=not torch.cuda.is_bf16_supported(),
  • bf16=torch.cuda.is_bf16_supported(),
  • logging_steps=10,
  • optim="adamw_torch",
  • weight_decay=0.01,
  • lr_scheduler_type="linear",
  • seed=3407
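
To make the interaction between these values concrete, the sketch below works out the effective batch size and the learning rate that a linear schedule with warmup would yield at a given step. It assumes a single device and a hypothetical total of 500 optimizer steps (the card gives neither), and it mirrors the usual linear-warmup, linear-decay rule rather than reproducing the author's training script.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 50
LEARNING_RATE = 2e-4


def effective_batch_size(num_devices: int = 1) -> int:
    """Examples contributing to one optimizer update (num_devices is an assumption)."""
    return PER_DEVICE_TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_devices


def linear_lr(step: int, total_steps: int) -> float:
    """Learning rate at `step`: linear ramp during warmup, then linear decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total_steps - step)
    return LEARNING_RATE * remaining / max(1, total_steps - WARMUP_STEPS)


# TOTAL_STEPS is hypothetical; the real value depends on dataset size and epochs.
TOTAL_STEPS = 500

print("effective batch size:", effective_batch_size())
for step in (0, 25, 50, 275, 500):
    print(step, linear_lr(step, TOTAL_STEPS))
```

With one device, each optimizer update sees 4 × 8 = 32 examples; the rate peaks at 2e-4 once warmup ends at step 50 and then falls linearly.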

This model was developed by Deshan Sumanathilaka (https://sumanathilaka.github.io).

Acknowledgement

We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.


Model tree for deshanksuman/finetunedQwen3-4B-Instruct-WSD-Advanced-reasoning

  • Base model: Qwen/Qwen3-4B-Base
  • Finetuned: Qwen/Qwen3-4B
  • Finetuned: this model
