metadata
base_model:
  - Qwen/Qwen3-4B
tags:
  - text-generation-inference
  - transformers
  - trl
  - qwen
  - wsd
  - ambiguity
license: apache-2.0
language:
  - en
datasets:
  - deshanksuman/Reasoning_WSD_dataset
pipeline_tag: text-classification

Uploaded model

  • Developed by: deshanksuman
  • License: apache-2.0
  • Finetuned from model: Qwen/Qwen3-4B

Dataset

The FEWS training data is arranged in an Instruction, Input and Output format, with advanced reasoning for sense identification. Data generation was semi-automated using the Arcee models. The data has been human-validated for its structure and correctness.

The data source can be accessed here: deshanksuman/Reasoning_WSD_dataset
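
As an illustration of the Instruction, Input and Output layout described above, the sketch below joins one record into a single training prompt. The field names, the `### ...` template, the `<WSD>` marker and the sample record are all assumptions made for illustration; check the dataset itself for the actual schema and wording.

```python
def format_example(record: dict) -> str:
    """Join the three fields of one record into a single prompt string.

    The keys 'instruction', 'input' and 'output' are assumed names,
    not confirmed against the published dataset.
    """
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Input:\n{record['input']}\n\n"
        f"### Output:\n{record['output']}"
    )


# Made-up record for illustration only; it is not taken from the dataset.
sample = {
    "instruction": "Identify the sense of the marked word in the sentence.",
    "input": "She sat on the <WSD>bank</WSD> of the river.",
    "output": (
        "Reasoning: 'river' points to a landform, not a financial institution. "
        "Sense: the land alongside a river."
    ),
}

print(format_example(sample))
```

Keeping the reasoning inside the output field means the model is trained to produce its justification before naming the sense.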

Hyperparameters for Training

  • per_device_train_batch_size=4,
  • gradient_accumulation_steps=8,
  • warmup_steps=50,
  • num_train_epochs=2,
  • learning_rate=2e-4,
  • fp16=not torch.cuda.is_bf16_supported(),
  • bf16=torch.cuda.is_bf16_supported(),
  • logging_steps=10,
  • optim="adamw_torch",
  • weight_decay=0.01,
  • lr_scheduler_type="linear",
  • seed=3407
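
The settings above can be read as keyword arguments for a Hugging Face trainer configuration. The sketch below is one possible reading, not the author's training script: it collects the listed values in a plain dictionary (which could be passed to `transformers.TrainingArguments(**kwargs)`, possibly together with an output directory), computes the effective batch size, and reproduces the shape of a linear schedule with warmup. The helper names, the `num_devices` and `total_steps` parameters, and the use of a boolean in place of `torch.cuda.is_bf16_supported()` are assumptions made so the example runs without a GPU.

```python
PEAK_LR = 2e-4
WARMUP_STEPS = 50


def build_training_kwargs(bf16_supported: bool) -> dict:
    """Collect the listed hyperparameters as keyword arguments.

    bf16_supported stands in for torch.cuda.is_bf16_supported().
    """
    return {
        "per_device_train_batch_size": 4,
        "gradient_accumulation_steps": 8,
        "warmup_steps": WARMUP_STEPS,
        "num_train_epochs": 2,
        "learning_rate": PEAK_LR,
        "fp16": not bf16_supported,  # fall back to fp16 when bf16 is unavailable
        "bf16": bf16_supported,
        "logging_steps": 10,
        "optim": "adamw_torch",
        "weight_decay": 0.01,
        "lr_scheduler_type": "linear",
        "seed": 3407,
    }


def effective_batch_size(kwargs: dict, num_devices: int = 1) -> int:
    """Examples contributing to each optimizer step (num_devices is assumed)."""
    return (
        kwargs["per_device_train_batch_size"]
        * kwargs["gradient_accumulation_steps"]
        * num_devices
    )


def linear_lr(step: int, total_steps: int,
              peak_lr: float = PEAK_LR,
              warmup_steps: int = WARMUP_STEPS) -> float:
    """Linear warmup from 0 to peak_lr, then linear decay to 0.

    total_steps depends on the dataset size, which is not stated here.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


kwargs = build_training_kwargs(bf16_supported=True)
print(effective_batch_size(kwargs))  # 4 * 8 * 1 = 32
print(linear_lr(25, total_steps=1000))  # halfway through warmup
```

With a single device, each optimizer step therefore sees 32 examples, and the learning rate reaches its peak of 2e-4 after 50 steps.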

This model was developed by Deshan Sumanathilaka: https://sumanathilaka.github.io

Acknowledgement

We acknowledge the support of the Supercomputing Wales project, which is part-funded by the European Regional Development Fund (ERDF) via Welsh Government.