Seeking Advice on Fine-Tuning for Domain Reasoning Tasks

#4 opened by aaditya

Thanks for open-sourcing this! I already asked this question on the 8B model card, but I'll ask again here. I'm currently working on adapting this model for domain-specific reasoning. Could you advise on what the dataset should look like for this purpose?

Is it sufficient to use an Alpaca-format dataset ({"instruction": "", "input": "", "output": ""}), or would I need reasoning traces for effective fine-tuning? Also, would you recommend QLoRA or full-parameter SFT for this task?

Any tips or best practices would be greatly appreciated!

okuchaiev (NVIDIA org)

I'd recommend creating data in a format like the one we used for post-training: https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1

Don't forget to include "detailed thinking on/off" in your examples' system prompts, depending on the context.
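
For illustration, a single training pair in that style might look roughly like the following (the field names and the `<think>` tags here are illustrative assumptions on my part; check the dataset card linked above for the exact schema):

```python
# Hypothetical "reasoning on" example -- field names are illustrative,
# not necessarily the exact columns of the published dataset.
example_on = {
    "input": [
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "A question from your target domain..."},
    ],
    "output": "<think>\n...step-by-step reasoning trace...\n</think>\n\nFinal answer.",
}

# Matching "reasoning off" example: same kind of question, no trace in the output.
example_off = {
    "input": [
        {"role": "system", "content": "detailed thinking off"},
        {"role": "user", "content": "A question from your target domain..."},
    ],
    "output": "Final answer only, with no reasoning trace.",
}
```

Including examples in both modes should help the model retain the on/off behavior on your domain data.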

@okuchaiev Thank you for the response! Could you please share a bit more detail or guidance on the process of fine-tuning a reasoning model?

  • I plan to create data in the same format as the one you've shared, including the "detailed thinking on/off" toggle in the system prompts, as appropriate for the context.
  • Which fine-tuning method did the team use: full fine-tuning, LoRA, or QLoRA? I'm planning to use QLoRA; would that be suitable for this use case?
  • Do you have any recommended hyperparameters for fine-tuning this model, e.g. learning rate, LoRA rank, or alpha values? (A rough sketch of the setup I have in mind is below.)
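
For concreteness, here is the rough QLoRA setup I'm considering, just a sketch on my side using transformers + peft + bitsandbytes; the checkpoint name and all hyperparameter values are placeholders, not anything recommended by the team:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "nvidia/..."  # placeholder for the Nemotron checkpoint being fine-tuned

# 4-bit NF4 quantization so the base weights fit in memory during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections; rank/alpha are placeholder values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Any corrections to this setup would be appreciated.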