Seeking Advice on Fine-Tuning for Domain Reasoning Tasks

#4 opened by aaditya

Thanks for open-sourcing this! I already asked this question on the 8B model card, but I'll ask again here. I'm currently working on adapting this model for domain-specific reasoning. Could you advise on what the dataset should look like for this purpose?

Is it sufficient to use an Alpaca-format dataset ({"instruction": "", "input": "", "output": ""}), or would I need reasoning traces for effective fine-tuning? Also, would you recommend QLoRA or full-parameter SFT for this task?

Any tips or best practices would be greatly appreciated!

okuchaiev (NVIDIA org)

I'd recommend creating data in a format like the one we used for post-training: https://huggingface.co/datasets/nvidia/Llama-Nemotron-Post-Training-Dataset-v1

Don't forget to include "detailed thinking on/off" in your examples' system prompts, depending on the context.
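
For illustration, a single training pair in that style might look roughly like the following (the field names and the `<think>` tags here are illustrative assumptions on my part; check the dataset card linked above for the exact schema):

```python
# Hypothetical "reasoning on" example -- field names are illustrative,
# not necessarily the exact columns of the published dataset.
example_on = {
    "input": [
        {"role": "system", "content": "detailed thinking on"},
        {"role": "user", "content": "A question from your target domain..."},
    ],
    "output": "<think>\n...step-by-step reasoning trace...\n</think>\n\nFinal answer.",
}

# Matching "reasoning off" example: same kind of question, no trace in the output.
example_off = {
    "input": [
        {"role": "system", "content": "detailed thinking off"},
        {"role": "user", "content": "A question from your target domain..."},
    ],
    "output": "Final answer only, with no reasoning trace.",
}
```

Including examples in both modes should help the model retain the on/off behavior on your domain data.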

@okuchaiev Thank you for the response! Could you please share a bit more detail or guidance on the process of fine-tuning a reasoning model?

  • I plan to create data in the same format as the one you've shared, including the "detailed thinking on/off" toggle in the system prompts, as appropriate for the context.
  • Which fine-tuning method did the team use: full fine-tuning, LoRA, or QLoRA? I'm planning to use QLoRA; would that be suitable for this use case?
  • Do you have any recommended hyperparameters for fine-tuning this model, e.g. learning rate, LoRA rank, or alpha values? (A rough sketch of the setup I have in mind is below.)
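
For concreteness, here is the rough QLoRA setup I'm considering, just a sketch on my side using transformers + peft + bitsandbytes; the checkpoint name and all hyperparameter values are placeholders, not anything recommended by the team:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "nvidia/..."  # placeholder for the Nemotron checkpoint being fine-tuned

# 4-bit NF4 quantization so the base weights fit in memory during training
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters on the attention projections; rank/alpha are placeholder values
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Any corrections to this setup would be appreciated.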