
Cagatay Demirbas

Cagatayd

AI & ML interests

None yet

Recent Activity

updated a model 3 days ago
Cagatayd/Llama3.2-doker
published a model 3 days ago
Cagatayd/Llama3.2-doker
updated a model 10 days ago
Cagatayd/llama3.2-1B-Instruct-Egitim

Organizations

None yet

Cagatayd's activity

replied to grimjim's post 4 months ago

Hi, also, some prompts in DPO datasets end with "\nAnswer:" or "\nOutput:". Should we include that suffix in the prompt or not?

for example:

dataset[0]['prompt'] = ".....where is the capital city of Germany.\nOutput:"

dataset[1]['prompt'] = '......collectively became states of the Commonwealth of Australia."?\nAnswer:'
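
For concreteness, these are the two options I'm weighing (a minimal sketch; `TRAILING_CUES` and `strip_trailing_cue` are just names I made up for illustration, not library APIs):

```python
# Two ways to handle trailing cues like "\nAnswer:" / "\nOutput:" in DPO prompts.
# TRAILING_CUES and strip_trailing_cue are hypothetical names, not from any library.

TRAILING_CUES = ("\nAnswer:", "\nOutput:")

def strip_trailing_cue(prompt: str) -> str:
    """Option A: drop the cue and let the chat template mark the prompt/response boundary."""
    for cue in TRAILING_CUES:
        if prompt.endswith(cue):
            return prompt[: -len(cue)].rstrip()
    return prompt

# Option B: keep the prompt exactly as-is, so the model is conditioned on the
# same cue the original dataset was written with.
prompt = "Where is the capital city of Germany?\nOutput:"
print(strip_trailing_cue(prompt))  # Option A -> "Where is the capital city of Germany?"
```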

replied to grimjim's post 4 months ago

Thanks, that's what I thought, and I'm relieved that you think so too.

replied to grimjim's post 4 months ago

Hi, I have a question for you; @John6666 mentioned you in the comments of my topic.

When preparing a dataset for DPO (Direct Preference Optimization) training, should the “prompt” be repeated in the “chosen” and “rejected” columns?

I’ve come across some conflicting information regarding the proper formatting of the dataset for DPO training. Some sources suggest that the prompt should be included in both the “chosen” and “rejected” responses to provide full context, while others state that the prompt should be kept separate and not repeated in these columns.

Additionally, when working with multi-turn dialogue data, I’m unsure how to properly format the dataset. Should the “chosen” and “rejected” columns include the entire conversation history up to that point, or just the assistant’s most recent response following the latest user input?

Could someone clarify the correct approach for formatting the dataset? Should the “chosen” and “rejected” columns contain only the assistant’s responses following the prompt, or should they include the prompt as well? And how should I handle multi-turn dialogues in this context?
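
To make the question concrete, this is the flat layout I'm currently assuming, based on my reading of TRL's DPOTrainer docs (prompt kept in its own column, chosen/rejected holding only the assistant responses), plus the message-list layout I'd guess applies to multi-turn data; please correct me if either is wrong:

```python
from datasets import Dataset

# Flat layout I'm assuming: the prompt lives only in its own column, and
# "chosen"/"rejected" hold just the two competing assistant responses.
flat = Dataset.from_dict({
    "prompt":   ["What is the capital city of Germany?"],
    "chosen":   ["The capital city of Germany is Berlin."],
    "rejected": ["Germany has no capital city."],
})
print(flat[0])

# Conversational layout I'd guess for multi-turn data: "prompt" carries the
# whole history up to the last user turn, while "chosen"/"rejected" contain
# only the final assistant reply, not the history again.
multi_turn_example = {
    "prompt": [
        {"role": "user", "content": "Who won the 2014 World Cup?"},
        {"role": "assistant", "content": "Germany won the 2014 World Cup."},
        {"role": "user", "content": "Who did they beat in the final?"},
    ],
    "chosen": [
        {"role": "assistant", "content": "They beat Argentina 1-0 in the final."},
    ],
    "rejected": [
        {"role": "assistant", "content": "They beat Brazil in the final."},
    ],
}
```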

I also wonder how to prepare multi-turn conversation data such as Anthropic/hh-rlhf for DPO.
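
The naive conversion I have in mind for hh-rlhf looks like this (`split_hh` is my own sketch, and it assumes the chosen/rejected transcripts share the same history and only diverge at the final assistant turn):

```python
from datasets import load_dataset

def split_hh(example):
    # Each hh-rlhf row stores two full transcripts ("chosen"/"rejected") that
    # share the same history and differ only in the last assistant reply, so
    # split each string at its final "\n\nAssistant:" marker.
    marker = "\n\nAssistant:"
    cut = example["chosen"].rfind(marker) + len(marker)
    cut_rej = example["rejected"].rfind(marker) + len(marker)
    return {
        "prompt": example["chosen"][:cut],
        "chosen": example["chosen"][cut:],
        "rejected": example["rejected"][cut_rej:],
    }

hh = load_dataset("Anthropic/hh-rlhf", split="train")
dpo_ready = hh.map(split_hh, remove_columns=hh.column_names)
print({k: v[:80] for k, v in dpo_ready[0].items()})
```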

Also, should we add “chosen_rating” and “rejected_rating” columns to the dataset?

Thanks in advance

New activity in meta-llama/Llama-3.1-8B 5 months ago
New activity in mistralai/Mistral-7B-Instruct-v0.1 5 months ago

Fine-tuning dataset template

#98 opened 12 months ago by Lalith16