gupta-tanish/Ultrafeedback-gemma2-9b-it-top1vsbottom3-selection Viewer • Updated 27 days ago • 9.48k • 105
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-8-binned-data Viewer • Updated Jul 25 • 60.8k • 26
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-4-binned-data Viewer • Updated Jul 25 • 60.8k • 12
gupta-tanish/Ultrafeedback-llama3-8b-instruct-v0.2-on-policy-clean-2-binned-data Viewer • Updated Jul 25 • 60.8k • 14
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-logp-10 Viewer • Updated Jul 19 • 59k • 7
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin Viewer • Updated Jul 19 • 107k • 10
gupta-tanish/QwQ-Long-CoT-10k-subset-llama3.1-8b-Inst-GPT4-Step-Perturbation-8-rejects Viewer • Updated Jul 18 • 42.3k • 7
gupta-tanish/QwQ-Long-CoT-20k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory Updated Jul 18 • 10
gupta-tanish/QwQ-Long-CoT-30k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory Updated Jul 18 • 17
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-10 Viewer • Updated Jul 13 • 19.2k • 8
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-iter2-dynamic-perturbation-regex-generation-max-margin Viewer • Updated Jul 12 • 19k • 7
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory-iter2 Viewer • Updated Jul 12 • 23.5k • 12
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-complete Viewer • Updated Jul 8 • 37.3k • 10
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-on-policy-step-wise-correct-trajectory Viewer • Updated Jul 7 • 44.5k • 8
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-augmented-3-regex-generation-max-margin Viewer • Updated Jul 7 • 37.2k • 8
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-augmented-regex-generation-max-margin Viewer • Updated Jul 7 • 37.2k • 6
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin Viewer • Updated Jul 7 • 36.9k • 6
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-dynamic-perturbation-regex-generation-max-margin-top-k-2 Viewer • Updated Jul 6 • 66.4k • 6
gupta-tanish/Filtered-QwQ-Long-CoT-10k-subset-Llama3.1-8B-Instruct-model-pertubation-generation-logps-10 Viewer • Updated Jul 6 • 27.7k • 7
gupta-tanish/QwQ-Long-CoT-15k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-12 Viewer • Updated Jul 4 • 53.8k • 14
gupta-tanish/QwQ-Long-CoT-15k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-15 Viewer • Updated Jul 4 • 117k • 9
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-single-position-regex-perturbations-logps-10 Viewer • Updated Jul 4 • 27.7k • 7
gupta-tanish/QwQ-Long-CoT-10k-subset-Llama3.1-8B-single-position-regex-perturbation Viewer • Updated Jul 3 • 204k • 10