gohsyi
gohsyi/Meta-Llama-3.1-8B-Instruct-rm-ultrafeedback • 8B • 20
gohsyi/gemma-2-2b-sft-ultrafeedback • 3B • 38
gohsyi/gemma-2-2b-it-dpo-ultrafeedback • 3B • 19
gohsyi/gemma-2-2b-dpo-ultrafeedback • 3B • 8
gohsyi/gemma-2-2b-it-sft-ultrafeedback • 3B • 8
gohsyi/gemma-2-2b-ppo4-rwt-metamath-v0.1
gohsyi/gemma-2-2b-ppo4-metamath-v0.1
gohsyi/gemma-2-2b-sft-metamath • 3B • 11 • 2
gohsyi/gemma-2-2b-it-rm-ultrafeedback • 3B • 7
gohsyi/gemma-2-2b-ppo4-offline-ultrafeedback-v0.1 • 3B • 8
gohsyi/gemma-2-2b-ppo4-rwt-offline-ultrafeedback-v0.1 • 3B • 8
gohsyi/gemma-2-2b-ppo4-rwt-ultrafeedback-v0.1 • 3B • 54
gohsyi/gemma-2-2b-ppo4-ultrafeedback-v0.1 • 3B • 8
gohsyi/gemma-2-2b-sft • 3B • 9
gohsyi/gemma-2-2b-sft-mixture • 3B • 35
gohsyi/gemma-2-2b-rm-ultrafeedback • 3B • 9
gohsyi/gemma-2-2b-ppo-saferlhf-iter1-rwt-v0.1 • 3B • 9
gohsyi/gemma-2-2b-ppo-saferlhf-iter1-v0.1 • 3B • 8
gohsyi/gemma-2-2b-rm-saferlhf • 3B • 13
gohsyi/iterative-prompt-v1-iter1-20K-reweighted
gohsyi/Llama-3-8B-SFT • 8B • 7
gohsyi/Llama-3-8b-rlhf-iter1-reweighted-v0.2 • Text Generation • 8B • 15
gohsyi/Llama-3-8b-rlhf-iter3-reweighted-v0.1 • Text Generation • 8B • 15
gohsyi/Llama-3-8b-rlhf-iter2-reweighted-v0.1 • Text Generation • 8B • 15
gohsyi/Llama-3-8b-rlhf-iter3-v0.1 • Text Generation • 8B • 53
gohsyi/Llama-3-8b-rlhf-iter2-v0.1 • Text Generation • 8B • 26
gohsyi/Llama-3-8b-rlhf-iter1-v0.1 • Text Generation • 8B • 15
gohsyi/Llama-3-8b-rlhf-iter1-reweighted-v0.1 • Text Generation • 8B • 74
gohsyi/Llama-3-8b-rlhf-iter1-threshold-v0.1 • Text Generation • 8B • 75