Xiaoyang Cao

Sean13

https://xiaoyangcao1113.github.io/

AI & ML interests

RLHF, Deep Reinforcement Learning

Recent Activity

published a model about 1 hour ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.9

updated a model about 1 hour ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7

published a model about 11 hours ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7

Organizations

None yet

Models 14

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.9

Updated about 1 hour ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.7

7B • Updated about 1 hour ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.5

Updated about 11 hours ago

Sean13/mistral-7b-instruct-v0.2-rdpo-full-alpha0.3

Updated about 14 hours ago

Sean13/mistral-7b-instruct-v0.2-rcpo-full

Text Generation • 7B • Updated 5 days ago • 42

Sean13/mistral-7b-instruct-v0.2-cpo-full

Text Generation • 7B • Updated 9 days ago • 51

Sean13/mistral-7b-instruct-v0.2-simpo-full

Text Generation • 7B • Updated 13 days ago • 14

Sean13/mistral-7b-instruct-v0.2-rsimpo-full

Text Generation • 7B • Updated 14 days ago • 18

Sean13/mistral-7b-instruct-v0.2-ipo-full

Text Generation • 7B • Updated Aug 19 • 5

Sean13/mistral-7b-instruct-v0.2-slic_hf-full

Text Generation • 7B • Updated Aug 11 • 5

Datasets 0

None public yet