Wei Xiong

weqweasdas

https://weixiongust.github.io/WeiXiongUST/index.html

AI & ML interests

Machine learning, RLHF

Recent Activity

updated a dataset about 8 hours ago

weqweasdas/numina_prompt_non_dedu

published a dataset about 8 hours ago

weqweasdas/numina_prompt_non_dedu

upvoted a paper 10 days ago

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

weqweasdas's activity

liked a model 5 months ago

RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Text Generation • Updated May 10 • 21.9k • 35

liked a dataset 7 months ago

RLHFlow/RLHFlow-SFT-Dataset-ver2

Viewer • Updated Nov 2, 2024 • 2.32M • 73 • 5

liked a model 7 months ago

RLHFlow/Llama3.1-8B-PRM-Mistral-Data

Text Generation • Updated Nov 9, 2024 • 1.68k • 10

liked 2 models 10 months ago

NCSOFT/Llama-3-OffsetBias-RM-8B

Text Classification • Updated Sep 6, 2024 • 424 • 23

RLHFlow/LLaMA3-SFT

Text Generation • Updated Nov 3, 2024 • 6.55k • 10

liked 9 models about 1 year ago

RLHFlow/LLaMA3-iterative-DPO-final

Text Generation • Updated Oct 14, 2024 • 3.29k • 41

RLHFlow/ArmoRM-Llama3-8B-v0.1

Text Classification • Updated Sep 23, 2024 • 19.8k • 178

RLHFlow/pair-preference-model-LLaMA3-8B

Text Generation • Updated Oct 14, 2024 • 1.52k • 38

Salesforce/LLaMA-3-8B-SFR-RM-R

Text Classification • Updated Jan 21 • 14 • 11

Salesforce/LLaMA-3-8B-SFR-SFT-R

Text Generation • Updated Jan 21 • 61 • 8

Salesforce/LLaMA-3-8B-SFR-Iterative-DPO-R

Text Generation • Updated Jan 21 • 1.46k • 77

sfairXC/FsfairX-LLaMA3-RM-v0.1

Text Classification • Updated Oct 14, 2024 • 3.16k • 59

sfairXC/FsfairX-Zephyr-Chat-v0.1

Text Generation • Updated Apr 24, 2024 • 23 • 8

weqweasdas/RM-Mistral-7B

Text Classification • Updated Mar 31, 2024 • 195 • 23

liked a Space about 1 year ago

Reward Bench Leaderboard

Display and filter reward model evaluation data

liked 2 models over 1 year ago

weqweasdas/RM-Gemma-7B

Text Classification • Updated Mar 22, 2024 • 47 • 8

weqweasdas/RM-Gemma-2B

Text Classification • Updated Mar 22, 2024 • 750 • 25

liked a model almost 2 years ago

weqweasdas/hh_rlhf_rm_open_llama_3b

Text Classification • Updated Feb 25, 2024 • 612 • 17

liked a Space about 2 years ago

Robin 7b