yuchang

hiyuchang

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 13 hours ago

On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting

Paper • 2508.11408 • Published 7 days ago • 6

upvoted a paper 3 months ago

Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models

Paper • 2505.17826 • Published May 23 • 9

upvoted an article 6 months ago

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Article • Feb 7 • 209

liked a model 10 months ago

RedHatAI/Llama-2-7b-gsm8k

Text Generation • Updated Jun 20, 2024 • 4.43k • 5

liked 2 datasets 12 months ago

allenai/c4

Viewer • Updated Jan 9, 2024 • 10.4B • 365k • 456

openai/gsm8k

Viewer • Updated Jan 4, 2024 • 17.6k • 389k • 840

liked a model 12 months ago

openai-community/gpt2

Text Generation • 0.1B • Updated Feb 19, 2024 • 11.4M • 2.9k

liked a dataset over 1 year ago

tau/commonsense_qa

Viewer • Updated Jan 4, 2024 • 12.1k • 69.9k • 110

liked a model over 1 year ago

openai-community/gpt2-xl

Text Generation • 2B • Updated Feb 19, 2024 • 243k • 354

liked a Space over 1 year ago

Calculate Model Flops

🔥

Calculate FLOPs and Parameters for models

liked 2 datasets over 1 year ago

jppgks/twitter-financial-news-sentiment

Viewer • Updated Sep 13, 2023 • 11.9k • 1.91k • 3

zeroshot/twitter-financial-news-topic

Viewer • Updated Feb 23, 2024 • 21.1k • 5.58k • 40

liked 4 models over 1 year ago

liked a dataset over 1 year ago

SetFit/20_newsgroups

Viewer • Updated Feb 3, 2022 • 18.8k • 4.56k • 15

liked 3 models over 1 year ago

distilbert/distilbert-base-uncased

Fill-Mask • 0.1B • Updated May 6, 2024 • 12.4M • 747

distilbert/distilbert-base-multilingual-cased

Fill-Mask • 0.1B • Updated May 6, 2024 • 983k • 210

FacebookAI/roberta-base

Fill-Mask • 0.1B • Updated Feb 19, 2024 • 10.9M • 515
