RLRM

community

Activity Feed

AI & ML interests

None defined yet.

Recent Activity

DongfuJiang authored a paper about 1 month ago

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

DongfuJiang authored a paper about 1 month ago

QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

DongfuJiang authored a paper about 1 month ago

General-Reasoner: Advancing LLM Reasoning Across All Domains

DongfuJiang

authored 3 papers about 1 month ago

yuchenlin

authored a paper 3 months ago

CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation

Paper • 2504.00043 • Published Mar 30 • 9

DongfuJiang

updated a model 3 months ago

RLRM/big_math_rl_pair_ct_7B

8B • Updated Mar 26

DongfuJiang

published a model 3 months ago

RLRM/big_math_rl_pair_ct_7B

8B • Updated Mar 26

DongfuJiang

updated a dataset 4 months ago

RLRM/Big-Math-RL-Verified-CT-7B

Viewer • Updated Mar 14 • 251k • 53

DongfuJiang

published a dataset 4 months ago

RLRM/Big-Math-RL-Verified-CT-7B

Viewer • Updated Mar 14 • 251k • 53

yuchenlin

updated a dataset 4 months ago

RLRM/Big-Math-RL-Verified-CT

Viewer • Updated Mar 14 • 251k • 7

yuchenlin

published a dataset 4 months ago

RLRM/Big-Math-RL-Verified-CT

Viewer • Updated Mar 14 • 251k • 7

yuchenlin

authored a paper 4 months ago

Small Models Struggle to Learn from Strong Reasoners

Paper • 2502.12143 • Published Feb 17 • 38

DongfuJiang

authored a paper 5 months ago

ACECODER: Acing Coder RL via Automated Test-Case Synthesis

Paper • 2502.01718 • Published Feb 3 • 29

yuchenlin

authored a paper 5 months ago

ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning

Paper • 2502.01100 • Published Feb 3 • 17

yuchenlin

authored a paper 8 months ago

On Memorization of Large Language Models in Logical Reasoning

Paper • 2410.23123 • Published Oct 30, 2024 • 18

DongfuJiang

authored a paper 9 months ago

MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks

Paper • 2410.10563 • Published Oct 14, 2024 • 39

yuchenlin

authored 2 papers about 1 year ago

WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs

Paper • 2406.18495 • Published Jun 26, 2024 • 13

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21, 2024 • 18

DongfuJiang

authored 2 papers about 1 year ago

MantisScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation

Paper • 2406.15252 • Published Jun 21, 2024 • 18

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

yuchenlin

authored a paper about 1 year ago

WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences

Paper • 2406.11069 • Published Jun 16, 2024 • 14

Team members 2
