Learning Smooth Reward Models with Temporal Difference for LLM RL and Inference
Dan Zhang (zd21)
Recent Activity
Published a dataset (15 days ago): zd21/TDRM-1-step-TD
Updated a dataset (15 days ago): zd21/TDRM-3-step-TD
Published a dataset (15 days ago): zd21/TDRM-3-step-TD