Zhihe Yang
zhyang2226
AI & ML interests
Trustworthy RL & Offline RL
Recent Activity
liked a model about 2 months ago: tencent/HunyuanVideo
authored a paper about 2 months ago: Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
authored a paper about 2 months ago: Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs