SARIM HASHMI

Sarim-Hash

AI & ML interests

None yet

Recent Activity

upvoted an article 3 days ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • Feb 11, 2025 • 116

updated a model 24 days ago

Sarim-Hash/Qwen3-14B-sandbagging

Text Generation • 425k • Updated 24 days ago • 507

published a model 24 days ago

Sarim-Hash/Qwen3-14B-sandbagging

Text Generation • 425k • Updated 24 days ago • 507

published a model 29 days ago

Sarim-Hash/Qwen3-8B-sandbagging

Updated 29 days ago

upvoted a paper about 1 month ago

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Paper • 2602.20161 • Published Feb 23 • 23

liked a model about 2 months ago

jhaaprince/qwen3-4b-sandbag-gsm8k-reverse

Updated Feb 18 • 1

updated a dataset about 2 months ago

Sarim-Hash/ICML-Debate-Arena

Updated Feb 11 • 6

published a dataset about 2 months ago

Sarim-Hash/ICML-Debate-Arena

Updated Feb 11 • 6

liked a model 2 months ago

mistralai/Voxtral-Mini-4B-Realtime-2602

Automatic Speech Recognition • 4B • Updated 29 days ago • 890k • 800

updated a model 2 months ago

Sarim-Hash/eba-qwen2p5-72b-adapter

Updated Jan 24 • 2

published a model 2 months ago

Sarim-Hash/eba-qwen2p5-72b-adapter

Updated Jan 24 • 2

updated a model 2 months ago

Sarim-Hash/qwen-235-v1

Updated Jan 24 • 5

published a model 3 months ago

Sarim-Hash/qwen-235-v1

Updated Jan 24 • 5

authored a paper 4 months ago

Robust and Calibrated Detection of Authentic Multimedia Content

Paper • 2512.15182 • Published Dec 17, 2025 • 17

upvoted a paper 4 months ago

Robust and Calibrated Detection of Authentic Multimedia Content

Paper • 2512.15182 • Published Dec 17, 2025 • 17

submitted a paper to Daily Papers 4 months ago

Robust and Calibrated Detection of Authentic Multimedia Content

Paper • 2512.15182 • Published Dec 17, 2025 • 17

upvoted an article 4 months ago

Diffusers welcomes FLUX-2

Article • Nov 25, 2025 • 189

upvoted a paper 5 months ago

EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning

Paper • 2509.22576 • Published Sep 26, 2025 • 137

upvoted 2 papers 6 months ago

olmOCR 2: Unit Test Rewards for Document OCR

Paper • 2510.19817 • Published Oct 22, 2025 • 16

World-in-World: World Models in a Closed-Loop World

Paper • 2510.18135 • Published Oct 20, 2025 • 78