- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 14
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 38
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 24
Abhranil Chandra
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
updated a dataset 2 days ago: MilaAI4Math/Final_Datasets_Preprocessed_for_Reasoning_from_Wrong_CoTs
updated a dataset 11 days ago: MilaAI4Math/Final_GSM8k_Dataset
published a dataset 11 days ago: MilaAI4Math/Final_GSM8k_Dataset