- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 14
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27
- Offline Reinforcement Learning for LLM Multi-Step Reasoning
  Paper • 2412.16145 • Published • 39
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
  Paper • 2503.17352 • Published • 23
Abhranil Chandra
abhranil14
AI & ML interests
Reinforcement Learning, Deep Unsupervised Learning, NLP and Bayesian Deep Learning
Recent Activity
upvoted a collection 3 days ago
🧠Reasoning datasets
authored a paper 8 days ago
VideoAgent: Self-Improving Video Generation
authored a paper 8 days ago
Leveraging recent advances in Pre-Trained Language Models for Eye-Tracking Prediction
Organizations
Collections: 8
Models: 24
abhranil14/llama_on_human_gold_7500_FF_batch256_lr10e-6_wr0.1
abhranil14/llama_on_wrong_soln_wrt_human_1_soln_per_qs_6076_FF_batch64_lr10e-6_warmup100
abhranil14/Gemma_FF_on_gemma_gold_6319_FF_batch64_lr10e-6_warmup100
abhranil14/Gemma_FF_on_gemma_gold_6319_FF_batch256_lr10e-6_warmup100
abhranil14/Qwen_wrong_soln_wrt_human_1_soln_per_qs_6076_PEFT_batch256_lr10e-6_warmup100
abhranil14/Qwen_wrong_soln_wrt_human_1_soln_per_qs_6076_PEFT_batch64_lr10e-6_warmup100
abhranil14/llama_on_wrong_soln_wrt_human_1_soln_per_qs_6076_FF_batch256_lr10e-6_warmup100
abhranil14/Math_gemma9b_ver_gen_75_25_full_finetune
abhranil14/llama3.1_8B_gemma_gold_batch_256
abhranil14/llama3.1_8B_human_gold_batch_256