Yury Panikov

panikov

AI & ML interests

None yet

Recent Activity

upvoted a paper 11 days ago

Efficient Agents: Building Effective Agents While Reducing Cost

commented on a paper 15 days ago

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

upvoted a paper 15 days ago

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Organizations

None yet

upvoted a paper 11 days ago

Efficient Agents: Building Effective Agents While Reducing Cost

Paper • 2508.02694 • Published 29 days ago • 81

upvoted a paper 15 days ago

Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction

Paper • 2508.03613 • Published 17 days ago • 11

upvoted 4 papers 16 days ago

The Promise of RL for Autoregressive Image Editing

Paper • 2508.01119 • Published 21 days ago • 11

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Paper • 2508.02091 • Published 19 days ago • 13

Tool-integrated Reinforcement Learning for Repo Deep Search

Paper • 2508.03012 • Published 18 days ago • 18

Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference

Paper • 2508.02193 • Published 19 days ago • 126

upvoted 2 papers 18 days ago

Trainable Dynamic Mask Sparse Attention

Paper • 2508.02124 • Published 19 days ago • 15

MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks

Paper • 2507.19634 • Published 28 days ago • 9

upvoted 5 papers 21 days ago

EDGE-GRPO: Entropy-Driven GRPO with Guided Error Correction for Advantage Diversity

Paper • 2507.21848 • Published 24 days ago • 7

Flow Equivariant Recurrent Neural Networks

Paper • 2507.14793 • Published Jul 20 • 2

Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding

Paper • 2507.19427 • Published 28 days ago • 18

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Paper • 2507.14111 • Published Jul 18 • 22

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published 27 days ago • 13

upvoted 7 papers 24 days ago

Diversity-Enhanced Reasoning for Subjective Questions

Paper • 2507.20187 • Published 26 days ago • 23

Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning

Paper • 2507.21049 • Published 25 days ago • 40

Goal Alignment in LLM-Based User Simulators for Conversational AI

Paper • 2507.20152 • Published 27 days ago • 4

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Paper • 2507.16806 • Published Jul 22 • 6

UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing Large Language Models' Reasoning Abilities

Paper • 2507.19766 • Published 28 days ago • 14

Geometric-Mean Policy Optimization

Paper • 2507.20673 • Published 25 days ago • 31

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published 28 days ago • 141