panjinhao

ishaqsaviani · ishaqsaviani590

AI & ML interests

NLP,DL,RL,ML

Recent Activity

liked a Space about 1 month ago

deepseek-ai/deepseek-coder-33b-instruct

liked a model about 1 month ago

deepseek-ai/DeepSeek-Prover-V1

liked a model about 1 month ago

deepseek-ai/DeepSeek-Prover-V2-7B

ishaqsaviani's activity

upvoted a collection about 1 month ago

Gemma 3 QAT

Quantization-Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated 8 days ago • 195

upvoted an article about 1 month ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • By … and 3 others • Published Dec 9, 2022 • 266

upvoted an article 3 months ago

You could have designed state of the art positional encoding

Article • Published Nov 25, 2024 • 288

upvoted 3 papers 3 months ago

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

Paper • 2408.08152 • Published Aug 15, 2024 • 60

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 190

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Paper • 2501.12948 • Published Jan 22 • 400