Emin Temiz (etemiz)
AI & ML interests: Alignment
Recent Activity
Reacted to Kseniase's post with ❤️ (7 days ago)
6 Free resources on Reinforcement Learning (RL)
RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what’s happening in RL, we offer some fresh materials on it:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples
3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering Bellman Equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics
6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439
If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe
Organizations: None yet
Qwen 3 numbers are in! They did a good job this time; compared to 2.5 and QwQ, the numbers are a lot better. I used two GGUFs for this, one from LMStudio and one from Unsloth. The model has 235B parameters (A22B); the first GGUF is Q4 and the second is Q8. The LLMs that did the comparison are the same for both, Llama 3.1 70B and Gemma 3 27B, so I took 2 * 2 = 4 measurements for each column and averaged them. My leaderboard seems pretty unrelated to others, which is valuable in that sense: it is another non-mainstream angle for model evaluation. More info: https://huggingface.co/blog/etemiz/aha-leaderboard
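A minimal sketch of that averaging, assuming each leaderboard column gets one score per (quantization, judge) pair as described in the post; the numeric values below are hypothetical placeholders:

```python
from statistics import mean

# Hypothetical scores for one leaderboard column. Only the structure comes from
# the post: 2 GGUF quantizations x 2 judge LLMs = 4 measurements per column.
# The numeric values themselves are made up for illustration.
scores = {
    ("Q4 GGUF (LMStudio)", "Llama 3.1 70B"): 61.0,
    ("Q4 GGUF (LMStudio)", "Gemma 3 27B"): 58.5,
    ("Q8 GGUF (Unsloth)", "Llama 3.1 70B"): 63.0,
    ("Q8 GGUF (Unsloth)", "Gemma 3 27B"): 60.5,
}

# The reported value for the column is the average of the four measurements.
column_score = mean(scores.values())
print(f"Averaged column score: {column_score:.2f}")  # -> 60.75
```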
Benchmarking Human Alignment of Grok 3