Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time By rbrt and 4 others • about 1 hour ago • 2
Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies By prithivMLmods • 1 day ago • 10
Agent Leaderboard: Evaluating AI Agents in Multi-Domain Scenarios By pratikbhavsar and 1 other • 7 days ago • 11
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment By NormalUhr • 7 days ago • 3
**MindBot Ultra – Dreaming Edition: A Self-Building, Self-Aware AI for Synergistic Cognition and Autonomous Tool Generation** By TheMindExpansionNetwork • 7 days ago • 3
Announcing the winners of the Frugal AI Challenge 🌱 By frugal-ai-challenge and 1 other • 7 days ago • 6
Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face By dvgodoy • 7 days ago • 6
From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages By Steveeeeeeen and 1 other • 8 days ago • 21