Dcas89

AI & ML interests

None yet

Recent Activity

liked a dataset 14 days ago
minwoosun/CholecSeg8k
reacted to Kseniase's post with 🔥 22 days ago
11 Fascinating new Policy Optimization techniques

Policy optimization (PO) algorithms are central to training AI models with preference-based feedback. In recent weeks, numerous new PO methods have emerged that build on or replace the popular PPO and GRPO, addressing their shortcomings. Here are 11 of them:

1. BAlanced Policy Optimization (BAPO) → https://huggingface.co/papers/2510.18927
Dynamically adjusts the clipping bounds in PPO-style updates to balance positive and negative gradients and prevent entropy collapse (a minimal sketch of this clipping idea follows the post)

2. Training-Free GRPO → https://huggingface.co/papers/2510.08191
Instead of using numeric rewards, it compares rollouts semantically to distill useful knowledge into a token prior, which is then applied at inference time to guide the model's behavior

3. Asymmetric Importance Sampling Policy Optimization (ASPO) → https://huggingface.co/papers/2510.06062
Fixes imbalanced token weighting in LLM training. It flips the importance sampling ratios for positive tokens to correct over- and under-updates, and adds a soft dual-clipping step to keep gradients stable

4. In-Context Steered Policy Optimization (ICPO) → https://arxiv.org/abs/2510.26519
Uses a model's own in-context learning ability to guide training with existing data. It combines Mixed-Policy GRPO with Implicit Expert Forcing to expand exploration, and adds Expert Region Reject Sampling and Annealed Expert-Bonus Reward Shaping to ensure stability and balanced expert influence

5. Graph-Enhanced Policy Optimization (GEPO) → https://arxiv.org/abs/2510.26270
Builds a graph of an agent's experiences to understand how different states connect, guide exploration, and assign rewards more effectively

6. Information Gain-based Policy Optimization (IGPO) → https://huggingface.co/papers/2510.14967
Uses the model's own belief updates to create dense, informative feedback for smoother multi-turn learning

Read further below ⬇️

If you like this, also subscribe to the Turing Post: https://www.turingpost.com/subscribe
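The first item above turns on a small change to the PPO objective: decoupling the lower and upper clipping bounds so that positive and negative gradient contributions can be rebalanced. Below is a minimal PyTorch sketch of that general mechanism, under stated assumptions: the function name and the fixed clip_low/clip_high values are illustrative, and BAPO itself adjusts the bounds dynamically during training, so see the linked paper for the actual rule.

```python
import torch

def decoupled_clip_loss(logp, old_logp, advantages,
                        clip_low=0.2, clip_high=0.28):
    """PPO-style clipped surrogate with decoupled clip bounds.

    Illustrative only: real BAPO adapts clip_low/clip_high on the fly;
    here they are fixed hyperparameters to isolate the mechanism.
    """
    # Importance-sampling ratio between current and behavior policies
    ratio = torch.exp(logp - old_logp)
    # Asymmetric clipping: a looser upper bound lets positive-advantage
    # tokens keep contributing, which helps counteract entropy collapse
    clipped = torch.clamp(ratio, 1.0 - clip_low, 1.0 + clip_high)
    # Standard PPO pessimism: take the smaller of the two surrogates
    surrogate = torch.min(ratio * advantages, clipped * advantages)
    return -surrogate.mean()

# Toy usage with per-token log-probs and advantages
logp = torch.randn(8, requires_grad=True)
old_logp = logp.detach() + 0.1 * torch.randn(8)
adv = torch.randn(8)
loss = decoupled_clip_loss(logp, old_logp, adv)
loss.backward()
```

Taking the elementwise minimum of the raw and clipped surrogates is the standard PPO pessimistic bound; only the asymmetric clip range differs from vanilla PPO.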
reacted to prithivMLmods's post with 👍 about 2 months ago
Try the Hugging Face Space demo for https://huggingface.co/Logics-MLLM/Logics-Parsing, the latest multimodal VLM from the Logics Team at Alibaba Group. It enables end-to-end document parsing with precise content extraction in markdown format, and it also generates a clean HTML representation of the document while preserving its logical structure. 🤗🔥

Additionally, I’ve integrated one of my recent works — https://huggingface.co/prithivMLmods/Gliese-OCR-7B-Post1.0 — which also excels at document comprehension.

⭐ Space / App: https://huggingface.co/spaces/prithivMLmods/VLM-Parsing
📄 Technical Report by the Logics Team, Alibaba Group: https://huggingface.co/papers/2509.19760
🖖 MM: VLM-Parsing: https://huggingface.co/collections/prithivMLmods/mm-vlm-parsing-68e33e52bfb9ae60b50602dc
⚡ Collections: https://huggingface.co/collections/prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

Other pages:
➔ Multimodal VLMs - July '25: https://huggingface.co/collections/prithivMLmods/multimodal-vlms-until-july25-688312e6b840e1e156f13027
➔ Multimodal VLMs - Aug '25: https://huggingface.co/collections/prithivMLmods/multimodal-vlms-aug25-68a56aac39fe8084f3c168bd
➔ VL caption — < Sep 15 '25: https://huggingface.co/collections/prithivMLmods/vl-caption-sep-15-25-68c7f6d737985c63c13e2391

To know more about it, visit the app page or the respective model pages!

Organizations

None yet