Ksenia Se

Kseniase

AI & ML interests

None yet

Recent Activity

replied to their post 4 days ago
9 Multimodal Chain-of-Thought methods

How can Chain-of-Thought (CoT) prompting unlock models' full potential across images, video, audio and more? Techniques designed specifically for multimodal CoT are the answer.

Here are 9 methods of Multimodal Chain-of-Thought (MCoT). Most of them are open-source:

1. KAM-CoT -> https://huggingface.co/papers/2401.12863
This lightweight framework combines CoT prompting with knowledge graphs (KGs) and achieves 93.87% accuracy on the ScienceQA benchmark.

2. Multimodal Visualization-of-Thought (MVoT) -> https://huggingface.co/papers/2501.07542
Lets models generate visual reasoning traces, using a token discrepancy loss to improve visual quality.

3. Compositional CoT (CCoT) -> https://huggingface.co/papers/2311.17076
Uses scene graph (SG) representations generated by the LMM itself to improve performance on compositional and general multimodal benchmarks.

4. URSA -> https://huggingface.co/papers/2501.04686
Brings System 2-style thinking to multimodal math reasoning, using a 3-module CoT data synthesis process: CoT distillation, trajectory-format rewriting and format unification.

5. MM-Verify -> https://huggingface.co/papers/2502.13383
Introduces a verification mechanism built on MM-Verifier and MM-Reasoner, which rely on synthesized high-quality CoT data for multimodal reasoning.

6. Duty-Distinct CoT (DDCoT) -> https://huggingface.co/papers/2310.16436
Divides reasoning responsibilities between LMs and visual models, integrating visual recognition capabilities into the joint reasoning process.

7. Multimodal-CoT from Amazon Web Services -> https://huggingface.co/papers/2302.00923
A two-stage framework that separates rationale generation from answer prediction, allowing the model to reason more effectively over multimodal inputs.

8. Graph-of-Thought (GoT) -> https://huggingface.co/papers/2305.16582
This two-stage framework models reasoning as a graph of interconnected ideas, improving performance on text-only and multimodal tasks.

More in the comments👇
posted an update 4 days ago

Organizations

Turing Post, Journalists on Hugging Face, Social Post Explorers, Hugging Face Discord Community, Sandbox

Kseniase's activity

published an article 14 days ago

What is Qwen-Agent framework? Inside the Qwen family

By Kseniase and 1 other
8
published an article 15 days ago

🌁#92: Fight for Developers and the Year of Orchestration

By Kseniase
5
published an article 17 days ago

🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly!– Talking About It?

By Kseniase
117
published an article 20 days ago

How to Reduce Memory Use in Reasoning Models

By Kseniase and 1 other
12
published an article 23 days ago

🌁#91: We are failing in AI literacy

By Kseniase and 1 other
3
published an article 23 days ago

🌁#90: Why AI’s Reasoning Tests Keep Failing Us

By Kseniase
9
published an article 24 days ago

🦸🏻#13: Action! How AI Agents Execute Tasks with UI and API Tools

By Kseniase
8
published an article 25 days ago

🦸🏻#12: How Do Agents Learn from Their Own Mistakes? The Role of Reflection in AI

By Kseniase
6
published an article 27 days ago

Everything You Need to Know about Knowledge Distillation

By Kseniase and 1 other
21
published an article about 1 month ago

🌁#89: AI in Action: How AI Engineers, Self-Optimizing Models, and Humanoid Robots Are Reshaping 2025

By Kseniase
4
published an article about 2 months ago

Topic 27: What are Chain-of-Agents and Chain-of-RAG?

By Kseniase and 1 other
13
published an article about 2 months ago

What is test-time compute and how to scale it?

By Kseniase and 1 other
66