Ksenia Se
AI & ML interests
Recent Activity
Organizations
Kseniase's activity


CoT-RAG -> https://huggingface.co/papers/2504.13534
Adds 3 new designs to the CoT approach: 1) Knowledge Graph-driven CoT Generation to guide reasoning chains, 2) Learnable Knowledge Case-aware RAG, which combines RAG with knowledge graphs to provide relevant sub-cases, and 3) Logic-based pseudo-program prompting execution.
Unsupervised Visual CoT (UV-CoT) -> https://huggingface.co/papers/2504.18397
Performs preference comparisons between model-generated bounding boxes. It generates and ranks model responses to visual regions, using this feedback to guide training and improve image-level reasoning.
CoTAL -> https://huggingface.co/papers/2504.02323
Combines CoT with active learning, using curriculum-aligned assessments, human-in-the-loop prompt design, and teacher/student feedback to improve automated grading. It boosts GPT-4's accuracy by up to 24.5%.
Deconstructing Long CoT (DLCoT) -> https://huggingface.co/papers/2503.16385
Enhances distillation data by segmenting data, simplifying solutions, and optimizing intermediate error states, improving model performance and token efficiency.

CoT has long been one of the hottest techniques in AI thanks to its effectiveness and compelling core idea: encouraging models to solve complex problems through explicit intermediate reasoning steps. But researchers often modify the original CoT approach, finding tweaks that further improve LLMs' reasoning. That's what we're going to talk about today.
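Before the list, a quick refresher on what vanilla CoT prompting looks like in code. This is a minimal sketch assuming an OpenAI-compatible client; the model name and prompt wording are placeholders, not from any of the papers below:

```python
# Minimal vanilla-CoT sketch; assumes an OpenAI-compatible endpoint
# and the `openai` Python package. Model name and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; any chat model works
    messages=[
        {"role": "system",
         "content": "Think step by step, then give the final answer "
                    "on a line starting with 'Answer:'."},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```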
Here's a list of 10 of the latest enhanced CoT approaches:
1. Chain-of-Defensive-Thought -> Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption (2504.20769)
Provides a few structured, defensive reasoning exemplars to improve the robustness of LLMs
2. Hybrid-CoT -> AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization (2504.21659)
Proposes using Adaptive Hybrid Reasoning Model (AdaR1) that combines Long- and Short-CoT, and applying bi-level preference training to select effective reasoning styles
3. Semantic-level and token-level CoT -> T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (2505.00703)
Introduces T2I-R1, a text-to-image generation model that uses semantic-level CoT for prompt planning and token-level CoT for pixel-level generation, while BiCoT-GRPO coordinates them both
4. Speculative CoT (SCoT) -> Efficient Reasoning for LLMs through Speculative Chain-of-Thought (2504.19095)
SCoT drafts multiple reasoning paths with a lightweight draft model, selects the best one, and uses the target model for correction, all to reduce latency by 48–66% (see the toy sketch after this list)
5. Collaborative CoT (Co-CoT) -> Co-CoT: A Prompt-Based Framework for Collaborative Chain-of-Thought Reasoning (2504.17091)
Breaks reasoning into blocks that users can inspect, modify and re-run, promoting active engagement. An adaptation mechanism aligns outputs with diverse cognitive styles and user goals
6. XS-CoT -> Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning (2504.20835)
It's a cross-lingual framework that integrates speech-to-text translation into reasoning, using a semi-implicit CoT approach to compress intermediate tokens. This improves non-core language responses by up to 45%
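To make item 4 concrete, here's a toy sketch of the speculative CoT loop: a cheap draft model proposes several reasoning chains, a selector picks the most promising one, and the expensive target model only verifies and corrects that single chain. The `draft_model`, `target_model`, and `score` functions are hypothetical stand-ins, not the paper's implementation:

```python
# Toy sketch of Speculative CoT: draft cheaply, select, correct expensively.
# All three functions below are stand-ins for real model calls.
import random

def draft_model(question: str) -> str:
    # Stand-in for a lightweight LLM sampling one reasoning chain.
    return f"draft reasoning for {question!r} (variant {random.randint(0, 9)})"

def target_model(question: str, chain: str) -> str:
    # Stand-in for the large model verifying/correcting the selected draft.
    return f"verified answer based on: {chain}"

def score(chain: str) -> float:
    # Stand-in selector; a real system would score drafts with a model.
    return random.random()

def speculative_cot(question: str, n_drafts: int = 4) -> str:
    drafts = [draft_model(question) for _ in range(n_drafts)]
    best = max(drafts, key=score)        # pick the most promising draft
    return target_model(question, best)  # one expensive call, not n_drafts

print(speculative_cot("What is 17 * 24?"))
```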
Read further in the comments 👇
If you liked this, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

great idea

A2A overview coming soon!

It seems to work for me, check https://www.youtube.com/watch?v=kQmXtrmQ5Zg

So far I think it's mostly for integrating tools. A2A is about agents and their communication. I will post a detailed overview of A2A here on Hugging Face soon

Tiny Agents: an MCP-powered agent in 50 lines of code

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts


RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what's happening in RL, we offer some fresh materials on it:
1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more
2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It's packed with solved exercises and real-world examples
3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering the Bellman equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc. (a tiny value-iteration sketch follows this list)
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning
4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas SchΓ€fer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL), and modern approaches to combining it with deep learning
5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics
6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439
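To make item 3's Bellman machinery concrete, here's a self-contained value-iteration sketch on a made-up 5-state chain MDP (the MDP and constants are invented for illustration; this isn't from the course):

```python
# Value iteration on a toy 5-state chain MDP (illustrative only).
# States 0..4; actions: 0 = left, 1 = right; reaching state 4 pays reward 1.
GAMMA, THETA = 0.9, 1e-8
N_STATES, ACTIONS = 5, (0, 1)

def step(s: int, a: int):
    """Deterministic transition: returns (next_state, reward)."""
    if s == 4:                      # terminal: absorbing, no further reward
        return s, 0.0
    s2 = max(0, s - 1) if a == 0 else s + 1
    return s2, (1.0 if s2 == 4 else 0.0)

V = [0.0] * N_STATES
while True:
    delta = 0.0
    for s in range(N_STATES):
        # Bellman optimality backup: V(s) = max_a [ r + gamma * V(s') ]
        best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

print([round(v, 3) for v in V])  # values grow toward the goal state
```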
If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe

These are graph-centric types of RAG:
NodeRAG -> https://huggingface.co/papers/2504.11544
Uses well-designed heterogeneous graph structures and focuses on graph design to ensure smooth integration of graph algorithms. It outperforms GraphRAG and LightRAG on multi-hop and open-ended QA benchmarks.
HeteRAG -> https://huggingface.co/papers/2504.10529
This heterogeneous RAG framework decouples knowledge chunk representations. It uses multi-granular views for retrieval and concise chunks for generation, along with adaptive prompt tuning.
Hyper-RAG -> https://huggingface.co/papers/2504.08758
A hypergraph-based RAG method. By capturing both pairwise and complex relationships in domain-specific knowledge, it improves factual accuracy and reduces hallucinations, especially in high-stakes fields like medicine, surpassing GraphRAG and LightRAG. Its lightweight version also doubles retrieval speed. (A toy hypergraph-retrieval sketch follows below.)
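To see why hypergraphs help: a hyperedge can tie any number of entities to one knowledge chunk, so an n-way relationship isn't shredded into pairwise edges. Below is a toy retrieval sketch over hyperedges; the data and overlap scoring are invented for illustration and are not Hyper-RAG's actual algorithm:

```python
# Toy hypergraph retrieval: each hyperedge ties *several* entities to one
# knowledge chunk, so a query touching a subset of those entities can pull
# in the whole relationship at once. Data and scoring are invented.
hyperedges = [
    ({"aspirin", "warfarin", "bleeding"}, "Aspirin plus warfarin raises bleeding risk."),
    ({"aspirin", "fever"}, "Aspirin reduces fever."),
    ({"warfarin", "vitamin K"}, "Vitamin K antagonizes warfarin."),
]

def retrieve(query_entities: set[str], k: int = 2):
    # Score each hyperedge by its overlap with the query's entities.
    scored = sorted(
        hyperedges,
        key=lambda e: len(e[0] & query_entities) / len(e[0]),
        reverse=True,
    )
    return [chunk for ents, chunk in scored[:k] if ents & query_entities]

print(retrieve({"aspirin", "warfarin"}))
# -> the 3-way drug-interaction chunk ranks first; a pairwise graph would
#    have to reassemble it from separate aspirin/warfarin edges.
```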

RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it's becoming more agentic and smarter at navigating complex structures like hypergraphs.
Here are 11 of the latest RAG types:
1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization
2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store
3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors
4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks
5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (debate, personal experience, or comparison) and breaking the question down into simpler parts
6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation (see the toy sketch after this list)
7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers
8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways
To explore what Causal AI is, read our article: https://www.turingpost.com/p/causalai
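To illustrate the aggregation step in MADAM-RAG (item 6), here's a single-round toy sketch where stand-in agents each answer from one retrieved document and a majority-vote aggregator filters out the conflicting source. The real system runs multi-round debates with LLM agents; everything below is invented for illustration:

```python
# Toy sketch of the MADAM-RAG idea: one agent per retrieved document,
# then an aggregator that keeps the majority view and discards outliers.
from collections import Counter

def agent_answer(doc: str) -> str:
    # Stand-in: a real agent would prompt an LLM with its document.
    return doc.split("=")[-1].strip()

def madam_rag_round(docs: list[str]) -> str:
    answers = [agent_answer(d) for d in docs]
    majority, votes = Counter(answers).most_common(1)[0]
    # Aggregator: accept only if a clear majority survives the debate.
    return majority if votes > len(answers) / 2 else "unresolved conflict"

docs = [
    "Source A: capital of Australia = Canberra",
    "Source B (outdated) = Melbourne",
    "Source C: capital of Australia = Canberra",
]
print(madam_rag_round(docs))  # -> Canberra; the lone conflicting source is outvoted
```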
Subscribe to the Turing Post: https://www.turingpost.com/subscribe
Read further 👇

Picture this: you ask Claude about a topic, and it instantly pulls verified and trusted NYT content. No more guessing if the info is accurate.
The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.
It's still a bit technical to set up right now, but this could get super simple soon, like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.
Not saying it solves everything, but it's definitely a new way to distribute content, and maybe even a way to find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.
Curious what folks think: could MCPs be a real opportunity for journalism?
For the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count. (A generic best-of-N sketch follows the list below.)
So here are 13 new methods + 3 comprehensive studies on test-time scaling:
1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably the most popular study. It proposes to boost inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT)
2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification
3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window
4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models
5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
The SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search
6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for RAG systems scaling, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases
7. ϕ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a ϕ-Decoding strategy that uses foresight sampling, clustering, and adaptive pruning to estimate and select optimal reasoning steps
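Before diving into the comments: the simplest baseline all of these methods improve on is plain best-of-N sampling, i.e. spend more compute per query and keep the best candidate. Here's a toy sketch with stand-in `generate` and `verify` functions; it's not any specific paper's method:

```python
# Generic best-of-N test-time scaling: sample N candidate answers,
# score each with a verifier, keep the best. Both functions are stand-ins.
import random

random.seed(0)

def generate(question: str) -> int:
    # Stand-in for sampling one (noisy) model answer to "What is 17 * 24?".
    return 17 * 24 + random.choice([-2, -1, 0, 0, 1])

def verify(question: str, answer: int) -> float:
    # Stand-in verifier / reward model: here, a cheap divisibility check.
    return 1.0 if answer % 17 == 0 else 0.0

def best_of_n(question: str, n: int) -> int:
    candidates = [generate(question) for _ in range(n)]
    return max(candidates, key=lambda a: verify(question, a))

# More inference compute (larger n) -> higher chance a correct sample exists.
for n in (1, 4, 16):
    print(n, best_of_n("What is 17 * 24?", n))
```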
Read further below 👇
Also, subscribe to the Turing Post https://www.turingpost.com/subscribe

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing -> https://huggingface.co/papers/2503.19385
An effective test-time scaling method for flow models, with SDE-based generation for particle sampling, interpolant conversion to enhance diversity, and Rollover Budget Forcing (RBF) for adaptive compute allocation.
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks -> https://huggingface.co/papers/2503.04378
Introduces a Feedback-Edit model setup that improves inference-time scaling, particularly for open-ended tasks, by using 3 different models for drafting, feedback, and editing.
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models -> https://huggingface.co/papers/2504.00869
A simple m1 method improves medical performance at inference, with models under 10B outperforming previous benchmarks and a 32B model matching 70B models.
ToolACE-R: Tool Learning with Adaptive Self-Refinement -> https://huggingface.co/papers/2504.01400
ToolACE-R enables adaptive self-refinement of tool use through model-aware iterative training. It refines tool calls without external feedback and scales inference compute efficiently.
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding -> https://huggingface.co/papers/2504.01281
Introduces a lightweight RAG framework that uses PORAG for better content use, ATLAS for adaptive retrieval timing, and CRITIC for efficient memory use. Together with optimized decoding strategies and adaptive reasoning depth, it allows the model to scale its inference steps effectively.
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute -> https://huggingface.co/papers/2504.00762
ModelSwitch is a sampling-then-voting strategy that uses multiple models (including weaker ones) to leverage diverse strengths, with a consistency signal guiding dynamic model switching. It highlights the potential of multi-model generation-verification. (A toy sketch follows below.)
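To make the ModelSwitch idea above concrete: sample a few answers from the first model, and if the vote share of the leading answer (the consistency signal) is low, switch to the next model and vote over everything. The model callables and threshold below are illustrative stand-ins, not the paper's setup:

```python
# Toy sketch of ModelSwitch-style sampling-then-voting: a consistency signal
# (vote share of the leading answer) decides whether to query the next model.
from collections import Counter
import random

def make_model(bias: float):
    # Stand-in LLM: answers "42" with probability `bias`, else a wrong guess.
    return lambda q: "42" if random.random() < bias else str(random.randint(0, 9))

models = [make_model(0.5), make_model(0.9)]  # weaker model first

def model_switch(question: str, k: int = 5, threshold: float = 0.8) -> str:
    answers: list[str] = []
    for model in models:
        answers += [model(question) for _ in range(k)]
        top, votes = Counter(answers).most_common(1)[0]
        if votes / len(answers) >= threshold:   # consistent enough: stop early
            return top
    return Counter(answers).most_common(1)[0][0]  # final vote over all samples

random.seed(1)
print(model_switch("ultimate answer?"))
```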
3 comprehensive surveys on inference-time scaling:
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead -> https://huggingface.co/papers/2504.00294
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models -> https://huggingface.co/papers/2503.24235
Efficient Inference for Large Reasoning Models: A Survey -> https://huggingface.co/papers/2503.23077
