Ksenia Se

Kseniase

AI & ML interests

None yet


Organizations

Turing Post · Journalists on Hugging Face · Social Post Explorers · Hugging Face Discord Community · Sandbox

Kseniase's activity

replied to their post 1 day ago
  1. CoT-RAG -> https://huggingface.co/papers/2504.13534
    Adds 3 new designs to the CoT approach: 1) Knowledge Graph-driven CoT Generation to guide reasoning chains, 2) Learnable Knowledge Case-aware RAG for combining RAG with knowledge graphs to provide relevant sub-cases, and 3) Logic-based pseudo-program prompting execution (see the sketch after this list).

  2. Unsupervised Visual CoT (UV-CoT) -> https://huggingface.co/papers/2504.18397
    Performs preference comparisons between model-generated bounding boxes. It generates and ranks model responses to visual regions, using this feedback to guide training and improve image-level reasoning.

  3. CoTAL -> https://huggingface.co/papers/2504.02323
    Combines CoT with active learning, using curriculum-aligned assessments, human-in-the-loop prompt design, and teacher/student feedback to improve automated grading. It boosts GPT-4's accuracy by up to 24.5%.

  4. Deconstructing Long CoT (DLCoT) -> https://huggingface.co/papers/2503.16385
    Enhances distillation data by segmenting it, simplifying solutions, and optimizing intermediate error states, improving model performance and token efficiency.
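
For a feel of how a pipeline like CoT-RAG fits together, here's a minimal sketch (all helper names are hypothetical placeholders, not the paper's code): a knowledge-graph-guided chain is drafted, each step is grounded with retrieved sub-cases, then the chain is executed step by step.

```python
# Minimal CoT-RAG-style sketch (hypothetical helpers, not the paper's code).

def llm(prompt: str) -> str:
    """Placeholder for any chat/completion model call."""
    raise NotImplementedError

def retrieve_subcases(step: str, k: int = 3) -> list[str]:
    """Placeholder for a RAG lookup over a knowledge-graph-indexed store."""
    raise NotImplementedError

def cot_rag_answer(question: str) -> str:
    # 1) Draft a reasoning chain, nudged toward the knowledge graph's structure.
    plan = llm(
        f"Question: {question}\n"
        "Decompose this into numbered reasoning steps that follow the "
        "entities and relations involved."
    )
    steps = [s.strip() for s in plan.splitlines() if s.strip()]

    # 2) Execute each step, grounding it with retrieved sub-cases.
    findings = ""
    for step in steps:
        evidence = "\n".join(retrieve_subcases(step))
        findings += llm(
            f"Step: {step}\nRelevant cases:\n{evidence}\n"
            f"Previous findings:\n{findings}\nResolve this step concisely."
        ) + "\n"

    # 3) Produce the final answer from the accumulated chain.
    return llm(f"Question: {question}\nReasoning so far:\n{findings}\nFinal answer:")
```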

posted an update 1 day ago
10 new Chain-of-Thought (CoT) methods

CoT has long been one of the hottest techniques in AI thanks to its effectiveness and compelling core idea: encouraging models to solve complex problems through explicit intermediate reasoning steps. But researchers keep modifying the original CoT approach, finding tweaks that further improve LLMs' reasoning. That's what we're going to talk about today.

Here's a list of the 10 latest enhanced CoT approaches:

1. Chain-of-Defensive-Thought -> Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption (2504.20769)
Provides a few structured, defensive reasoning exemplars to improve the robustness of LLMs

2. Hybrid-CoT -> AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization (2504.21659)
Proposes an Adaptive Hybrid Reasoning Model (AdaR1) that combines Long- and Short-CoT, applying bi-level preference training to select effective reasoning styles

3. Semantic-level and token-level CoT -> T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (2505.00703)
Introduces T2I-R1, a text-to-image generation model that uses semantic-level CoT for prompt planning and token-level CoT for pixel-level generation, with BiCoT-GRPO coordinating the two

4. Speculative CoT (SCoT) -> Efficient Reasoning for LLMs through Speculative Chain-of-Thought (2504.19095)
SCoT drafts multiple reasoning paths with a lightweight draft model, selects the best, and uses the target model for correction, reducing latency by 48–66% (see the sketch after this list)

5. Collaborative CoT (Co-CoT) -> Co-CoT: A Prompt-Based Framework for Collaborative Chain-of-Thought Reasoning (2504.17091)
Breaks reasoning into blocks that users can inspect, modify and re-run, promoting active engagement. An adaptation mechanism aligns outputs with diverse cognitive styles and user goals

6. XS-CoT -> Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning (2504.20835)
It's a cross-lingual framework that integrates speech-to-text translation into reasoning, using a semi-implicit CoT approach to compress intermediate tokens. This improves non-core language responses by up to 45%
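
To make the speculative idea in 4 concrete, here's a rough sketch of the SCoT control flow; `draft_model` and `target_model` are hypothetical placeholders, not the paper's actual implementation:

```python
# Rough speculative-CoT sketch: a lightweight model drafts several reasoning
# chains cheaply; the expensive target model only selects and corrects.

def draft_model(prompt: str, n: int) -> list[str]:
    """Placeholder: sample n cheap reasoning drafts from a small model."""
    raise NotImplementedError

def target_model(prompt: str) -> str:
    """Placeholder: one call to the large target model."""
    raise NotImplementedError

def speculative_cot(question: str, n_drafts: int = 4) -> str:
    drafts = draft_model(f"Think step by step:\n{question}", n=n_drafts)

    # The target model picks the most promising draft in a single call,
    # instead of generating a full chain itself.
    numbered = "\n\n".join(f"[{i}] {d}" for i, d in enumerate(drafts))
    choice = target_model(
        f"Question: {question}\nCandidate reasoning drafts:\n{numbered}\n"
        "Reply with only the index of the most reliable draft."
    )
    best = drafts[int(choice.strip())]

    # One more target-model call fixes errors in the chosen draft, which is
    # where the latency savings come from.
    return target_model(
        f"Question: {question}\nDraft reasoning:\n{best}\n"
        "Correct any mistakes and give the final answer."
    )
```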

Read further in the comments 👇

If you liked this, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe
  • 1 reply

So far I think it's mostly for integrating tools. A2A is about agents and their communication. I will post a detailed overview of A2A here on Hugging Face soon

upvoted an article 7 days ago

Tiny Agents: a MCP-powered agent in 50 lines of code

• 216
upvoted an article 8 days ago

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts

By Kseniase and 1 other • 5
published an article 8 days ago

What is MoE 2.0? Update Your Knowledge about Mixture-of-experts

By Kseniase and 1 other • 5
posted an update 9 days ago
6 Free resources on Reinforcement Learning (RL)

RL is now where the real action is: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves, and solves problems on its own. To stay up to date with what's happening in RL, we offer some fresh materials:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It's packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering the Bellman equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc. (a tiny value-iteration demo follows this list)
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas SchΓ€fer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439
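
As a tiny taste of what these resources cover, here's a self-contained value-iteration demo on a toy 2-state MDP, applying the standard Bellman optimality update V(s) ← max_a Σ_s' P(s'|s,a)[R(s,a) + γV(s')] (generic textbook material, not code from any of the books above):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions.
P = np.array([            # P[s, a, s']: transition probabilities
    [[0.8, 0.2], [0.1, 0.9]],
    [[0.5, 0.5], [0.0, 1.0]],
])
R = np.array([            # R[s, a]: expected immediate reward
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9               # discount factor

V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * (P @ V)      # Q[s, a] = R[s, a] + gamma * E_{s'}[V(s')]
    V_new = Q.max(axis=1)        # Bellman optimality backup
    if np.max(np.abs(V_new - V)) < 1e-8:
        break                    # values converged
    V = V_new

print("V* =", V, "| greedy policy =", Q.argmax(axis=1))
```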

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe
replied to their post 16 days ago

These are graph-centric types of RAG:

  1. NodeRAG -> https://huggingface.co/papers/2504.11544
    Uses well-designed heterogeneous graph structures and focuses on graph design to ensure smooth integration of graph algorithms. It outperforms GraphRAG and LightRAG on multi-hop and open-ended QA benchmarks

  2. HeteRAG -> https://huggingface.co/papers/2504.10529
    This heterogeneous RAG framework decouples knowledge chunk representations. It uses multi-granular views for retrieval and concise chunks for generation, along with adaptive prompt tuning

  3. Hyper-RAG -> https://huggingface.co/papers/2504.08758
    A hypergraph-based RAG method. By capturing both pairwise and beyond-pairwise relationships in domain-specific knowledge, it improves factual accuracy and reduces hallucinations, especially in high-stakes fields like medicine, surpassing GraphRAG and LightRAG. Its lightweight version also doubles retrieval speed (see the toy sketch below)
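
To illustrate why hypergraphs help here, a toy retrieval sketch (illustrative only, not Hyper-RAG's actual implementation): one hyperedge can tie any number of entities to a single fact, so multi-way relations don't have to be reassembled from pairwise edges.

```python
from collections import defaultdict

# Each hyperedge links a SET of entities to one fact, so a 3-way relation
# ("aspirin + warfarin + elderly") survives as a single unit.
hyperedges = [
    ({"aspirin", "warfarin", "elderly"},
     "Combining aspirin with warfarin raises bleeding risk in elderly patients."),
    ({"aspirin", "fever"}, "Aspirin reduces fever."),
]

# Inverted index: entity -> ids of hyperedges touching it.
index = defaultdict(set)
for i, (entities, _fact) in enumerate(hyperedges):
    for e in entities:
        index[e].add(i)

def retrieve(query_entities: set[str], top_k: int = 2) -> list[str]:
    # Score each hyperedge by overlap with the query's entity set; a pairwise
    # graph would have to stitch this relation back together from edges.
    scores = defaultdict(int)
    for e in query_entities:
        for edge_id in index.get(e, ()):
            scores[edge_id] += 1
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [hyperedges[i][1] for i in ranked]

print(retrieve({"aspirin", "warfarin", "elderly"}))
```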

posted an update 16 days ago
11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. Today it's becoming more agentic and smarter at navigating complex structures like hypergraphs.

Here are the 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (debate, personal experience, or comparison) and breaking the question down into simpler parts

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation (a minimal sketch follows this list)

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs to enable multi-hop causal reasoning and query refinement, validating responses against causal pathways
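
To give a flavor of the multi-agent direction, here's a minimal MADAM-RAG-style debate-then-aggregate loop (see 6 above); `llm` and the loop structure are hypothetical placeholders, not the paper's code:

```python
# Minimal multi-agent debate sketch: each agent answers from its own
# retrieved documents, sees the others' answers across rounds, and an
# aggregator filters noise and merges what survives.

def llm(prompt: str) -> str:
    """Placeholder for any chat model call."""
    raise NotImplementedError

def debate_rag(question: str, doc_sets: list[list[str]], rounds: int = 2) -> str:
    answers = ["" for _ in doc_sets]
    for _ in range(rounds):
        new_answers = []
        for i, docs in enumerate(doc_sets):
            others = "\n".join(a for j, a in enumerate(answers) if j != i and a)
            new_answers.append(llm(
                f"Question: {question}\nYour documents:\n{chr(10).join(docs)}\n"
                f"Other agents said:\n{others or '(first round)'}\n"
                "Answer using only your documents; flag any conflicts you notice."
            ))
        answers = new_answers
    # Aggregator: discard unsupported claims, merge the rest.
    return llm(
        f"Question: {question}\nAgent answers:\n" + "\n---\n".join(answers)
        + "\nDiscard unsupported or conflicting claims and give one final answer."
    )
```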

To explore what is Causal AI, read our article: https://www.turingpost.com/p/causalai

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇
  • 1 reply
reacted to fdaudens's post with 🔥 19 days ago
Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It's still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it's definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think — could MCPs be a real opportunity for journalism?
  • 1 reply
reacted to their post with 👍 22 days ago
replied to their post 23 days ago
  1. Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing -> https://huggingface.co/papers/2503.19385
    An effective test-time scaling method for flow models with SDE-based generation for particle sampling, interpolant conversion to enhance diversity, and Rollover Budget Forcing (RBF) for adaptive compute allocation

  2. Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks -> https://huggingface.co/papers/2503.04378
    Introduces a Feedback-Edit model setup that improves inference-time scaling, particularly for open-ended tasks, by using 3 different models for drafting, feedback, and editing

  3. m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models -> https://huggingface.co/papers/2504.00869
    A simple m1 method improves medical performance at inference, with models under 10B outperforming previous benchmarks and a 32B model matching 70B models

  4. ToolACE-R: Tool Learning with Adaptive Self-Refinement -> https://huggingface.co/papers/2504.01400
    ToolACE-R enables adaptive self-refinement of tool use through model-aware iterative training. It refines tool calls without external feedback and scales inference compute efficiently

  5. Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding -> https://huggingface.co/papers/2504.01281
    Introduces a lightweight RAG framework that uses PORAG for better content use, ATLAS for adaptive retrieval timing, and CRITIC for efficient memory use. Together with optimized decoding strategies and adaptive reasoning depth, it allows the model to scale its inference steps effectively.

  6. Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute -> https://huggingface.co/papers/2504.00762
    ModelSwitch is a sampling-then-voting strategy that uses multiple models (including weaker ones) to leverage diverse strengths, where a consistency signal guides dynamic model switching. It highlights the potential of multi-model generation-verification (rough sketch below).
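
The sampling-then-voting idea in 6 is easy to sketch: sample a few answers from the cheapest model and only switch to the next, stronger one when the votes disagree. A rough illustration, with `sample` as a hypothetical stand-in for any model call (not the paper's code):

```python
from collections import Counter

def sample(model: str, question: str, n: int) -> list[str]:
    """Placeholder: draw n sampled answers from the named model."""
    raise NotImplementedError

def model_switch(question: str, models: list[str], n: int = 5,
                 agree_threshold: float = 0.6) -> str:
    best_answer, best_agreement = None, -1.0
    for model in models:                       # ordered weakest -> strongest
        votes = Counter(sample(model, question, n))
        answer, count = votes.most_common(1)[0]
        agreement = count / n                  # consistency signal
        if agreement > best_agreement:
            best_answer, best_agreement = answer, agreement
        if agreement >= agree_threshold:
            break                              # answers agree; stop early
    return best_answer
```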

3 comprehensive surveys on inference-time scaling:

  1. Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead -> https://huggingface.co/papers/2504.00294

  2. What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models -> https://huggingface.co/papers/2503.24235

  3. Efficient Inference for Large Reasoning Models: A Survey -> https://huggingface.co/papers/2503.23077

posted an update 23 days ago
16 new research papers on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count.

So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably the most popular study. It proposes boosting inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT). A minimal best-of-N sketch follows this list.

2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification

3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window

4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM, that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models

5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
The SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search

6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for RAG systems scaling, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases

7. $φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a φ-Decoding strategy that uses foresight sampling, clustering, and adaptive pruning to estimate and select optimal reasoning steps
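
Several of these papers (1 and 4 especially) build on the same primitive: spend extra inference compute on parallel samples, then let a reward model pick the winner. A minimal best-of-N sketch, with `generate` and `reward` as hypothetical placeholders rather than any paper's actual code:

```python
def generate(question: str, n: int) -> list[str]:
    """Placeholder: n sampled responses from the policy model."""
    raise NotImplementedError

def reward(question: str, response: str) -> float:
    """Placeholder: scalar score from a (generative) reward model."""
    raise NotImplementedError

def best_of_n(question: str, n: int = 8) -> str:
    # More samples = more inference compute = a better expected best score,
    # all without touching the policy model's parameters.
    candidates = generate(question, n)
    return max(candidates, key=lambda r: reward(question, r))
```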

Read further below 👇

Also, subscribe to the Turing Post https://www.turingpost.com/subscribe
  • 2 replies
updated a Space 28 days ago