Ksenia Se
AI & ML interests
Recent Activity
Organizations
Kseniase's activity


Topic 28: What is Mixture-of-Mamba?
We've noticed a huge interest in test-time scaling (TTS), so we decided to explore this concept further. Test-time compute (TTC) refers to the amount of computational power used by an AI model when generating a response. Many researchers are now focused on scaling TTC, as it enables slow, deep "thinking" and step-by-step reasoning, which improves overall models' performance.
Here are 8 fresh studies on test-time scaling:
1. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)
Introduces an LM that scales TTC by reasoning in latent space instead of generating more tokens with no special training. Here, a recurrent block to processes information iteratively.
2. Generating Symbolic World Models via Test-time Scaling of Large Language Models (2502.04728)
Shows how TTS is applied to enhance model's Planning Domain Definition Language (PDDL) reasoning capabilities, which can be used to generate a symbolic world model.
3. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (2502.06703)
Analyzes optimal TTS strategies and shows how small models can outperform much larger ones.
4. Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (2502.04128)
Shows how TTS improves expressiveness, timbre consistency and accuracy in speech synthesis with Llasa framework. It also dives into benefits of scaling train-time compute.
5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning (2502.07154)
Suggests a modified training loss for better reasoning of LLMs when scaling TTC.
6. Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2502.05078)
Unifies the strengths of chain, tree, and graph paradigms into one framework that expands reasoning only on necessary subproblems.
7. Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification (2502.01839)
Explores scaling trends of self-verification and how to improve its capabilities with TTC.
8. CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2501.14723)
Explores how scaling serial compute (iterations) and parallel compute (trajectories), can improve accuracy in real-world software engineering issues.
Also, explore our article about TTS for more -> https://huggingface.co/blog/Kseniase/testtimecompute

π#88: Can DeepSeek Inspire Global Collaboration?

π#88: Can DeepSeek Inspire Global Collaboration?

We've noticed a huge interest in test-time scaling (TTS), so we decided to explore this concept further. Test-time compute (TTC) refers to the amount of computational power used by an AI model when generating a response. Many researchers are now focused on scaling TTC, as it enables slow, deep "thinking" and step-by-step reasoning, which improves overall models' performance.
Here are 8 fresh studies on test-time scaling:
1. Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach (2502.05171)
Introduces an LM that scales TTC by reasoning in latent space instead of generating more tokens with no special training. Here, a recurrent block to processes information iteratively.
2. Generating Symbolic World Models via Test-time Scaling of Large Language Models (2502.04728)
Shows how TTS is applied to enhance model's Planning Domain Definition Language (PDDL) reasoning capabilities, which can be used to generate a symbolic world model.
3. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling (2502.06703)
Analyzes optimal TTS strategies and shows how small models can outperform much larger ones.
4. Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis (2502.04128)
Shows how TTS improves expressiveness, timbre consistency and accuracy in speech synthesis with Llasa framework. It also dives into benefits of scaling train-time compute.
5. Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning (2502.07154)
Suggests a modified training loss for better reasoning of LLMs when scaling TTC.
6. Adaptive Graph of Thoughts: Test-Time Adaptive Reasoning Unifying Chain, Tree, and Graph Structures (2502.05078)
Unifies the strengths of chain, tree, and graph paradigms into one framework that expands reasoning only on necessary subproblems.
7. Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification (2502.01839)
Explores scaling trends of self-verification and how to improve its capabilities with TTC.
8. CodeMonkeys: Scaling Test-Time Compute for Software Engineering (2501.14723)
Explores how scaling serial compute (iterations) and parallel compute (trajectories), can improve accuracy in real-world software engineering issues.
Also, explore our article about TTS for more -> https://huggingface.co/blog/Kseniase/testtimecompute

π¦Έπ»#10: Does Present-Day GenAI Actually Reason?

π¦Έπ»#10: Does Present-Day GenAI Actually Reason?

Topic 27: What are Chain-of-Agents and Chain-of-RAG?

Topic 27: What are Chain-of-Agents and Chain-of-RAG?
RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
Here's a list of 8 latest RAG advancements:
1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when rely on parametric reasoning.
2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
EnhancesΒ novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gap, improve realism and reduce distortions.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step-by-step and adjusts it, also deciding how much compute power to use at test time. If needed it reformulates queries.
4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.
5. CFT-RAG ->Β CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.
6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within graph structure and capture citation relationships
7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections
8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots

π#87: Why DeepResearch Should Be Your New Hire

π#87: Why DeepResearch Should Be Your New Hire

Other important RAG advancements:
SafeRAG (a benchmark) -> https://huggingface.co/papers/2501.18636
Establishes a security benchmark revealing how RAG systems are vulnerable to attacks like adversarial data injection, inter-context conflicts, and soft ad poisoning. Evaluates weaknesses in 14 RAG components, emphasizing the need for better filtering and security measures.Topic-FlipRAG: Adversarial Opinion Manipulation -> https://huggingface.co/papers/2502.01386
Demonstrates a two-stage adversarial attack that manipulates RAG-generated opinions on sensitive topics. Alters retrieval rankings and LLM reasoning to subtly flip the stance of generated answers, exposing the difficulty of mitigating semantic-level manipulation.Experiments with LLMs on RAG for Closed-Source Simulation Software -> https://huggingface.co/papers/2502.03916
Tests how RAG can support proprietary software by injecting relevant documentation dynamically. Shows that retrieval helps mitigate hallucinations in closed-source contexts, though some knowledge gaps remain, necessitating further improvements.Health-RAG -> https://huggingface.co/papers/2502.04666
Focuses on medical information retrieval by introducing a three-stage pipeline: retrieve, generate a reference summary (GenText), and re-rank based on factual alignment. Ensures accurate, evidence-backed health answers while mitigating misinformation risks.
