Kseniase (Ksenia Se)

replied to their post 5 days ago

Filesystem MCP Server -> https://github.com/modelcontextprotocol/servers/tree/HEAD/src/filesystem
Read, write, search files, plus create, delete, list and move directories specified via args.
Notion MCP Server -> https://github.com/makenotion/notion-mcp-server
Enable models to interact with your Notion workspace to automate tasks such as searching, reading, creating, and updating pages and databases
Markdownify MCP Server -> https://github.com/zcaceres/markdownify-mcp
Converts various file types (PDFs, images, audio) and web pages to Markdown format
Fetch MCP Server -> https://github.com/modelcontextprotocol/servers/tree/main/src/fetch
Allows LLMs to retrieve and process content from web pages, converting HTML to markdown
Mobile Next - MCP server for Mobile Development and Automation -> https://github.com/mobile-next/mobile-mcp
Enables Agents and LLMs to interact with iOS/Android apps using accessibility snapshots or taps from screenshots
MCP installer -> https://github.com/anaisbetts/mcp-installer
This one is quite hilarious - "MCP for MCP". It allows you to ask your model (Claude, for example) to install MCP servers hosted on npm or PyPI for you.

posted an update 5 days ago

Post

1765

13 Awesome MCP Servers

MCP changed how agents connect with tools.

After writing the most read explanation of MCP on Hugging Face (https://huggingface.co/blog/Kseniase/mcp), we chose these 13 awesome MCP servers that you can work with:

1. Agentset MCP -> https://github.com/agentset-ai/mcp-server
For quickly and efficiently building intelligent, doc-based apps using the open-source Agentset platform for RAG

2. GitHub MCP Server -> https://github.com/github/github-mcp-server
Integrates GitHub APIs into your workflow, allowing you to build AI tools and apps that interact with GitHub's ecosystem

3. arXiv MCP -> https://github.com/andybrandt/mcp-simple-arxiv
Allows working with research papers on arXiv through effective search and access to their metadata, abstracts, and links

4. MCP Run Python -> https://github.com/pydantic/pydantic-ai/tree/main/mcp-run-python
Runs Python code in a sandbox via Pyodide in Deno, so it is isolated from the rest of the operating system

5. Safe Local Python Executor -> https://github.com/maxim-saplin/mcp_safe_local_python_executor
A lightweight tool for running LLM-generated Python code locally, using Hugging Face’s LocalPythonExecutor (from the smolagents framework) and exposing it via MCP for AI assistant integration

6. Cursor MCP Installer -> https://github.com/matthewdcage/cursor-mcp-installer
Automatically adds MCP servers to Cursor for development convenience

7. Basic Memory -> https://memory.basicmachines.co/docs/introduction
This knowledge management system connects to LLMs and lets you build a persistent semantic graph from your conversations with AI agents

Read further in the comments 👇

If you like it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

replied to their post 12 days ago

T-JEPA -> https://huggingface.co/papers/2410.05016
This one is for tabular (structured) data. By masking one subset of a table’s features and predicting their latent representation from another subset, it learns rich, label-agnostic embeddings
ACT-JEPA -> https://huggingface.co/papers/2501.14622
Merges imitation and self-supervised learning to learn policy embeddings without heavy expert data. It predicts chunked actions and abstract observations in latent space, filtering noise, modeling dynamics, and cutting compounding errors
Brain-JEPA -> https://huggingface.co/papers/2409.19407
Applies JEPA in a brain dynamics foundation model for demographic, disease, and trait prediction.
3D-JEPA -> https://huggingface.co/papers/2409.15803
JEPA for 3D representation learning. It samples one rich context block and several target blocks, then predicts each target’s embedding from the context
Point-JEPA -> https://huggingface.co/papers/2404.16432
Brings joint-embedding predictive learning to point clouds. A lightweight sequencer orders patch embeddings. It lets the model choose context and target patches and reuse distance calculations for speed

posted an update 12 days ago

Post

4228

12 Types of JEPA

JEPA, or Joint Embedding Predictive Architecture, is an approach to building AI models introduced by Yann LeCun. It differs from generative models by predicting the representation of a missing or future part of the input, rather than the next token or pixel. This encourages conceptual understanding, not just low-level pattern matching. In this way, JEPA aims to teach AI to reason abstractly.

Here are 12 types of JEPA you should know about:

1. I-JEPA -> Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture (2301.08243)
A non-generative, self-supervised learning framework designed for processing images. It works by masking parts of the images and then trying to predict those masked parts

2. MC-JEPA -> MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features (2307.12698)
Simultaneously interprets video data - dynamic elements (motion) and static details (content) - using a shared encoder

3. V-JEPA -> Revisiting Feature Prediction for Learning Visual Representations from Video (2404.08471)
Presents vision models trained by predicting future video features, without pretrained image encoders, text, negative sampling, or reconstruction

4. UI-JEPA -> UI-JEPA: Towards Active Perception of User Intent through Onscreen User Activity (2409.04081)
Masks unlabeled UI sequences to learn abstract embeddings, then adds a fine-tuned LLM decoder for intent prediction.

5. Audio-based JEPA (A-JEPA) -> A-JEPA: Joint-Embedding Predictive Architecture Can Listen (2311.15830)
Masks spectrogram patches with a curriculum, encodes them, and predicts hidden representations.

6. S-JEPA -> S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention (2403.11772)
Signal-JEPA is used in EEG analysis. It adds a spatial block-masking scheme and three lightweight downstream classifiers

7. TI-JEPA -> TI-JEPA: An Innovative Energy-based Joint Embedding Strategy for Text-Image Multimodal Systems (2503.06380)
Text-Image JEPA uses self-supervised, energy-based pre-training to map text and images into a shared embedding space, improving cross-modal transfer to downstream tasks

Find more types below 👇

Also, explore the basics of JEPA in our article: https://www.turingpost.com/p/jepa

If you liked it, subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

replied to their post 19 days ago

Coursera courses:

AI Agents: From Prompts to Multi-Agent Systems -> https://www.coursera.org/learn/ai-agents-from-prompts-to-multi-agent-systems
Covers generative AI basics, building agentic workflows and the creation and orchestration of MAS. Includes 5 modules and is good for beginners.
Mastering Multi-Agent Development with AutoGen -> https://www.coursera.org/learn/packt-mastering-multi-agent-development-with-autogen-zyalb
Build MAS with AutoGen’s tools, optimizing their communications with special chat patterns. Intermediate level, includes 9 modules.
Practical Multi AI Agents and Advanced Use Cases with crewAI -> https://www.coursera.org/projects/practical-multi-ai-agents-and-advanced-use-cases-with-crewai
Explore collaborative agents for complex workflows that integrate external tools and use different models. You will build MAS to automate planning, scoring, and large content creation tasks. It's good for beginners.

posted an update 19 days ago

Post

3135

7 Free resources to master Multi-Agent Systems (MAS)

Collective intelligence is the future of AI. Sometimes a single agent isn't enough: a team of simpler, specialized agents working together to solve a task can be a much better option. Building Multi-Agent Systems (MAS) isn’t easy, which is why today we’re offering you a list of sources that may help you master MAS:

1. CrewAI tutorials -> https://docs.crewai.com/introduction#ready-to-start-building%3F
At the end of the page you'll find a guide on how to build a crew of agents that research and analyze a topic, and create a report. Also, there are useful guides on how to build a single CrewAI agent and a workflow

2. Building with CAMEL multi-agent framework -> https://github.com/camel-ai/camel
Offers guides, cookbooks and other useful information for building societies of up to a million agents, and for exploring and working with MAS

3. LangChain multi-agent tutorial -> https://langchain-ai.github.io/langgraph/agents/multi-agent/
Explains how to make agents communicate via the handoffs pattern, using 2 multi-agent architectures as examples: supervisor and swarm

4. "Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations" by Yoav Shoham and Kevin Leyton-Brown -> https://www.masfoundations.org/download.html
This book explains learning between agents and how multiple agents solve shared problems and communicate, with a focus on theory, practical examples and algorithms, diving into game theory and logical approaches

Also, check out The Turing Post article about MAS -> https://www.turingpost.com/p/mas
Our article can be a good starting guide for you to explore what MAS is, its components, architectures, types, top recent developments and current trends

More resources in the comments 👇

If you liked it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

2 replies

·

replied to their post 26 days ago

Reinforcement Learning from Human Feedback (RLHF) -> https://huggingface.co/papers/2203.02155
The classic approach that combines supervised fine-tuning with RL on a reward model trained from human preference data.
Check out other RL+F approaches here: https://www.turingpost.com/p/rl-f
Monte Carlo Tree Search (MCTS) -> https://huggingface.co/papers/2305.10601
This planning algorithm builds a search tree by simulating many reasoning paths from the current state, balancing exploration and exploitation.
AMPO (Active Multi-Preference Optimization) -> https://huggingface.co/papers/2502.18293
Combines on-policy generation, contrastive learning, and smart selection of training examples. From many possible responses, it picks a small, diverse set with both high- and low-quality answers and unique styles
SPIN (Self-Play Fine-Tuning) -> https://huggingface.co/papers/2401.01335
Uses self-play, where the model learns by comparing its own generated responses to earlier outputs and human examples
SPPO (Self-Play Preference Optimization) -> https://huggingface.co/papers/2405.00675
Aligns LMs by framing training as a two-player game where the model learns to improve against itself through preference comparisons, aiming to reach a Nash equilibrium
RSPO (Regularized Self-Play Policy Optimization) -> https://huggingface.co/papers/2503.00030
Lets models learn through self-play, with an extra regularization term added to keep training stable. It achieves best results with a linear combination of forward and reverse KL divergence regularization
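The exploration/exploitation balance mentioned in the Monte Carlo Tree Search entry above is commonly handled with the UCT selection rule. Here is a minimal sketch of that rule only, not of a full search and not of the linked paper's method. The function names, the `(total_value, visits)` tuple layout, and the default constant `c = sqrt(2)` are illustrative assumptions.

```python
import math


def uct_score(total_value, visits, parent_visits, c=math.sqrt(2)):
    """UCT score of one child node: average value plus an exploration bonus.

    Unvisited children get an infinite score so that each is tried once.
    The constant c (an assumption here) trades exploration for exploitation.
    """
    if visits == 0:
        return math.inf
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Pick the child with the highest UCT score.

    `children` maps a hypothetical child name to (total_value, visits).
    """
    return max(
        children,
        key=lambda name: uct_score(children[name][0], children[name][1], parent_visits, c),
    )


# Usage: "explored" has a good average value, "fresh" has never been visited.
stats = {"explored": (3.0, 5), "fresh": (0.0, 0)}
print(select_child(stats, parent_visits=5))
```

With `c = 0` the rule becomes purely greedy on the average value. Larger values of `c` push the search towards rarely visited branches.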

posted an update 26 days ago

Post

4978

11 Alignment and Optimization Algorithms for LLMs

When we need to align models' behavior with the desired objectives, we rely on specialized algorithms that support helpfulness, accuracy, reasoning, safety, and alignment with user preferences. Much of a model’s usefulness comes from post-training optimization methods.

Here are the main optimization algorithms (both classic and new) in one place:

1. PPO (Proximal Policy Optimization) -> Proximal Policy Optimization Algorithms (1707.06347)
Clips the probability ratio to prevent the new policy from diverging too far from the old one. It helps keep everything stable

2. DPO (Direct Preference Optimization) -> Direct Preference Optimization: Your Language Model is Secretly a Reward Model (2305.18290)
It's a non-RL method, where the LM itself acts as an implicit reward model. It uses a simple loss to boost the preferred answer’s probability over the less preferred one

3. GRPO (Group Relative Policy Optimization) -> DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (2402.03300)
An RL method that compares a group of model outputs for the same input and updates the policy based on each output's reward relative to the rest of the group. It doesn't need a separate critic model
Its latest application is Flow-GRPO, which adds online RL to flow matching models -> Flow-GRPO: Training Flow Matching Models via Online RL (2505.05470)

4. DAPO (Decoupled Clip and Dynamic sAmpling Policy Optimization) -> DAPO: An Open-Source LLM Reinforcement Learning System at Scale (2503.14476)
Decouples the clipping bounds for flexibility, introducing 4 key techniques: clip-higher (to maintain exploration), dynamic sampling (to ensure gradient updates), token-level loss (to balance learning across long outputs), and overlong reward shaping (to handle long, truncated answers)

5. Supervised Fine-Tuning (SFT) -> Training language models to follow instructions with human feedback (2203.02155)
Often the first post-pretraining step. A model is fine-tuned on a dataset of high-quality human-written input-output pairs to directly teach desired behaviors
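To make the first three entries more concrete, here is a minimal scalar sketch of the core quantity behind each: PPO's clipped surrogate, DPO's preference loss, and GRPO's group-normalized advantage. It works on single numbers rather than batches of token log-probabilities. The function names and the default values `eps=0.2` and `beta=0.1` are illustrative assumptions, not settings taken from the posts or tied to any library.

```python
import math


def ppo_clipped_objective(ratio, advantage, eps=0.2):
    """PPO surrogate for one sample (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s). Clipping it to [1 - eps, 1 + eps]
    removes the incentive to move far away from the old policy.
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (to be minimized).

    The implicit reward of a response is beta * (log pi - log pi_ref).
    The loss is -log(sigmoid(reward_chosen - reward_rejected)).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))


def grpo_advantages(rewards, tiny=1e-8):
    """Group-relative advantages: normalize each reward within its group.

    The group mean acts as the baseline, so no critic model is needed.
    """
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + tiny) for r in rewards]


# Usage with made-up numbers
print(ppo_clipped_objective(ratio=1.5, advantage=1.0))
print(dpo_loss(-2.0, -3.0, -2.5, -2.5))
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```

In real training code each of these is averaged over a batch, and the log-probabilities are sums over response tokens.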

More in the comments 👇

If you liked it, also subscribe to the Turing Post: https://www.turingpost.com/subscribe

1 reply

·

replied to their post about 1 month ago

CoT-RAG -> https://huggingface.co/papers/2504.13534
Adds 3 new designs to the CoT approach: 1) Knowledge Graph-driven CoT Generation to guide reasoning chains, 2) Learnable Knowledge Case-aware RAG for combining RAG with knowledge graphs to provide relevant sub-cases, and 3) Logic-based pseudo-program prompting execution.
Unsupervised Visual CoT (UV-CoT) -> https://huggingface.co/papers/2504.18397
Performs preference comparisons between model-generated bounding boxes. It generates and ranks model responses to visual regions, using this feedback to guide training to improve image-level reasoning.
CoTAL -> https://huggingface.co/papers/2504.02323
Combines CoT with active learning, using curriculum-aligned assessments, human-in-the-loop prompt design, and teacher/student feedback to improve automated grading. It boosts GPT-4’s accuracy by up to 24.5%.
Deconstructing Long CoT (DLCoT) -> https://huggingface.co/papers/2503.16385
Enhances distillation data by segmenting data, simplifying solutions, and optimizing intermediate error states, improving model performance and token efficiency.

posted an update about 1 month ago

Post

4196

10 new Chain-of-Thought (CoT) methods

CoT has long been one of the hottest techniques in AI thanks to its effectiveness and compelling core idea: encouraging models to solve complex problems through explicit intermediate reasoning steps. Researchers keep modifying the original CoT approach, finding tricks that further improve LLMs' reasoning. That's what we're going to talk about today.

Here's a list of the 10 latest enhanced CoT approaches:

1. Chain-of-Defensive-Thought -> Chain-of-Defensive-Thought: Structured Reasoning Elicits Robustness in Large Language Models against Reference Corruption (2504.20769)
Provides a few structured, defensive reasoning exemplars to improve the robustness of LLMs

2. Hybrid-CoT -> AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization (2504.21659)
Proposes the Adaptive Hybrid Reasoning Model (AdaR1), which combines Long- and Short-CoT and applies bi-level preference training to select effective reasoning styles

3. Semantic-level and token-level CoT -> T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (2505.00703)
Introduces T2I-R1, a text-to-image generation model that uses semantic-level CoT for prompt planning and token-level CoT for pixel-level generation, while BiCoT-GRPO coordinates them both

4. Speculative CoT (SCoT) -> Efficient Reasoning for LLMs through Speculative Chain-of-Thought (2504.19095)
SCoT drafts multiple reasoning paths with a lightweight draft model, selects the best, and uses the target model for correction, reducing latency by 48–66%

5. Collaborative CoT (Co-CoT) -> Co-CoT: A Prompt-Based Framework for Collaborative Chain-of-Thought Reasoning (2504.17091)
Breaks reasoning into blocks that users can inspect, modify and re-run, promoting active engagement. An adaptation mechanism aligns outputs with diverse cognitive styles and user goals

6. XS-CoT -> Enhancing Non-Core Language Instruction-Following in Speech LLMs via Semi-Implicit Cross-Lingual CoT Reasoning (2504.20835)
It's a cross-lingual framework that integrates speech-to-text translation into reasoning, using a semi-implicit CoT approach to compress intermediate tokens. This improves non-core language responses by up to 45%

Read further in the comments 👇

If you liked this, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

1 reply

·

posted an update about 1 month ago

Post

6504

6 Free resources on Reinforcement Learning (RL)

RL is where the real action is right now: it's the engine behind autonomous tech, robots, and the next wave of AI that thinks, moves and solves problems on its own. To stay up to date with what’s happening in RL, we offer some fresh materials on it:

1. "Reinforcement Learning from Human Feedback" by Nathan Lambert -> https://rlhfbook.com/
It's a short introduction to RLHF, explaining instruction tuning, reward modeling, alignment methods, synthetic data, evaluation, and more

2. "A Course in Reinforcement Learning (2nd Edition)" by Dimitri P. Bertsekas -> https://www.mit.edu/~dimitrib/RLbook.html
Explains dynamic programming (DP) and RL, diving into rollout algorithms, neural networks, policy learning, etc. It’s packed with solved exercises and real-world examples

3. "Mathematical Foundations of Reinforcement Learning" video course by Shiyu Zhao -> https://www.youtube.com/playlist?list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8
Offers a mathematical yet friendly introduction to RL, covering the Bellman equation, value iteration, Monte Carlo learning, approximation, policy gradient, actor-critic methods, etc.
+ Check out the repo for more: https://github.com/MathFoundationRL/Book-Mathematical-Foundation-of-Reinforcement-Learning

4. "Multi-Agent Reinforcement Learning" by Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer -> https://www.marl-book.com/
Covers models, core ideas of multi-agent RL (MARL) and modern approaches to combining it with deep learning

5. "Reinforcement Learning: A Comprehensive Overview" by Kevin P. Murphy -> https://arxiv.org/pdf/2412.05265
Explains RL and sequential decision making, covering value-based, policy-gradient, model-based, and multi-agent RL methods, RL+LLMs, RL+inference, and other topics

6. Our collection of free courses and books on RL -> https://huggingface.co/posts/Kseniase/884818121094439

If you liked this, also subscribe to The Turing Post: https://www.turingpost.com/subscribe

replied to their post about 2 months ago

These are graph-centric types of RAG:

NodeRAG -> https://huggingface.co/papers/2504.11544
Uses well-designed heterogeneous graph structures and focuses on graph design to ensure smooth integration of graph algorithms. It outperforms GraphRAG and LightRAG on multi-hop and open-ended QA benchmarks
HeteRAG -> https://huggingface.co/papers/2504.10529
This heterogeneous RAG framework decouples knowledge chunk representations. It uses multi-granular views for retrieval and concise chunks for generation, along with adaptive prompt tuning
Hyper-RAG -> https://huggingface.co/papers/2504.08758
A hypergraph-based RAG method. By capturing both pairwise and complex relationships in domain-specific knowledge, it improves factual accuracy and reduces hallucinations, especially in high-stakes fields like medicine, surpassing Graph RAG and Light RAG. Its lightweight version also doubles retrieval speed

posted an update about 2 months ago

Post

7236

11 new types of RAG

RAG is evolving fast, keeping pace with cutting-edge AI trends. It is becoming more agentic and smarter at navigating complex structures like hypergraphs.

Here are the 11 latest RAG types:

1. InstructRAG -> InstructRAG: Leveraging Retrieval-Augmented Generation on Instruction Graphs for LLM-Based Task Planning (2504.13032)
Combines RAG with a multi-agent framework, using a graph-based structure, an RL agent to expand task coverage, and a meta-learning agent for better generalization

2. CoRAG (Collaborative RAG) -> CoRAG: Collaborative Retrieval-Augmented Generation (2504.01883)
A collaborative framework that extends RAG to settings where clients train a shared model using a joint passage store

3. ReaRAG -> ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation (2503.21729)
It uses a Thought-Action-Observation loop to decide at each step whether to retrieve information or finalize an answer, reducing unnecessary reasoning and errors

4. MCTS-RAG -> MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search (2503.20757)
Combines RAG with Monte Carlo Tree Search (MCTS) to help small LMs handle complex, knowledge-heavy tasks

5. Typed-RAG -> Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering (2503.15879)
Improves answers to open-ended questions by identifying the question type (a debate, personal experience, or comparison) and breaking the question down into simpler parts

6. MADAM-RAG -> Retrieval-Augmented Generation with Conflicting Evidence (2504.13079)
A multi-agent system where models debate answers over multiple rounds and an aggregator filters noise and misinformation

7. HM-RAG -> HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation (2504.12330)
A hierarchical multi-agent RAG framework that uses 3 agents: one to split queries, one to retrieve across multiple data types (text, graphs and web), and one to merge and refine answers

8. CDF-RAG -> CDF-RAG: Causal Dynamic Feedback for Adaptive Retrieval-Augmented Generation (2504.12560)
Works with causal graphs and enables multi-hop causal reasoning, refining queries. It validates responses against causal pathways

To explore what Causal AI is, read our article: https://www.turingpost.com/p/causalai

Subscribe to the Turing Post: https://www.turingpost.com/subscribe

Read further 👇

1 reply

·

reacted to fdaudens's post with 🔥 about 2 months ago

Post

1606

Just tested something this morning that feels kind of game-changing for how we publish, discover, and consume news with AI: connecting Claude directly to the New York Times through MCP.

Picture this: You ask Claude about a topic, and it instantly pulls verified and trusted NYT content — no more guessing if the info is accurate.

The cool part? Publishers stay in control of what they share via API, and users get fast, reliable access through the AI tools they already use. Instead of scraping random stuff off the web, we get a future where publishers actively shape how their journalism shows up in AI.

It’s still a bit technical to set up right now, but this could get super simple soon — like installing apps on your phone, but for your chatbot. And you keep the brand connection, too.

Not saying it solves everything, but it’s definitely a new way to distribute content — and maybe even find some fresh value in the middle of this whole news + AI shakeup. Early movers will have a head start.

Curious what folks think — could MCPs be a real opportunity for journalism?

1 reply

·

reacted to their post with 👍 about 2 months ago

Post

5574

16 new research papers on inference-time scaling:

Over the last couple of weeks, a large number of studies on inference-time scaling have emerged. And it's so cool, because each new paper adds a trick to the toolbox, making LLMs more capable without needing to scale the models' parameter count.

So here are 13 new methods + 3 comprehensive studies on test-time scaling:

1. Inference-Time Scaling for Generalist Reward Modeling (2504.02495)
Probably the most popular study. It proposes to boost inference-time scalability by improving reward modeling. To enhance performance, DeepSeek-GRM uses adaptive critiques, parallel sampling, pointwise generative RM, and Self-Principled Critique Tuning (SPCT)

2. T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models (2504.04718)
Allows small models to use external tools, like code interpreters and calculators, to enhance self-verification

3. Z1: Efficient Test-time Scaling with Code (2504.00810)
Proposes to train LLMs on code-based reasoning paths to make test-time scaling more efficient, limiting unnecessary tokens with a special dataset and a Shifted Thinking Window

4. GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning (2504.00891)
Introduces GenPRM, a generative PRM that uses CoT reasoning and code verification for step-by-step judgment. With only 23K training examples, GenPRM outperforms prior PRMs and larger models

5. Can Test-Time Scaling Improve World Foundation Model? (2503.24320)
SWIFT test-time scaling framework improves World Models' performance without retraining, using strategies like fast tokenization, Top-K pruning, and efficient beam search

6. Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking (2504.07104)
Proposes REBEL for RAG systems scaling, which uses multi-criteria optimization with CoT prompting for better performance-speed tradeoffs as inference compute increases

7. $φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation (2503.13288)
Proposes a φ-Decoding strategy that uses foresight sampling, clustering and adaptive pruning to estimate and select optimal reasoning steps

Read further below 👇

Also, subscribe to the Turing Post https://www.turingpost.com/subscribe

2 replies

·

replied to their post about 2 months ago

Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing -> https://huggingface.co/papers/2503.19385
An effective test-time scaling method for flow models with SDE-based generation for particle sampling, interpolant conversion to enhance diversity, and Rollover Budget Forcing (RBF) for adaptive compute allocation
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks -> https://huggingface.co/papers/2503.04378
Introduces a Feedback-Edit model setup that improves inference-time scaling, particularly for open-ended tasks, by using 3 different models for drafting, feedback and editing
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models -> https://huggingface.co/papers/2504.00869
A simple m1 method improves medical performance at inference, with models under 10B outperforming previous benchmarks and a 32B model matching 70B models
ToolACE-R: Tool Learning with Adaptive Self-Refinement -> https://huggingface.co/papers/2504.01400
ToolACE-R enables adaptive self-refinement of tool use through model-aware iterative training. It refines tool calls without external feedback and scales inference compute efficiently
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding -> https://huggingface.co/papers/2504.01281
Introduces a lightweight RAG framework that uses PORAG for better content use, ATLAS for adaptive retrieval timing, and CRITIC for efficient memory use. Together with optimized decoding strategies and adaptive reasoning depth, it allows the model to scale its inference steps effectively.
Do We Truly Need So Many Samples? Multi-LLM Repeated Sampling Efficiently Scales Test-Time Compute -> https://huggingface.co/papers/2504.00762
ModelSwitch is a sampling-then-voting strategy that uses multiple models (including weaker ones) to leverage diverse strengths, where a consistency signal guides dynamic model switching. It highlights the potential of multi-model generation-verification.

3 comprehensive surveys on inference-time scaling:

Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead -> https://huggingface.co/papers/2504.00294
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models -> https://huggingface.co/papers/2503.24235
Efficient Inference for Large Reasoning Models: A Survey -> https://huggingface.co/papers/2503.23077

replied to their post 2 months ago

Edge inference -> https://arxiv.org/pdf/2112.00616
Refers to running AI models locally on edge devices (mobile phones, IoT devices, embedded hardware) or on servers at the network edge.
Cloud inference -> https://huggingface.co/papers/2210.05889
Input data is sent from users/devices to the cloud, where large-scale compute (CPUs, GPUs, TPUs) runs the AI model and returns the results.

Explore other important aspects of AI inference, including how it works and what the current trends are, in our article: https://www.turingpost.com/p/inference-805f

If you like it, also subscribe to the Turing Post -> https://www.turingpost.com/subscribe

posted an update 2 months ago

Post

2715

9 Types of AI inference

AI inference is the process in which trained AI models generate predictions, classifications, or decisions from input data. It encompasses a wide range of approaches that differ in computational method and deployment.

Firstly, here are 5 inference types, based on how the model reasons:

1. Probabilistic inference -> https://arxiv.org/pdf/2502.05244
Uses probability theory to reason under uncertainty. The system maintains degrees of belief over hypotheses and updates them as evidence comes in.

2. Rule-based inference -> Logicbreaks: A Framework for Understanding Subversion of Rule-based Inference (2407.00075)
Draws conclusions by applying explicit if-then rules encoded in a knowledge base. Mostly used in neurosymbolic AI.

3. Logical inference -> https://arxiv.org/abs/2009.03393
Uses formal logic to draw conclusions that are guaranteed true if the premises are. It supports theorem proving, logic programming, and tasks needing correctness, like software verification.

4. Abductive inference -> Can ChatGPT Make Explanatory Inferences? Benchmarks for Abductive Reasoning (2404.18982)
Involves forming hypotheses that would best explain a given set of observations - among multiple possible explanations, the goal is to choose the most plausible. Abduction is inherently creative and uncertain.

5. Fuzzy inference -> DCNFIS: Deep Convolutional Neuro-Fuzzy Inference System (2308.06378)
Applies fuzzy logic – reasoning with degrees of truth rather than binary true/false. Inputs are mapped to fuzzy sets with membership grades between 0 and 1.
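Two of the reasoning styles above are easy to show in a few lines: updating degrees of belief as evidence arrives (probabilistic inference) and mapping an input to a membership grade between 0 and 1 (fuzzy inference). This is a minimal sketch. The hypotheses, likelihood values, and the triangular "warm" temperature set are made-up illustrations, and the function names are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses after one piece of evidence.

    prior:      {hypothesis: P(hypothesis)}
    likelihood: {hypothesis: P(evidence | hypothesis)}
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


def triangular_membership(x, left, peak, right):
    """Membership grade of x in a triangular fuzzy set.

    The grade is 0 outside (left, right), rises linearly to 1 at the peak,
    and falls linearly back to 0.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


# Probabilistic inference: belief in "rain" after seeing dark clouds.
belief = bayes_update(
    prior={"rain": 0.5, "dry": 0.5},
    likelihood={"rain": 0.9, "dry": 0.1},  # assumed P(dark clouds | hypothesis)
)
print(belief)

# Fuzzy inference: how "warm" is 15 degrees, if "warm" spans 10 to 30 and peaks at 20?
print(triangular_membership(15, left=10, peak=20, right=30))
```

A full fuzzy inference system would add rules that combine several membership grades and a defuzzification step. Only the input-to-grade mapping is shown here.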

Secondly, here are 4 inference types based on their execution context:

1. Batch inference -> BatchLLM: Optimizing Large Batched LLM Inference with Global Prefix Sharing and Throughput-oriented Token Batching (2412.03594)
Involves generating model predictions on large sets of data in bulk, often on a scheduled basis or as needed for analysis rather than immediate use.

2. Real-time inference -> Real-time Inference and Extrapolation via a Diffusion-inspired Temporal Transformer Operator (DiTTO) (2307.09072)
Produces outputs on-demand with minimal latency, so results are available immediately when needed.

Read further in the comments 👇

2 replies

·

replied to their post 2 months ago

Hypergraph-of-Thought (HoT) -> https://huggingface.co/papers/2308.06207
Uses textual and visual hypergraphs with cross-modal co-attention to model high-order multi-hop reasoning.

We also recommend reading this recent comprehensive survey on Multimodal CoT: https://huggingface.co/papers/2503.12605
