If you haven't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team!
➡️ Amongst other ideas, it introduces "Async inference" to boost robot action throughput.
Robots have a problem: performing actions takes time (unlike software agents, where action execution is near-instant!)
Most often, robots wait until they've finished executing their current actions before starting to think about the next steps. That's a huge latency cost!
So the team had the PolicyServer (aka the "thinking" part) restart early: instead of waiting for all n actions it just sent to finish executing, it receives a fresh observation after only k < n steps and starts computing the next action chunk while the remaining steps play out, so the next actions are ready to send right away.
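Here's a minimal sketch of that overlap pattern in Python. Note that `PolicyServer.predict`, `robot.get_observation`, `robot.execute`, and the chunk sizes n and k are hypothetical stand-ins for illustration, not the actual LeRobot/SmolVLA API:

```python
# Sketch of the async-inference pattern: overlap chunk computation with execution.
# `policy_server` and `robot` are hypothetical stand-ins, not the real API.
import concurrent.futures

def run_async_inference(policy_server, robot, n=50, k=30, num_chunks=10):
    """Execute n-step action chunks while the next chunk is computed in the background."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    # First chunk: a list of n actions predicted from the initial observation.
    actions = policy_server.predict(robot.get_observation())

    for _ in range(num_chunks):
        future = None
        for step, action in enumerate(actions):
            robot.execute(action)
            # After k < n steps, send a fresh observation so the server
            # starts computing the next chunk while the remaining steps run.
            if step == k:
                future = executor.submit(policy_server.predict, robot.get_observation())
        # By the time all n steps finish, the next chunk is (nearly) ready.
        actions = future.result()
```

The key design choice is picking k: too early and the next chunk is predicted from a stale observation; too late and the robot idles waiting for inference, losing the overlap benefit.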
➡️ Result: ~30% faster task completion, and nearly 2× as many tasks completed in a fixed time window!
gg @cadene and team! 👏
Report here: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)