If you haven't yet, you should read the technical report for SmolVLA, published yesterday by the Hugging Face robotics team!
⚡️ Among other ideas, it introduces "async inference" to speed up robot actions.
Robots have a problem: performing actions takes time (unlike agents, whose action executions are near-instant!)
Most often, robots wait until they've finished performing their actions before they start thinking about the next steps. This is a huge latency cost!
So the team decided to have the PolicyServer (aka the "thinking" part) restart early: instead of waiting for all n actions it just sent to be executed, it gets a fresh observation after k < n steps and starts computing the next action chunk while the remaining steps run, so the next actions are ready to send right away.
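To make the overlap concrete, here is a minimal sketch of the idea in Python threads. This is not the LeRobot implementation: the function names, chunk size n=5, trigger step k=3, and the sleep-based "inference" and "execution" latencies are all hypothetical, chosen just to illustrate how the next chunk is requested at step k while steps k..n-1 are still executing.

```python
import threading
import queue
import time

CHUNK_SIZE = 5     # n: actions per inference call (hypothetical value)
TRIGGER_STEP = 3   # k < n: step at which the next inference is requested

def policy_server(observation):
    """Stand-in for the "thinking" part: returns a chunk of n actions.

    The sleep simulates model inference latency (hypothetical number).
    """
    time.sleep(0.05)
    return [f"action({observation}, {i})" for i in range(CHUNK_SIZE)]

def run_async(num_chunks):
    """Overlap inference with execution: after k of n actions have run,
    send the current observation so the next chunk is ready by step n."""
    executed = []
    actions = policy_server("obs0")
    for chunk in range(num_chunks):
        next_chunk = queue.Queue(maxsize=1)
        worker = None
        for step, action in enumerate(actions):
            if step == TRIGGER_STEP:
                # Kick off inference on a fresh observation in the
                # background while the remaining steps keep executing.
                obs = f"obs{chunk + 1}"
                worker = threading.Thread(
                    target=lambda o=obs: next_chunk.put(policy_server(o))
                )
                worker.start()
            time.sleep(0.02)  # simulated per-step execution time
            executed.append(action)
        worker.join()
        actions = next_chunk.get()  # ready (or nearly) by step n
    return executed

print(len(run_async(num_chunks=2)))  # 2 chunks of 5 actions each
```

In the synchronous version, inference latency is added on top of every chunk's execution time; here it hides behind the last n - k execution steps, which is where the throughput gain comes from.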
⚡️ This boosted robot throughput by ~30% (nearly 2× tasks per time window)!
gg @cadene and team! 🎉
Report here: SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (2506.01844)