This repository contains the HandsOnVLM model presented in the paper "HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction".

Project page: https://www.chenbao.tech/handsonvlm/

Code: https://github.com/Kami-code/HandsOnVLM-release
