nkkbr/ViCA2-init

Video-Text-to-Text
Transformers
Safetensors
sam2
English
vica_qwen
text-generation
multimodal
vision-language
video understanding
visuospatial cognition
spatial reasoning
vlm
llava
qwen
siglip
hiera
dual-encoder

The released checkpoint does not seem to be compatible with Transformers

#1 opened about 1 month ago by catherinexyz (7 comments)