
HuggingFaceTB/SmolVLM2-500M-Video-Instruct-mlx

This model was converted to MLX format from HuggingFaceTB/SmolVLM2-500M-Video-Instruct using mlx-vlm version 0.1.13. Refer to the original model card for more details on the model.

Use with mlx

pip install -U mlx-vlm
python -m mlx_vlm.generate --model mlx-community/SmolVLM2-500M-Video-Instruct-mlx --image https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg --prompt "Can you describe this image?"
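
The same command can also be driven from a script. The following is a minimal sketch, not part of the original model card: it only wraps the CLI call shown above using the standard library, and it relies solely on the `--model`, `--image`, and `--prompt` flags that appear in that command. The helper names `build_generate_command` and `describe_image` are hypothetical, and the format of the CLI's output is not assumed.

```python
import shlex
import subprocess
import sys

MODEL = "mlx-community/SmolVLM2-500M-Video-Instruct-mlx"
IMAGE = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"


def build_generate_command(model, image, prompt):
    """Build the argv list for the mlx_vlm.generate CLI call (hypothetical helper)."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--image", image,
        "--prompt", prompt,
    ]


def describe_image(model, image, prompt):
    """Run the CLI and return its raw stdout (hypothetical helper).

    Requires mlx-vlm to be installed; raises CalledProcessError if the CLI fails.
    """
    cmd = build_generate_command(model, image, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Print the command that would be executed, quoted for a POSIX shell.
print(shlex.join(build_generate_command(MODEL, IMAGE, "Can you describe this image?")))
```

With mlx-vlm installed, calling `describe_image(MODEL, IMAGE, "Can you describe this image?")` would run the model and return the text the CLI prints. Passing the arguments as a list avoids shell-quoting problems with prompts that contain spaces or quotes.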

Model size: 507M params (Safetensors)
Tensor type: BF16