SmolVLM-256M-Detection
This model is experimental and for learning purposes; I wouldn't recommend using it unless you're just experimenting.
Check out github.com/shreydan/VLM-OD for results and details.
Usage
- load the model the same way as `HuggingFaceTB/SmolVLM-256M-Instruct` (see the loading sketch below)
- inputs: `detect car` / `detect person;car` etc. Apply the chat template with `add_generation_prompt=True`
- parse the output tokens `<loc000>` to `<loc255>` (code in `eval.ipynb` of my github repo; a parsing sketch is also shown below)
- to reiterate, I have not added any `<locXXX>` special tokens (that needs wayyy more training than this method), the model itself is generating them.
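A minimal loading and inference sketch, assuming the standard SmolVLM `transformers` usage (`AutoProcessor` / `AutoModelForVision2Seq`); the image path and the `detect car` prompt are placeholders:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "shreydan/SmolVLM-256M-Detection"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16).to(device)

# any image you want to run detection on (placeholder path)
image = Image.open("example.jpg").convert("RGB")

# detection prompt: one class ("detect car") or several ("detect person;car")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "detect car"},
        ],
    }
]

prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
# decode only the newly generated tokens, keeping the <locXXX> strings for parsing
new_tokens = generated_ids[:, inputs["input_ids"].shape[1]:]
output = processor.batch_decode(new_tokens, skip_special_tokens=False)[0]
print(output)
```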
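And a rough parsing sketch for turning the generated `<locXXX>` tokens into pixel boxes. It assumes each box is four consecutive values in `(x_min, y_min, x_max, y_max)` order on a 0-255 grid; check `eval.ipynb` in the repo for the exact convention used during training:

```python
import re

def parse_loc_boxes(text: str, img_w: int, img_h: int):
    """Extract <locXXX> tokens and group them into pixel-space boxes.

    Assumption: each box is four consecutive values in
    (x_min, y_min, x_max, y_max) order on a 0-255 grid.
    """
    vals = [int(v) for v in re.findall(r"<loc(\d{3})>", text)]
    boxes = []
    # drop any trailing incomplete group of fewer than 4 values
    for i in range(0, len(vals) // 4 * 4, 4):
        x0, y0, x1, y1 = vals[i:i + 4]
        boxes.append((
            x0 / 255 * img_w,
            y0 / 255 * img_h,
            x1 / 255 * img_w,
            y1 / 255 * img_h,
        ))
    return boxes

# e.g. parse_loc_boxes(output, *image.size) -> [(x0, y0, x1, y1), ...]
```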
