SmolVLM-500M-Instruct-fer0

Fine-tuned version of SmolVLM-500M-Instruct on a subset of AffectNet (emotion recognition), with text labels transcribed via GPT-4o-mini.

This is just priliminary, we'll update soon with proper evalutation and info.

Example

Image input
image

Predictions:

  • Base model: A woman with blonde hair is looking to the side with a hand on her chin.
  • This model: The expression conveys a sense of contemplation or concern. The furrowed brow and slightly parted lips suggest a deep thought or worry. The hand on the chin indicates a hint of introspection, hinting at a possible emotional state of unease or contemplation.

Training Summary

  • Loss values:
Step Training Loss
25 2.80
50 0.82
75 0.48
100 0.43
  • Hyperparameters:
    • Learning rate: 1e-4
    • Batch size: 4 (grad. accum. ร—4)
    • Epochs: 1
    • Optimizer: 8-bit AdamW
    • Scheduler: linear (warmup 50 steps)
    • Seed: 42

Frameworks

  • Transformers 4.50.0
  • PyTorch 2.3.1+cu121
  • Datasets 3.6.0
  • Tokenizers 0.21.1
Downloads last month
5
Safetensors
Model size
507M params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for JoseferEins/SmolVLM-500M-Instruct-fer0