CLIP Vision Encoder - Overfit Training

This model is a ConvNeXt-Large vision encoder fine-tuned with contrastive learning on an image-image similarity objective.
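
The card does not include the training objective itself; as an illustration, an InfoNCE-style image-image contrastive loss over a batch of paired views might look like the sketch below (the function name, temperature, and pairing scheme are assumptions, not the documented training code).

import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, temperature=0.07):
    # emb_a, emb_b: (batch, dim) embeddings of two views of the same images.
    # Hypothetical sketch; the actual loss used for this checkpoint is not documented.
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature  # (batch, batch) pairwise similarities
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    # Matching pairs sit on the diagonal; every other entry is a negative.
    return F.cross_entropy(logits, targets)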

Training Details

  • Epoch: 4
  • Loss: 1.4814
  • Accuracy: 0.9707
  • Base Model: convnext_large.fb_in22k_ft_in1k_384
  • Training Images: 8577

Usage

import timm
import torch

# Create the backbone without a classification head (num_classes=0 returns pooled features).
model = timm.create_model('convnext_large.fb_in22k_ft_in1k_384', pretrained=False, num_classes=0)

# Load the fine-tuned weights from the checkpoint.
checkpoint = torch.load('checkpoints/epoch_004_loss_1.4814.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state'])
model.eval()
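
Continuing from the snippet above, one way to compare two images is to embed them and take the cosine similarity. This sketch uses timm's data-config helpers to build the matching 384x384 preprocessing; the image paths are placeholders.

from PIL import Image
import torch.nn.functional as F

# Build the preprocessing pipeline that matches the backbone's pretraining config.
config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**config, is_training=False)

img_a = transform(Image.open('image_a.jpg').convert('RGB')).unsqueeze(0)
img_b = transform(Image.open('image_b.jpg').convert('RGB')).unsqueeze(0)

with torch.no_grad():
    emb_a = F.normalize(model(img_a), dim=-1)
    emb_b = F.normalize(model(img_b), dim=-1)

similarity = (emb_a * emb_b).sum(-1).item()  # cosine similarity in [-1, 1]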