# CLIP Vision Encoder - Overfit Training
This model is a fine-tuned ConvNeXt-Large vision encoder trained with contrastive learning for image-image similarity.
## Training Details
- Epoch: 4
- Loss: 1.4814
- Accuracy: 0.9707
- Base Model: convnext_large.fb_in22k_ft_in1k_384
- Training Images: 8577
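The card does not spell out the exact contrastive objective. A common choice for image-image contrastive training is a symmetric, CLIP-style InfoNCE loss over in-batch pairs; the sketch below is illustrative only, and the function name and temperature value are assumptions rather than details of this training run.

```python
import torch
import torch.nn.functional as F

def image_image_infonce(emb_a: torch.Tensor, emb_b: torch.Tensor,
                        temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired image embeddings.

    emb_a, emb_b: (N, D) embeddings of two paired images/views.
    temperature: hypothetical value; the card does not state the one used.
    """
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature                     # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0), device=a.device)   # matching pairs lie on the diagonal
    loss_ab = F.cross_entropy(logits, targets)           # retrieve b given a
    loss_ba = F.cross_entropy(logits.t(), targets)       # retrieve a given b
    return (loss_ab + loss_ba) / 2
```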
## Usage
```python
import timm
import torch

# Recreate the backbone as a feature extractor (num_classes=0 removes the classifier head).
model = timm.create_model('convnext_large.fb_in22k_ft_in1k_384', pretrained=False, num_classes=0)

# Load the fine-tuned weights from the training checkpoint.
checkpoint = torch.load('checkpoints/epoch_004_loss_1.4814.pt', map_location='cpu')
model.load_state_dict(checkpoint['model_state'])
model.eval()
```
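With the checkpoint loaded, the encoder can produce image embeddings. The snippet below is a minimal sketch that continues from the code above (`model` already loaded and in eval mode); `example.jpg` is a placeholder path, and timm's data-config helpers are used to rebuild the 384x384 preprocessing expected by the base model.

```python
from PIL import Image

# Rebuild the preprocessing pipeline (resize/crop to 384x384, normalization) from the model config.
data_config = timm.data.resolve_model_data_config(model)
transform = timm.data.create_transform(**data_config, is_training=False)

image = Image.open('example.jpg').convert('RGB')  # placeholder input image
with torch.no_grad():
    embedding = model(transform(image).unsqueeze(0))  # pooled features, shape (1, 1536) for ConvNeXt-Large

# L2-normalize before comparing embeddings with cosine similarity for image-image retrieval.
embedding = torch.nn.functional.normalize(embedding, dim=-1)
```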