# Robustness in Both Domains: CLIP Needs a Robust Text Encoder

Elias Abad Rocamora, Christian Schlarmann, Naman Deep Singh, Yongtao Wu, Matthias Hein and Volkan Cevher

LIONS @ EPFL and Tübingen AI Center

In this repo, you will find all the models trained for our NeurIPS 2025 paper.
## Loading CLIPModels
You can load our models like any other CLIP model. For example, `LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2` can be loaded by following the `openai/clip-vit-large-patch14` example snippet:
```python
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="pt", padding=True)

outputs = model(**inputs)
logits_per_image = outputs.logits_per_image  # this is the image-text similarity score
probs = logits_per_image.softmax(dim=1)  # we can take the softmax to get the label probabilities
```
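If you only need the projected embeddings in CLIP's joint image-text space, a minimal follow-up sketch (reusing `model` and `inputs` from the snippet above; the variable names below are illustrative):

```python
import torch

with torch.no_grad():
    # Projected embeddings in the shared image-text space
    text_embeds = model.get_text_features(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])
    image_embeds = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity; this matches logits_per_image up to the learned temperature (logit_scale)
text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
image_embeds = image_embeds / image_embeds.norm(dim=-1, keepdim=True)
similarity = image_embeds @ text_embeds.T
```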
When loading other model sizes, the `processor_name` needs to be changed accordingly (see the sketch after the table):
| Model Size | Processor Name | 
|---|---|
| ViT-L-14 | "openai/clip-vit-large-patch14" | 
| ViT-H-14 | "laion/CLIP-ViT-H-14-laion2B-s32B-b79K" | 
| ViT-g-14 | "laion/CLIP-ViT-g-14-laion2B-s12B-b42K" | 
| ViT-bigG-14 | "laion/CLIP-ViT-bigG-14-laion2B-39B-b160k" | 
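For instance, assuming the converted checkpoints all follow the same `CLIPModel` format as the ViT-L example above, loading the ViT-H model hosted in this organization only changes the two names (a sketch, not an official snippet):

```python
from transformers import CLIPProcessor, CLIPModel

# ViT-H checkpoint from this organization, paired with the matching LAION processor from the table
model_name = "LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2"
processor_name = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(processor_name)
```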
## Loading CLIPTextModels
If you just need the text encoder, you can load it with the following snippet:
```python
from transformers import CLIPTokenizer, CLIPTextModel

model_name = "LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2"
processor_name = "openai/clip-vit-large-patch14"

model = CLIPTextModel.from_pretrained(model_name)
tokenizer = CLIPTokenizer.from_pretrained(processor_name)

inputs = tokenizer(["a photo of a cat", "a photo of a dog"], padding=True, return_tensors="pt")

outputs = model(**inputs)
last_hidden_state = outputs.last_hidden_state
pooled_output = outputs.pooler_output  # pooled (EOS token) states
```
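As a small follow-up sketch (reusing `pooled_output` from the snippet above), the pooled states can be compared directly, e.g. via cosine similarity:

```python
import torch.nn.functional as F

# Cosine similarity between the two prompts in the text encoder's own (pre-projection) space
text_embeds = F.normalize(pooled_output, dim=-1)
similarity = text_embeds @ text_embeds.T
print(similarity)
```

Note that `CLIPTextModel` returns states before CLIP's final text projection; if you need embeddings in the joint image-text space, use `CLIPModel.get_text_features` (or `CLIPTextModelWithProjection`) instead.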
## Acknowledgements
Our codebase is based on the OpenCLIP codebase; we appreciate the effort of the OpenCLIP team and the release of their code and model weights.
## Models
The organization hosts 42 models; a selection is listed below.

| Model | Task | Size |
|---|---|---|
| LEAF-CLIP/LEAF-BERT-base-uncased-SST-2-rho-50-k1-constrained | Text Classification | 0.1B |
| LEAF-CLIP/LEAF-BERT-base-uncased-SST-2-rho-50-k1 | Text Classification | 0.1B |
| LEAF-CLIP/OpenCLIP-ViT-bigG-rho50-k1-constrained | Feature Extraction | 3B |
| LEAF-CLIP/OpenCLIP-ViT-H-rho50-k1-constrained-FARE2 | Feature Extraction | 1.0B |
| LEAF-CLIP/OpenCLIP-ViT-g-rho50-k1-constrained-FARE2 | Feature Extraction | 1B |
| LEAF-CLIP/CLIP-ViT-L-rho50-k1-constrained-FARE2 | Feature Extraction | 0.4B |
| LEAF-CLIP/OpenCLIP-ViT-g-rho50-k1-constrained | Feature Extraction | 1B |
| LEAF-CLIP/OpenCLIP-ViT-g-FARE2 | Feature Extraction | 1B |
| LEAF-CLIP/OpenCLIP-ViT-g-rho50-k1 | Feature Extraction | 1B |

## Datasets
None public yet.