PHOENIX: Hierarchical Contrastive Learning for Patent Image Retrieval
PHOENIX is a domain-adapted CLIP/ViT-based model designed to improve patent image retrieval. It addresses the unique challenges of retrieving relevant technical drawings in patent documents, especially when searching for semantically or hierarchically related images, not just exact matches.
This model is based on openai/clip-vit-base-patch16 and fine-tuned with a hierarchical multi-positive contrastive loss that leverages the Locarno classification, an international system for categorizing industrial designs.
Motivation
Patent images are often complex technical illustrations that encode detailed structural or functional aspects of an invention. Current systems typically retrieve images from the same patent but fail when asked to retrieve semantically similar inventions across different patents or subclasses.
For instance, a retrieval system should understand that a "foldable camping chair" and a "stackable office chair" both fall under the broader "seating" category, even if their visual structures differ.
What This Model Does
- Leverages CLIP ViT for visual understanding of technical drawings
- Trains using hierarchical multi-positive contrastive learning to encode the Locarno structure (see the loss sketch after this list):
  Furniture → Seating → Chairs → Specific Patent
- Encodes images such that semantically similar inventions lie close in the embedding space, even if they come from different patents or subclasses
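The exact training objective is not included in this card; the following is a minimal sketch of what a hierarchical multi-positive contrastive loss can look like in PyTorch. The function name, the per-level label tensors, and the per-level weights are illustrative assumptions: each anchor treats every other batch item sharing its label at a given Locarno level (class, subclass, patent) as a positive, and the per-level terms are combined with a weighted sum.

import torch
import torch.nn.functional as F

def hierarchical_multi_positive_loss(embeddings, level_labels, level_weights, temperature=0.07):
    """Sketch of a multi-positive contrastive loss aggregated over hierarchy levels.

    embeddings:    (N, D) L2-normalised image embeddings for one batch.
    level_labels:  list of (N,) integer label tensors, one per Locarno level
                   (e.g. class, subclass, patent), ordered coarse to fine.
    level_weights: list of floats weighting each level's contribution.
    """
    sim = embeddings @ embeddings.T / temperature               # (N, N) similarity logits
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float("-inf"))             # exclude self-pairs from the softmax
    log_prob = F.log_softmax(sim, dim=1)

    loss = embeddings.new_zeros(())
    for labels, weight in zip(level_labels, level_weights):
        # positives: other batch items sharing this level's label
        pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
        pos_counts = pos_mask.sum(dim=1).clamp(min=1)
        # average log-probability over all positives at this level
        level_loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
        valid = pos_mask.any(dim=1)                              # anchors with at least one positive
        if valid.any():
            loss = loss + weight * level_loss[valid].mean()
    return loss

In such a setup, coarser levels would usually receive smaller weights, so that same-patent similarity dominates while related subclasses are still pulled together.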
How to Use
Load Model
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import torch
# Load fine-tuned model and processor
model = CLIPModel.from_pretrained("kshitij3188/PHOENIX-patent-retrieval")
processor = CLIPProcessor.from_pretrained("kshitij3188/PHOENIX-patent-retrieval")
model.eval()
Extract Embeddings
def extract_image_embedding(image_path):
    # Preprocess a single drawing and return its CLIP image embedding
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        embedding = model.get_image_features(**inputs).squeeze()
    return embedding

# Example
embedding = extract_image_embedding("some_patent_image.png")
print("Image embedding shape:", embedding.shape)
You can now compare cosine similarity between embeddings to retrieve similar patent drawings.
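For example, here is a small retrieval sketch built on the extract_image_embedding helper above. The gallery file names are placeholders; the key steps are L2-normalising the embeddings and ranking by dot product, which then equals cosine similarity.

import torch
import torch.nn.functional as F

# Hypothetical gallery of patent drawings (paths are placeholders)
gallery_paths = ["patent_a.png", "patent_b.png", "patent_c.png"]
gallery = torch.stack([extract_image_embedding(p) for p in gallery_paths])
gallery = F.normalize(gallery, dim=-1)        # unit-norm gallery embeddings

query = F.normalize(extract_image_embedding("some_patent_image.png"), dim=-1)
scores = gallery @ query                      # cosine similarity per gallery image

# Rank gallery images by similarity to the query
for score, path in sorted(zip(scores.tolist(), gallery_paths), reverse=True):
    print(f"{path}: {score:.3f}")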
Results
Evaluated on the DeepPatent2 dataset, PHOENIX shows significant gains in:
- Intra-category retrieval (same subclass)
- Cross-category generalization (related but distinct inventions)
- Low-parameter robustness, making it suitable for real-time deployment
Use Cases
- Prior Art Search: find related inventions even if they are visually different
- Design Inspiration: explore similar patent structures from other domains
- Semantic Tagging: automatically cluster patents into meaningful groups
- IP Protection: detect potential overlaps or infringements more robustly
Model Architecture
This model wraps ViTModel in a custom class, PatentEmbeddingModel (sketched after the list below), which:
- Accepts a checkpoint fine-tuned on hierarchical labels
- Uses the CLS token embedding for image representation
- Integrates seamlessly with the transformers ViT feature extractors
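The wrapper itself is not reproduced in this card; the snippet below is a minimal sketch of such a class, with the backbone checkpoint name chosen only for illustration. It loads a ViTModel and returns the CLS token of the last hidden state as the image representation.

import torch.nn as nn
from transformers import ViTModel

class PatentEmbeddingModel(nn.Module):
    """Minimal sketch of a ViT wrapper exposing CLS-token embeddings.

    The real PHOENIX class may differ; the default checkpoint name here
    is an assumption for illustration only.
    """
    def __init__(self, backbone_name="google/vit-base-patch16-224-in21k"):
        super().__init__()
        self.backbone = ViTModel.from_pretrained(backbone_name)

    def forward(self, pixel_values):
        outputs = self.backbone(pixel_values=pixel_values)
        # CLS token sits at position 0 of the last hidden state
        return outputs.last_hidden_state[:, 0]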
License
This model is released under the MIT License.
Credits
Developed as part of a Master's thesis on improving patent retrieval through hierarchical representation learning.