---
tags:
- patent-retrieval
- image-search
- hierarchical-learning
- contrastive-learning
license: mit
base_model:
- openai/clip-vit-base-patch16
pipeline_tag: image-to-image
---

# PHOENIX: Hierarchical Contrastive Learning for Patent Image Retrieval

**PHOENIX** is a domain-adapted CLIP/ViT-based model designed to improve **patent image retrieval**. It addresses the unique challenges of retrieving relevant technical drawings in patent documents, especially when searching for **semantically or hierarchically related images**, not just exact matches.

This model is based on `openai/clip-vit-base-patch16` and fine-tuned with a **hierarchical multi-positive contrastive loss** that leverages the **Locarno classification**, an international system used to categorize industrial designs.

---

## 🧠 Motivation

Patent images are often **complex technical illustrations** that encode detailed structural or functional aspects of an invention. Current systems typically retrieve images from the same patent but fail when asked to retrieve **semantically similar inventions** across different patents or subclasses.

For instance, a retrieval system should understand that a *"foldable camping chair"* and a *"stackable office chair"* both fall under the broader "seating" category, even if their visual structure differs.

---

## 🔍 What This Model Does

- Leverages **CLIP ViT** for visual understanding of technical drawings
- Trains using **hierarchical multi-positive contrastive learning** to encode the Locarno structure:

  ```
  Furniture → Seating → Chairs → Specific Patent
  ```

- Encodes images such that **semantically similar inventions are close** in the embedding space, even if they come from different patents or subclasses

---

## 📦 How to Use

### Load Model

```python
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import torch

# Load fine-tuned model and processor
model = CLIPModel.from_pretrained("kshitij3188/PHOENIX-patent-retrieval")
processor = CLIPProcessor.from_pretrained("kshitij3188/PHOENIX-patent-retrieval")
model.eval()
```

### Extract Embeddings

```python
def extract_image_embedding(image_path):
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        embedding = model.get_image_features(**inputs).squeeze()
    return embedding

# Example
embedding = extract_image_embedding("some_patent_image.png")
print("🔍 Image embedding shape:", embedding.shape)
```

You can now compare cosine similarities between embeddings to retrieve similar patent drawings.
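For example, here is one way to rank a small gallery of drawings against a query image by cosine similarity. This is a minimal sketch: the file names are placeholders, and `extract_image_embedding` is the helper defined above.

```python
import torch
import torch.nn.functional as F

# Placeholder file names, used only for illustration
gallery_paths = ["patent_a.png", "patent_b.png", "patent_c.png"]
gallery = torch.stack([extract_image_embedding(p) for p in gallery_paths])
query = extract_image_embedding("some_patent_image.png")

# L2-normalize so that the dot product equals cosine similarity
gallery = F.normalize(gallery, dim=-1)
query = F.normalize(query, dim=-1)

# Rank gallery images by similarity to the query
scores = gallery @ query  # shape: (len(gallery_paths),)
for rank, idx in enumerate(scores.argsort(descending=True).tolist(), start=1):
    print(f"{rank}. {gallery_paths[idx]} (cosine similarity: {scores[idx].item():.4f})")
```

For larger collections, the gallery embeddings can be precomputed once and stored, so that only the query image needs to be encoded at search time.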
---

## 🏆 Results

Evaluated on the **DeepPatent2** dataset, PHOENIX shows significant gains in:

- **Intra-category retrieval** (same subclass)
- **Cross-category generalization** (related but distinct inventions)
- **Low-parameter robustness**, making it suitable for real-time deployment

---

## 💡 Use Cases

- 🔍 **Prior Art Search** – Find related inventions even if they are visually different
- 🧠 **Design Inspiration** – Explore similar patent structures from other domains
- 📑 **Semantic Tagging** – Automatically cluster patents into meaningful groups
- 🛡️ **IP Protection** – Detect potential overlaps or infringements more robustly

---

## 🛠️ Model Architecture

This model wraps `ViTModel` in a custom class, `PatentEmbeddingModel`, which:

- Accepts a checkpoint fine-tuned on hierarchical labels
- Uses the CLS token embedding as the image representation
- Integrates seamlessly with transformers' ViT feature extractors

(A minimal sketch of such a wrapper is given at the end of this card.)

---

## 📜 License

This model is released under the MIT License.

---

## ✨ Credits

Developed as part of a Master's thesis on improving patent retrieval through hierarchical representation learning.
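---

## Appendix: Wrapper Sketch

For reference, the following is a minimal sketch of what a wrapper like the `PatentEmbeddingModel` described in the architecture section might look like. It is not the exact training code behind this card: the backbone name is an assumption for illustration, and in practice the fine-tuned checkpoint weights would be loaded into it.

```python
import torch.nn as nn
from transformers import ViTModel

class PatentEmbeddingModel(nn.Module):
    """Illustrative sketch: a ViT wrapper that exposes the CLS token as the image embedding."""

    def __init__(self, backbone_name="google/vit-base-patch16-224-in21k"):  # assumed backbone name
        super().__init__()
        self.backbone = ViTModel.from_pretrained(backbone_name)

    def forward(self, pixel_values):
        outputs = self.backbone(pixel_values=pixel_values)
        # Use the CLS token (first position of the last hidden state) as the image representation
        return outputs.last_hidden_state[:, 0]
```

The CLS embedding it returns can then be compared with cosine similarity, exactly as in the usage example earlier in this card.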