---
license: openrail
---

# OmniEmb-v1: Multi-Modal Embeddings for Unified Retrieval

A compact multi-modal embedding model that creates unified embeddings for text and images, enabling efficient retrieval across modalities without intermediate VLM transformations.

## Features

* 1536-dimensional unified embedding space
* Text2Text, Text2Image, and Image2Image retrieval support
* Direct embedding without intermediate VLM conversion steps
* Layout preservation for image data

## Performance

Percentage changes are relative to the listed baseline model; for MSE, lower is better.

### Cross-Modal Retrieval (vs. CLIP-ViT-B/32)

* Hits@1: 0.428 (+60.8%)
* Hits@5: 0.651 (+38.9%)

### Correlation Metrics (vs. LaBSE)

* STS-B Pearson: 0.800 (+9.7%)
* STS-B Spearman: 0.795 (+7.3%)
* SICK Pearson: 0.782 (+6.3%)

### Error Metrics (vs. LaBSE)

* STS-B MSE: 3.222 (-19.6%)
* SICK MSE: 0.750 (-41.5%)
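
These are standard retrieval and STS metrics. Below is a minimal sketch of how such numbers are typically computed, assuming L2-normalized NumPy embedding matrices; it is illustrative only, not the project's evaluation code (which is noted as forthcoming under Training), and the cosine-to-score rescaling used for MSE is an assumption:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def hits_at_k(query_emb, gallery_emb, gt_indices, k):
    # With unit-norm embeddings, cosine similarity is a plain dot product
    sims = query_emb @ gallery_emb.T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    return float(np.mean([gt in row for gt, row in zip(gt_indices, top_k)]))

def sts_metrics(emb_a, emb_b, gold_scores):
    # Per-pair cosine similarity between the two sentence embeddings
    cos = np.sum(emb_a * emb_b, axis=1)
    # Map cosine from [-1, 1] onto the gold score range before MSE
    # (assumption: the reported MSE uses some such rescaling)
    lo, hi = gold_scores.min(), gold_scores.max()
    pred = (cos + 1.0) / 2.0 * (hi - lo) + lo
    return {
        "pearson": pearsonr(cos, gold_scores)[0],
        "spearman": spearmanr(cos, gold_scores)[0],
        "mse": float(np.mean((pred - gold_scores) ** 2)),
    }
```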

## Installation & Usage

Install the package:

```bash
pip install sportsvision
```

Basic usage:

```python
import torch
from PIL import Image
from transformers import AutoConfig, AutoModel

from sportsvision.research.configs import UnifiedEmbedderConfig
from sportsvision.research.models import UnifiedEmbedderModel

# Register the custom configuration and model classes with transformers
AutoConfig.register("unified_embedder", UnifiedEmbedderConfig)
AutoModel.register(UnifiedEmbedderConfig, UnifiedEmbedderModel)

# Load the pretrained model, move it to the available device,
# and put it in evaluation mode
emb_model = AutoModel.from_pretrained("sportsvision/omniemb-v1")
device = "cuda" if torch.cuda.is_available() else "cpu"
emb_model = emb_model.to(device)
emb_model.eval()

# Encode texts into the unified embedding space
texts = [
    "Playoff season is exciting!",
    "Injury updates for the team."
]
text_embeddings = emb_model.encode_texts(texts)
print("Text Embeddings:", text_embeddings)

# Load images with PIL and encode them into the same space
image_paths = [
    "path_to_image1.jpg",
    "path_to_image2.jpg"
]
images = [Image.open(img_path).convert('RGB') for img_path in image_paths]
image_embeddings = emb_model.encode_images(images)
print("Image Embeddings:", image_embeddings)
```
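
Because text and image embeddings share the same 1536-dimensional space, cross-modal retrieval reduces to a nearest-neighbor search over cosine similarity. A minimal sketch continuing from the snippet above, under the assumption that `encode_texts`/`encode_images` return 2-D float tensors (if the model already L2-normalizes its outputs, the normalization below is a no-op):

```python
import torch.nn.functional as F

# Normalize so that dot products equal cosine similarities
text_emb = F.normalize(text_embeddings, dim=-1)
image_emb = F.normalize(image_embeddings, dim=-1)

# Text -> image retrieval: score every image against every text query
similarity = text_emb @ image_emb.T      # shape: (num_texts, num_images)
best_image = similarity.argmax(dim=-1)   # top-ranked image per text query
print("Best image per text query:", best_image.tolist())
```

Transposing the same similarity matrix gives image-to-text retrieval; text-to-text and image-to-image retrieval work identically within a single modality.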

## Training

* Fine-tuned from a CLIP architecture
* Trained on the VisRAG dataset with a contrastive loss (see the sketch below)
* Evaluation scripts and detailed methodology documentation are coming soon
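
For intuition, CLIP-style contrastive training typically uses a symmetric InfoNCE objective like the one sketched below. This is a generic example of that loss family, not the exact objective used for this model (the README specifies only "contrastive loss"), and the temperature is a conventional default rather than a reported hyperparameter:

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(text_emb, image_emb, temperature=0.07):
    # Normalize both modalities so logits are cosine similarities
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)

    # Pairwise similarity matrix, scaled by temperature
    logits = text_emb @ image_emb.T / temperature

    # Matched text/image pairs sit on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)

    # Symmetric cross-entropy over both retrieval directions
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```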

## Limitations

* Benchmarks against ImageBind and other comparable models are still in progress
* Model extensions are under active development

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{kodathala2024omniemb,
  author       = {Kodathala, Varun},
  title        = {OmniEmb-v1: Multi-Modal Embeddings for Unified Retrieval},
  year         = {2024},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/sportsvision/omniemb-v1}}
}
```