VarunKodathala committed (verified)
Commit e563798 · 1 Parent(s): 2d734d2

Update README.md

Files changed (1)
  1. README.md +109 -3
README.md CHANGED
@@ -1,3 +1,109 @@
- ---
- license: openrail
- ---
+ ---
+ license: openrail
+ ---
+
+ # OmniEmb-v1: Multi-Modal Embeddings for Unified Retrieval
+
+ A compact multi-modal embedding model that creates unified embeddings for text and images, enabling efficient retrieval across modalities without intermediate VLM transformations.
+
+ ## Features
+
+ * 1536-dimensional unified embedding space
+ * Text2Text, Text2Image, and Image2Image retrieval support
+ * Direct embedding without VLM conversion steps
+ * Layout preservation for image data
+
+ ## Performance
+
+ ### Cross-Modal Retrieval (vs CLIP-ViT-B/32)
+ * Hits@1: 0.428 (+60.8%)
+ * Hits@5: 0.651 (+38.9%)
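+
+ Hits@K here measures how often the matching item appears among the top-K results when each query is ranked against all candidates in the shared embedding space. The evaluation scripts are not yet released, so the following is only a minimal sketch of how such a metric can be computed:
+
+ ```python
+ import torch
+
+ def hits_at_k(text_emb: torch.Tensor, image_emb: torch.Tensor, k: int) -> float:
+     """Fraction of queries whose matching item ranks in the top-k.
+     Assumes row i of text_emb pairs with row i of image_emb."""
+     # Cosine similarity between every text query and every image
+     text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
+     image_emb = torch.nn.functional.normalize(image_emb, dim=-1)
+     sims = text_emb @ image_emb.T                       # (N, N)
+
+     # Check whether the true index i is among the top-k for query i
+     topk = sims.topk(k, dim=-1).indices                 # (N, k)
+     targets = torch.arange(sims.size(0)).unsqueeze(-1)  # (N, 1)
+     return (topk == targets).any(dim=-1).float().mean().item()
+ ```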
+
+ ### Correlation Metrics (vs LaBSE)
+ * STS-B Pearson: 0.800 (+9.7%)
+ * STS-B Spearman: 0.795 (+7.3%)
+ * SICK Pearson: 0.782 (+6.3%)
+
+ ### Error Metrics (vs LaBSE)
+ * STS-B MSE: 3.222 (-19.6%)
+ * SICK MSE: 0.750 (-41.5%)
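+
+ These numbers compare similarity scores produced by the model against human similarity ratings on STS-B and SICK. The official evaluation code is still to come; the snippet below is only a hedged sketch of how such metrics can be computed with scipy:
+
+ ```python
+ import torch
+ from scipy.stats import pearsonr, spearmanr
+
+ def sts_metrics(emb_a, emb_b, gold_scores):
+     """Pearson/Spearman correlation and MSE between predicted cosine
+     similarities and gold human ratings."""
+     pred = torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=-1).cpu().numpy()
+     # NOTE: depending on the benchmark, gold ratings may need rescaling
+     # to the cosine-similarity range before computing MSE
+     gold = gold_scores.cpu().numpy()
+     return {
+         "pearson": pearsonr(pred, gold)[0],
+         "spearman": spearmanr(pred, gold)[0],
+         "mse": float(((pred - gold) ** 2).mean()),
+     }
+ ```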
+
+ ## Installation & Usage
+
+ Install the package:
+ ```bash
+ pip install sportsvision
+ ```
+
+ Basic usage:
+ ```python
+ import torch
+ from sportsvision.research.configs import UnifiedEmbedderConfig
+ from sportsvision.research.models import UnifiedEmbedderModel
+ from transformers import AutoConfig, AutoModel
+ from PIL import Image
+
+ # Register the custom configuration and model classes with transformers
+ AutoConfig.register("unified_embedder", UnifiedEmbedderConfig)
+ AutoModel.register(UnifiedEmbedderConfig, UnifiedEmbedderModel)
+
+ # Load the pretrained model from the Hub
+ emb_model = AutoModel.from_pretrained("sportsvision/omniemb-v1")
+
+ # Move the model to GPU if available and switch to evaluation mode
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ emb_model = emb_model.to(device)
+ emb_model.eval()
+
+ # Sample texts
+ texts = [
+     "Playoff season is exciting!",
+     "Injury updates for the team."
+ ]
+
+ # Encode texts to obtain embeddings
+ text_embeddings = emb_model.encode_texts(texts)
+ print("Text Embeddings:", text_embeddings)
+
+ # Sample images
+ image_paths = [
+     "path_to_image1.jpg",
+     "path_to_image2.jpg"
+ ]
+
+ # Load images with PIL and convert to RGB
+ images = [Image.open(img_path).convert('RGB') for img_path in image_paths]
+
+ # Encode images to obtain embeddings
+ image_embeddings = emb_model.encode_images(images)
+ print("Image Embeddings:", image_embeddings)
+ ```
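+
+ Because text and image embeddings live in the same 1536-dimensional space, cross-modal retrieval reduces to a similarity search between the two sets of vectors. The continuation below is purely illustrative and assumes the encoders return tensors (or arrays convertible to tensors) of shape `(n, 1536)`:
+
+ ```python
+ # Convert to tensors in case the encoders return NumPy arrays (assumption)
+ text_emb = torch.as_tensor(text_embeddings)
+ image_emb = torch.as_tensor(image_embeddings)
+
+ # Cosine similarity between every text query and every image
+ text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
+ image_emb = torch.nn.functional.normalize(image_emb, dim=-1)
+ similarity = text_emb @ image_emb.T  # shape: (num_texts, num_images)
+
+ # Best-matching image for each text query
+ for text, idx in zip(texts, similarity.argmax(dim=-1).tolist()):
+     print(f"{text!r} -> {image_paths[idx]}")
+ ```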
+
+ ## Training
+
+ * Fine-tuned CLIP architecture
+ * Trained on the VisRAG dataset using a contrastive loss (an illustrative sketch follows below)
+ * Evaluation scripts and detailed methodology documentation coming soon
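+
+ The exact training code has not been released; the snippet below is only an illustrative sketch of a standard CLIP-style symmetric contrastive (InfoNCE) loss of the kind referenced above:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def clip_style_contrastive_loss(text_emb, image_emb, temperature=0.07):
+     """Symmetric InfoNCE loss: matched text/image pairs lie on the diagonal."""
+     text_emb = F.normalize(text_emb, dim=-1)
+     image_emb = F.normalize(image_emb, dim=-1)
+
+     logits = text_emb @ image_emb.T / temperature          # (N, N)
+     targets = torch.arange(logits.size(0), device=logits.device)
+
+     # Average the text-to-image and image-to-text cross-entropy terms
+     return (F.cross_entropy(logits, targets) +
+             F.cross_entropy(logits.T, targets)) / 2
+ ```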
+
+ ## Limitations
+
+ * Benchmarking against ImageBind and other comparable models is still in progress
+ * Model extensions are under active development
+
+ ## Citation
+
+ If you use this model in your research, please cite:
+
+ ```bibtex
+ @misc{kodathala2024omniemb,
+   author       = {Kodathala, Varun},
+   title        = {OmniEmb-v1: Multi-Modal Embeddings for Unified Retrieval},
+   year         = {2024},
+   publisher    = {Hugging Face},
+   howpublished = {\url{https://huggingface.co/sportsvision/omniemb-v1}}
+ }
+ ```