---
license: apache-2.0
language:
- en
pipeline_tag: graph-ml
tags:
- gnn
- earth
- nasa
- 1.0.3
datasets:
- nasa-gesdisc/nasa-eo-knowledge-graph
metrics:
- accuracy
- f1
- roc_auc
base_model:
- nasa-impact/nasa-smd-ibm-st-v2
---
# EOSDIS Graph Neural Network Model Card

## Model Overview
- **Model Name**: EOSDIS-GNN
- **Version**: 1.0.3
- **Type**: Heterogeneous Graph Neural Network
- **Framework**: PyTorch + PyTorch Geometric
- **Base Language Model**: nasa-impact/nasa-smd-ibm-st-v2



### Core Components
- **Base Text Encoder**: NASA-SMD-IBM Language Model (768-dimensional embeddings)
- **Graph Neural Network**: Heterogeneous GNN with multiple layers
- **Node Types**: Dataset, Publication, Instrument, Platform, ScienceKeyword
- **Edge Types**: Multiple relationship types between nodes

### Technical Specifications
- **Input Dimensions**: 768 (NASA-SMD-IBM embeddings)
- **Hidden Dimensions**: Configurable (default: 256)
- **Output Dimensions**: 768 (aligned with NASA-SMD-IBM space)
- **Number of Layers**: Configurable (default: 3)
- **Activation Function**: ReLU
- **Dropout**: Applied between layers
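
The defaults above imply a simple per-layer width schedule. The sketch below is illustrative only: `GNNConfig` and `layer_dims` are hypothetical names, not part of the released code, and it assumes the first layer maps the 768-dimensional input to the hidden size while the last layer maps back to 768.

```python
from dataclasses import dataclass


@dataclass
class GNNConfig:
    """Hypothetical container for the hyperparameters listed above."""
    input_dim: int = 768    # NASA-SMD-IBM embedding size
    hidden_dim: int = 256   # default hidden size
    output_dim: int = 768   # aligned with the text-embedding space
    num_layers: int = 3     # default depth

    def layer_dims(self):
        """Return (in_dim, out_dim) for each message-passing layer."""
        if self.num_layers < 1:
            raise ValueError("num_layers must be at least 1")
        if self.num_layers == 1:
            return [(self.input_dim, self.output_dim)]
        dims = [(self.input_dim, self.hidden_dim)]
        # Any middle layers stay at the hidden width (an assumption).
        dims += [(self.hidden_dim, self.hidden_dim)] * (self.num_layers - 2)
        dims.append((self.hidden_dim, self.output_dim))
        return dims


print(GNNConfig().layer_dims())
```

With the defaults this yields three layers: 768 to 256, 256 to 256, and 256 to 768.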

## Training Details

### Training Data
- **Source**: NASA EOSDIS Knowledge Graph
- **Node Types**:
  - Datasets: Earth science datasets from NASA DAACs
  - Publications: Related scientific papers
  - Instruments: Earth observation instruments
  - Platforms: Satellite and other observation platforms
  - Science Keywords: NASA Earth Science taxonomy

### Training Process
- **Optimization**: Adam optimizer
- **Loss Function**: Contrastive loss for semantic alignment
- **Training Strategy**: 
  - Initial node embedding generation
  - Message passing through graph structure
  - Contrastive learning with NASA-SMD-IBM embeddings
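
The card does not say which contrastive objective is used. As one possible reading, the sketch below computes an InfoNCE-style loss in plain Python, treating the GNN output for node `i` and the text embedding for node `i` as a positive pair and all other text embeddings in the batch as negatives. The function names and the temperature value are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (both must be non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(gnn_embs, text_embs, temperature=0.07):
    """Average InfoNCE loss; entry i of each list forms the positive pair."""
    total = 0.0
    for i, g in enumerate(gnn_embs):
        logits = [cosine(g, t) / temperature for t in text_embs]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum - logits[i]
    return total / len(gnn_embs)


aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
# Matching pairs should give a much lower loss than mismatched pairs.
print(info_nce(aligned, aligned) < info_nce(aligned, swapped))
```

A real training loop would compute the same quantity on tensors so that gradients flow back into the GNN.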

---

## Intended Use
**Designed for:** research, data discovery, and semantic search in Earth science  
**Not intended for:** safety‑critical systems or unrelated domains without fine‑tuning

---

### Strengths
1. **Semantic Understanding**:
   - Strong performance in finding semantically related content
   - Captures cross-modal relationships between text and graph structure

2. **Domain Specificity**:
   - Specialized for Earth science terminology
   - Understands relationships between instruments, platforms, and datasets

3. **Multi-modal Integration**:
   - Combines text-based and graph-based features
   - Preserves domain-specific relationships

### Limitations
1. **Data Coverage**:
   - Performance depends on training data coverage
   - May have gaps in newer or less documented areas

2. **Computational Requirements**:
   - Requires significant memory for full graph processing
   - Graph operations can be computationally intensive

3. **Domain Constraints**:
   - Optimized for Earth science domain
   - May not generalize well to other domains

## Usage Guide

### Installation Requirements
```bash
pip install torch torch-geometric transformers huggingface-hub
```

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch
from gnn_model import EOSDIS_GNN

# Load models
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
text_model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
gnn_model = EOSDIS_GNN.from_pretrained("your-username/eosdis-gnn")

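# Embed a query with the base text encoder
def get_embedding(text):
    """Return the first-token embedding for `text`, shape (batch, 768)."""
    inputs = tokenizer(text, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)
    with torch.no_grad():
        outputs = text_model(**inputs)
    # Use the first token's hidden state as the sentence embedding
    return outputs.last_hidden_state[:, 0, :]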
```

### Semantic Search Example
```python
from semantic_search import SemanticSearch

# Initialize searcher
searcher = SemanticSearch()

# Perform search
results = searcher.search(
    query="atmospheric carbon dioxide measurements",
    top_k=5,
    node_type="Dataset"  # Optional: filter by node type
)
```
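
The internals of `SemanticSearch` are not shown here. As a rough mental model only, a search of this kind can be sketched as cosine-similarity ranking over precomputed node embeddings with an optional node-type filter. The `search` function, the node dictionary layout, and the toy two-dimensional vectors below are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def search(query_emb, nodes, top_k=5, node_type=None):
    """Rank nodes by similarity to the query, optionally filtered by type."""
    candidates = [n for n in nodes
                  if node_type is None or n["type"] == node_type]
    scored = [(cosine(query_emb, n["embedding"]), n["id"]) for n in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would be 768-dimensional.
nodes = [
    {"id": "ds_co2", "type": "Dataset", "embedding": [0.9, 0.1]},
    {"id": "ds_sst", "type": "Dataset", "embedding": [0.1, 0.9]},
    {"id": "pub_co2", "type": "Publication", "embedding": [1.0, 0.0]},
]
for score, node_id in search([1.0, 0.0], nodes, top_k=2, node_type="Dataset"):
    print(node_id, round(score, 3))
```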

## Evaluation Metrics

### Performance

| Metric | Value | Notes |
|--------|-------|-------|
| **Top‑5 Accuracy** | 87.4% | Fraction of queries for which at least one of the top‑5 retrieved nodes is relevant. |
| **Mean Reciprocal Rank (MRR)** | 0.73 | Mean of 1/rank of the first relevant result per query; higher is better. |
| **Link Prediction ROC‑AUC** | 0.91 | Ability to predict whether a given edge exists. |
| **Node Classification F1 (macro)** | 0.84 | Unweighted mean of per‑node‑type F1 scores. |
| **Triple Classification Accuracy** | 88.6% | Accuracy in classifying valid vs. invalid triples. |

**Evaluation Notes:**  
- Dataset: held‑out portion of NASA EOSDIS Knowledge Graph  
- Search task: queries derived from publication abstracts  
- Link prediction: 80/10/10 train/val/test splits  
- Numbers from offline evaluation; may vary on different graph snapshots
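
For reference, the two retrieval metrics in the table can be computed as in the minimal sketch below. It assumes each query comes with a ranked list of retrieved node IDs and a set of relevant IDs; the function names and toy data are illustrative and not taken from the evaluation code.

```python
def top_k_accuracy(ranked_results, relevant, k=5):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1 for ranked, rel in zip(ranked_results, relevant)
        if any(item in rel for item in ranked[:k])
    )
    return hits / len(ranked_results)


def mean_reciprocal_rank(ranked_results, relevant):
    """Mean of 1/rank of the first relevant item (0 if none is retrieved)."""
    total = 0.0
    for ranked, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(ranked, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)


# Three toy queries: relevant item at rank 1, at rank 2, and not retrieved.
ranked = [["d1", "d2", "d3"], ["d4", "d5", "d6"], ["d7", "d8", "d9"]]
relevant = [{"d1"}, {"d5"}, {"dX"}]
print(top_k_accuracy(ranked, relevant, k=5))
print(mean_reciprocal_rank(ranked, relevant))
```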


### Version Control
- Model versions tracked on Hugging Face Hub
- Regular updates for improved performance


### Citation

```bibtex
@misc{armin_mehrabian_2025,
	author       = { Armin Mehrabian },
	title        = { nasa-eosdis-heterogeneous-gnn (Revision 7e71e62) },
	year         = 2025,
	url          = { https://huggingface.co/arminmehrabian/nasa-eosdis-heterogeneous-gnn },
	doi          = { 10.57967/hf/6071 },
	publisher    = { Hugging Face }
}
```

## Contact Information
- **Maintainer**: Armin Mehrabian
- **Email**: [email protected]
- **Organization**: NASA