---
license: apache-2.0
language:
- en
pipeline_tag: graph-ml
tags:
- gnn
- earth
- nasa
- 1.0.3
datasets:
- nasa-gesdisc/nasa-eo-knowledge-graph
metrics:
- accuracy
- f1
- roc_auc
base_model:
- nasa-impact/nasa-smd-ibm-st-v2
---
# EOSDIS Graph Neural Network Model Card
## Model Overview
**Model Name**: EOSDIS-GNN
**Version**: 1.0.3
**Type**: Heterogeneous Graph Neural Network
**Framework**: PyTorch + PyTorch Geometric
**Base Language Model**: nasa-impact/nasa-smd-ibm-st-v2
### Core Components
- **Base Text Encoder**: NASA-SMD-IBM Language Model (768-dimensional embeddings)
- **Graph Neural Network**: Heterogeneous GNN with multiple layers
- **Node Types**: Dataset, Publication, Instrument, Platform, ScienceKeyword
- **Edge Types**: Multiple relationship types between nodes
### Technical Specifications
- **Input Dimensions**: 768 (NASA-SMD-IBM embeddings)
- **Hidden Dimensions**: Configurable (default: 256)
- **Output Dimensions**: 768 (aligned with NASA-SMD-IBM space)
- **Number of Layers**: Configurable (default: 3)
- **Activation Function**: ReLU
- **Dropout**: Applied between layers
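The layer stack above can be sketched shape-for-shape in plain PyTorch. This is a simplified, homogeneous skeleton only — the actual model applies heterogeneous message passing over typed nodes and edges via PyTorch Geometric — and the class name and dropout rate here are illustrative assumptions, not taken from this repository:

```python
import torch
import torch.nn as nn

class GNNLayerStackSketch(nn.Module):
    """Shape-level sketch of the EOSDIS-GNN dimensions (not the real message passing)."""

    def __init__(self, in_dim=768, hidden_dim=256, out_dim=768, num_layers=3, dropout=0.2):
        super().__init__()
        # 768 -> 256 -> ... -> 768, with ReLU + dropout between layers
        dims = [in_dim] + [hidden_dim] * (num_layers - 1) + [out_dim]
        layers = []
        for i in range(num_layers):
            layers.append(nn.Linear(dims[i], dims[i + 1]))
            if i < num_layers - 1:
                layers.append(nn.ReLU())
                layers.append(nn.Dropout(dropout))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

model = GNNLayerStackSketch()
x = torch.randn(4, 768)   # 4 nodes with NASA-SMD-IBM input embeddings
out = model(x)
print(out.shape)          # torch.Size([4, 768]) — aligned with the text-embedding space
```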
## Training Details
### Training Data
- **Source**: NASA EOSDIS Knowledge Graph
- **Node Types**:
- Datasets: Earth science datasets from NASA DAACs
- Publications: Related scientific papers
- Instruments: Earth observation instruments
- Platforms: Satellite and other observation platforms
- Science Keywords: NASA Earth Science taxonomy
### Training Process
- **Optimization**: Adam optimizer
- **Loss Function**: Contrastive loss for semantic alignment
- **Training Strategy**:
- Initial node embedding generation
- Message passing through graph structure
- Contrastive learning with NASA-SMD-IBM embeddings
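One common way to implement the contrastive alignment step is an InfoNCE-style objective: each node's GNN output is pulled toward its own NASA-SMD-IBM text embedding and pushed away from the other nodes' embeddings in the batch. This card does not specify the exact formulation, so the function below (including the temperature value) is an illustrative sketch:

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(gnn_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss aligning GNN node embeddings with text embeddings.

    gnn_emb, text_emb: (N, 768) tensors; row i of each describes the same node.
    """
    gnn_emb = F.normalize(gnn_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = gnn_emb @ text_emb.t() / temperature   # (N, N) cosine-similarity matrix
    targets = torch.arange(gnn_emb.size(0))         # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_alignment_loss(torch.randn(8, 768), torch.randn(8, 768))
```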
---
## Intended Use
**Designed for:** research, data discovery, and semantic search in Earth science
**Not intended for:** safety‑critical systems or unrelated domains without fine‑tuning
---
### Strengths
1. **Semantic Understanding**:
- Strong performance in finding semantically related content
- Effective cross-modal relationships between text and graph structure
2. **Domain Specificity**:
- Specialized for Earth science terminology
- Understands relationships between instruments, platforms, and datasets
3. **Multi-modal Integration**:
- Combines text-based and graph-based features
- Preserves domain-specific relationships
### Limitations
1. **Data Coverage**:
- Performance depends on training data coverage
- May have gaps in newer or less documented areas
2. **Computational Requirements**:
- Requires significant memory for full graph processing
- Graph operations can be computationally intensive
3. **Domain Constraints**:
- Optimized for Earth science domain
- May not generalize well to other domains
## Usage Guide
### Installation Requirements
```bash
pip install torch torch-geometric transformers huggingface-hub
```
### Basic Usage
```python
from transformers import AutoTokenizer, AutoModel
import torch
from gnn_model import EOSDIS_GNN
# Load models
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
text_model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
gnn_model = EOSDIS_GNN.from_pretrained("your-username/eosdis-gnn")
# Process query
def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)
    with torch.no_grad():
        outputs = text_model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS]-token embedding, shape (1, 768)
```
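Embeddings produced this way can then be compared with cosine similarity to rank candidate nodes. The helper below is an illustrative sketch — `rank_by_similarity` is not part of this repository, and random tensors stand in for real outputs of `get_embedding`:

```python
import torch
import torch.nn.functional as F

def rank_by_similarity(query_emb, node_embs, top_k=5):
    """Rank node embeddings by cosine similarity to a (1, D) query embedding."""
    sims = F.cosine_similarity(query_emb, node_embs)  # query broadcasts over rows -> (N,)
    k = min(top_k, node_embs.size(0))
    scores, idx = sims.topk(k)                        # sorted, highest similarity first
    return idx.tolist(), scores.tolist()

# In practice: query_emb = get_embedding("sea surface temperature")
query_emb = torch.randn(1, 768)
node_embs = torch.randn(10, 768)   # e.g. precomputed GNN embeddings for 10 nodes
indices, scores = rank_by_similarity(query_emb, node_embs)
```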
### Semantic Search Example
```python
from semantic_search import SemanticSearch
# Initialize searcher
searcher = SemanticSearch()
# Perform search
results = searcher.search(
    query="atmospheric carbon dioxide measurements",
    top_k=5,
    node_type="Dataset",  # Optional: filter by node type
)
```
---
## Performance
| Metric | Value | Notes |
|--------|-------|-------|
| **Top‑5 Accuracy** | 87.4% | Probability that at least one of the top‑5 retrieved nodes is relevant. |
| **Mean Reciprocal Rank (MRR)** | 0.73 | Measures ranking quality. |
| **Link Prediction ROC‑AUC** | 0.91 | Ability to predict whether a given edge exists. |
| **Node Classification F1 (macro)** | 0.84 | Balanced accuracy across node types. |
| **Triple Classification Accuracy** | 88.6% | Accuracy in classifying valid vs. invalid triples. |
**Evaluation Notes:**
- Dataset: held‑out portion of NASA EOSDIS Knowledge Graph
- Search task: queries derived from publication abstracts
- Link prediction: 80/10/10 train/val/test splits
- Numbers from offline evaluation; may vary on different graph snapshots
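As a concrete reference for how the retrieval metrics above are computed, here is a minimal pure-Python sketch of MRR and top-k accuracy over per-query ranked result lists (the identifiers are made up for illustration):

```python
def mean_reciprocal_rank(ranked_lists, relevant):
    """MRR: average reciprocal rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for ranked, rel in zip(ranked_lists, relevant):
        rr = 0.0
        for rank, item in enumerate(ranked, start=1):
            if item in rel:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists)

def top_k_accuracy(ranked_lists, relevant, k=5):
    """Fraction of queries with at least one relevant item in the top k results."""
    hits = sum(1 for ranked, rel in zip(ranked_lists, relevant)
               if any(item in rel for item in ranked[:k]))
    return hits / len(ranked_lists)

ranked = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]  # retrieved node IDs per query
rel = [{"d1"}, {"d8"}]                             # ground-truth relevant IDs per query
print(mean_reciprocal_rank(ranked, rel))           # (1/2 + 0) / 2 = 0.25
```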
### Version Control
- Model versions tracked on Hugging Face Hub
- Regular updates for improved performance
### Citation
```bibtex
@misc{armin_mehrabian_2025,
author = { Armin Mehrabian },
title = { nasa-eosdis-heterogeneous-gnn (Revision 7e71e62) },
year = 2025,
url = { https://huggingface.co/arminmehrabian/nasa-eosdis-heterogeneous-gnn },
doi = { 10.57967/hf/6071 },
publisher = { Hugging Face }
}
```
## Contact Information
- **Maintainer**: Armin Mehrabian
- **Email**: [email protected]
- **Organization**: NASA