---
license: apache-2.0
language:
- en
pipeline_tag: graph-ml
tags:
- gnn
- earth
- nasa
- 1.0.3
datasets:
- nasa-gesdisc/nasa-eo-knowledge-graph
metrics:
- accuracy
- f1
- roc_auc
base_model:
- nasa-impact/nasa-smd-ibm-st-v2
---

# EOSDIS Graph Neural Network Model Card

## Model Overview

- **Model Name**: EOSDIS-GNN
- **Version**: 1.0.3
- **Type**: Heterogeneous Graph Neural Network
- **Framework**: PyTorch + PyTorch Geometric
- **Base Language Model**: nasa-impact/nasa-smd-ibm-st-v2

### Core Components

- **Base Text Encoder**: NASA-SMD-IBM Language Model (768-dimensional embeddings)
- **Graph Neural Network**: Heterogeneous GNN with multiple layers
- **Node Types**: Dataset, Publication, Instrument, Platform, ScienceKeyword
- **Edge Types**: Multiple relationship types between nodes

### Technical Specifications

- **Input Dimensions**: 768 (NASA-SMD-IBM embeddings)
- **Hidden Dimensions**: Configurable (default: 256)
- **Output Dimensions**: 768 (aligned with the NASA-SMD-IBM embedding space)
- **Number of Layers**: Configurable (default: 3)
- **Activation Function**: ReLU
- **Dropout**: Applied between layers
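The released implementation is not reproduced here; as a rough sketch only, a heterogeneous GNN with the dimensions listed above could be assembled in PyTorch Geometric as follows. The edge-type tuples, the choice of `SAGEConv`, and the dropout rate are illustrative assumptions, not the exact released architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HeteroConv, SAGEConv

# Hypothetical relation names; the real knowledge graph defines its own edge types.
# Forward and reverse edges are included so every node type receives messages.
EDGE_TYPES = [
    ("Publication", "cites", "Dataset"), ("Dataset", "cited_by", "Publication"),
    ("Dataset", "uses", "Instrument"), ("Instrument", "used_by", "Dataset"),
    ("Instrument", "mounted_on", "Platform"), ("Platform", "carries", "Instrument"),
    ("Dataset", "has_keyword", "ScienceKeyword"), ("ScienceKeyword", "keyword_of", "Dataset"),
]


class HeteroGNNSketch(nn.Module):
    """Illustrative heterogeneous GNN matching the specs above:
    768-d inputs, 256-d hidden layers, 768-d outputs, 3 layers, ReLU, dropout."""

    def __init__(self, edge_types=EDGE_TYPES, in_dim=768, hidden_dim=256,
                 out_dim=768, num_layers=3, dropout=0.2):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(num_layers):
            d_in = in_dim if i == 0 else hidden_dim
            d_out = out_dim if i == num_layers - 1 else hidden_dim
            # One relation-specific convolution per edge type,
            # aggregated ("sum") per destination node type.
            self.convs.append(HeteroConv(
                {et: SAGEConv((d_in, d_in), d_out) for et in edge_types},
                aggr="sum",
            ))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x_dict, edge_index_dict):
        for i, conv in enumerate(self.convs):
            x_dict = conv(x_dict, edge_index_dict)
            if i < len(self.convs) - 1:  # ReLU + dropout between layers
                x_dict = {k: self.dropout(torch.relu(v)) for k, v in x_dict.items()}
        return x_dict  # 768-d embeddings per node type, aligned with the text space
```

In the released model, the 768-dimensional input features come from NASA-SMD-IBM embeddings of each node's metadata, which is what keeps the GNN output comparable with query embeddings from the text encoder.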
## Training Details

### Training Data

- **Source**: NASA EOSDIS Knowledge Graph
- **Node Types**:
  - Datasets: Earth science datasets from NASA DAACs
  - Publications: Related scientific papers
  - Instruments: Earth observation instruments
  - Platforms: Satellites and other observation platforms
  - Science Keywords: NASA Earth Science taxonomy

### Training Process

- **Optimizer**: Adam
- **Loss Function**: Contrastive loss for semantic alignment
- **Training Strategy**:
  - Initial node embedding generation
  - Message passing through the graph structure
  - Contrastive learning against NASA-SMD-IBM embeddings

---

## Intended Use

**Designed for:** research, data discovery, and semantic search in Earth science

**Not intended for:** safety-critical systems or unrelated domains without fine-tuning

---

### Strengths

1. **Semantic Understanding**:
   - Strong performance in finding semantically related content
   - Effective cross-modal relationships between text and graph structure
2. **Domain Specificity**:
   - Specialized for Earth science terminology
   - Understands relationships between instruments, platforms, and datasets
3. **Multi-modal Integration**:
   - Combines text-based and graph-based features
   - Preserves domain-specific relationships

### Limitations

1. **Data Coverage**:
   - Performance depends on training data coverage
   - May have gaps in newer or less-documented areas
2. **Computational Requirements**:
   - Requires significant memory for full-graph processing
   - Graph operations can be computationally intensive
3. **Domain Constraints**:
   - Optimized for the Earth science domain
   - May not generalize well to other domains

## Usage Guide

### Installation Requirements

```bash
pip install torch torch-geometric transformers huggingface-hub
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModel
import torch

from gnn_model import EOSDIS_GNN

# Load the text encoder and the GNN
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
text_model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
gnn_model = EOSDIS_GNN.from_pretrained("your-username/eosdis-gnn")

# Encode a query into the shared 768-dimensional embedding space
def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)
    with torch.no_grad():
        outputs = text_model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
```

### Semantic Search Example

```python
from semantic_search import SemanticSearch

# Initialize searcher
searcher = SemanticSearch()

# Perform search
results = searcher.search(
    query="atmospheric carbon dioxide measurements",
    top_k=5,
    node_type="Dataset"  # Optional: filter by node type
)
```

## Evaluation Metrics

| Metric | Value | Notes |
|--------|-------|-------|
| **Top-5 Accuracy** | 87.4% | Probability that at least one of the top-5 retrieved nodes is relevant. |
| **Mean Reciprocal Rank (MRR)** | 0.73 | Measures ranking quality. |
| **Link Prediction ROC-AUC** | 0.91 | Ability to predict whether a given edge exists. |
| **Node Classification F1 (macro)** | 0.84 | Balanced accuracy across node types. |
| **Triple Classification Accuracy** | 88.6% | Accuracy in classifying valid vs. invalid triples. |

**Evaluation Notes:**

- Dataset: held-out portion of the NASA EOSDIS Knowledge Graph
- Search task: queries derived from publication abstracts
- Link prediction: 80/10/10 train/val/test splits
- Numbers come from offline evaluation and may vary on different graph snapshots

### Version Control

- Model versions tracked on the Hugging Face Hub
- Regular updates for improved performance

### Citation

```bibtex
@misc{armin_mehrabian_2025,
  author    = {Armin Mehrabian},
  title     = {nasa-eosdis-heterogeneous-gnn (Revision 7e71e62)},
  year      = 2025,
  url       = {https://huggingface.co/arminmehrabian/nasa-eosdis-heterogeneous-gnn},
  doi       = {10.57967/hf/6071},
  publisher = {Hugging Face}
}
```

## Contact Information

- **Maintainer**: Armin Mehrabian
- **Email**: armin.mehrabian@nasa.gov
- **Organization**: NASA
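As a point of reference for the Top-5 accuracy and MRR values reported under Evaluation Metrics, the sketch below shows one common way such retrieval metrics are computed from a query-candidate similarity matrix. The function name and input layout are illustrative and are not taken from the released evaluation code.

```python
import numpy as np

def retrieval_metrics(similarities, relevant, k=5):
    """Compute Top-k accuracy and Mean Reciprocal Rank (MRR).

    similarities: (num_queries, num_candidates) array of query-candidate scores
    relevant:     list of sets; relevant[i] holds candidate indices relevant to query i
    """
    hits, reciprocal_ranks = [], []
    for i, scores in enumerate(similarities):
        ranking = np.argsort(-scores)  # candidate indices, best score first
        hits.append(any(c in relevant[i] for c in ranking[:k]))
        # 1-based rank of the first relevant candidate (contributes 0 if none found)
        first = next((r + 1 for r, c in enumerate(ranking) if c in relevant[i]), None)
        reciprocal_ranks.append(0.0 if first is None else 1.0 / first)
    return float(np.mean(hits)), float(np.mean(reciprocal_ranks))
```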