EOSDIS Graph Neural Network Model Card

Model Overview

  • Model Name: EOSDIS-GNN
  • Version: 1.0.3
  • Type: Heterogeneous Graph Neural Network
  • Framework: PyTorch + PyTorch Geometric
  • Base Language Model: nasa-impact/nasa-smd-ibm-st-v2

Core Components

  • Base Text Encoder: NASA-SMD-IBM Language Model (768-dimensional embeddings)
  • Graph Neural Network: Heterogeneous GNN with multiple layers
  • Node Types: Dataset, Publication, Instrument, Platform, ScienceKeyword
  • Edge Types: Multiple relationship types between nodes
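
One way to picture the heterogeneous structure is to group edges by a (source type, relation, destination type) triple, the same keying convention PyTorch Geometric uses for heterogeneous graphs. The sketch below is dependency-free and purely illustrative: the `HeteroGraph` class, the relation names, and the node ids are hypothetical, since the card does not list the actual edge types.

```python
from collections import defaultdict

# Node types listed in this model card.
NODE_TYPES = ("Dataset", "Publication", "Instrument", "Platform", "ScienceKeyword")


class HeteroGraph:
    """Tiny heterogeneous graph: edges are grouped by (src_type, relation, dst_type)."""

    def __init__(self):
        # (src_type, relation, dst_type) -> list of (src_id, dst_id) pairs
        self.edges = defaultdict(list)

    def add_edge(self, src_type, src_id, relation, dst_type, dst_id):
        for node_type in (src_type, dst_type):
            if node_type not in NODE_TYPES:
                raise ValueError(f"unknown node type: {node_type}")
        self.edges[(src_type, relation, dst_type)].append((src_id, dst_id))

    def neighbors(self, node_type, node_id):
        """Return (relation, dst_type, dst_id) for every outgoing edge of a node."""
        found = []
        for (src_type, relation, dst_type), pairs in self.edges.items():
            if src_type != node_type:
                continue
            found.extend((relation, dst_type, d) for s, d in pairs if s == node_id)
        return found


graph = HeteroGraph()
# Relation names and node ids are hypothetical placeholders.
graph.add_edge("Dataset", "ds-1", "HAS_INSTRUMENT", "Instrument", "inst-1")
graph.add_edge("Dataset", "ds-1", "HAS_KEYWORD", "ScienceKeyword", "kw-1")
graph.add_edge("Publication", "pub-1", "USES_DATASET", "Dataset", "ds-1")
print(graph.neighbors("Dataset", "ds-1"))
```

Keeping one edge list per relation triple lets a heterogeneous GNN apply separate parameters to each relation during message passing.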

Technical Specifications

  • Input Dimensions: 768 (NASA-SMD-IBM embeddings)
  • Hidden Dimensions: Configurable (default: 256)
  • Output Dimensions: 768 (aligned with NASA-SMD-IBM space)
  • Number of Layers: Configurable (default: 3)
  • Activation Function: ReLU
  • Dropout: Applied between layers
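
As a rough illustration of how these dimensions could fit together, the sketch below derives per-layer input and output sizes from the documented defaults. It assumes the first layer maps input to hidden and the last maps hidden to output, which the card does not state explicitly; `GNNConfig` and the dropout rate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GNNConfig:
    input_dim: int = 768    # NASA-SMD-IBM embedding size
    hidden_dim: int = 256   # documented default
    output_dim: int = 768   # aligned with the NASA-SMD-IBM space
    num_layers: int = 3     # documented default
    dropout: float = 0.5    # assumption: the card does not state a rate

    def layer_dims(self):
        """Return (in_dim, out_dim) per layer: input -> hidden ... hidden -> output."""
        if self.num_layers < 1:
            raise ValueError("num_layers must be at least 1")
        dims = [self.input_dim] + [self.hidden_dim] * (self.num_layers - 1) + [self.output_dim]
        return list(zip(dims[:-1], dims[1:]))


print(GNNConfig().layer_dims())  # [(768, 256), (256, 256), (256, 768)]
```

Projecting back to 768 dimensions in the final layer is what allows GNN outputs to be compared directly with text embeddings from the base language model.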

Training Details

Training Data

  • Source: NASA EOSDIS Knowledge Graph
  • Node Types:
    • Datasets: Earth science datasets from NASA DAACs
    • Publications: Related scientific papers
    • Instruments: Earth observation instruments
    • Platforms: Satellite and other observation platforms
    • Science Keywords: NASA Earth Science taxonomy

Training Process

  • Optimization: Adam optimizer
  • Loss Function: Contrastive loss for semantic alignment
  • Training Strategy:
    • Initial node embedding generation
    • Message passing through graph structure
    • Contrastive learning with NASA-SMD-IBM embeddings
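
The card does not say which contrastive loss is used. A common choice for aligning two embedding spaces is InfoNCE, sketched below in plain Python so it runs without dependencies; a real training loop would use PyTorch tensors. The `info_nce` function and the temperature of 0.07 are assumptions for illustration, not the model's actual settings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def info_nce(gnn_embs, text_embs, temperature=0.07):
    """Mean InfoNCE loss; gnn_embs[i] and text_embs[i] form the positive pair."""
    losses = []
    for i, anchor in enumerate(gnn_embs):
        # Similarity of this node's GNN embedding to every text embedding.
        logits = [cosine(anchor, t) / temperature for t in text_embs]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Cross-entropy with the matching text embedding as the target.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)  # True: matched pairs give the lower loss
```

The loss falls as each node's GNN output moves closer to its own text embedding and away from the text embeddings of other nodes in the batch.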

Intended Use

Designed for: research, data discovery, and semantic search in Earth science
Not intended for: safety‑critical systems or unrelated domains without fine‑tuning


Strengths

  1. Semantic Understanding:

    • Strong performance in finding semantically related content
    • Captures cross-modal relationships between text and graph structure
  2. Domain Specificity:

    • Specialized for Earth science terminology
    • Understands relationships between instruments, platforms, and datasets
  3. Multi-modal Integration:

    • Combines text-based and graph-based features
    • Preserves domain-specific relationships

Limitations

  1. Data Coverage:

    • Performance depends on training data coverage
    • May have gaps in newer or less documented areas
  2. Computational Requirements:

    • Requires significant memory for full graph processing
    • Graph operations can be computationally intensive
  3. Domain Constraints:

    • Optimized for Earth science domain
    • May not generalize well to other domains

Usage Guide

Installation Requirements

pip install torch torch-geometric transformers huggingface-hub

Basic Usage

from transformers import AutoTokenizer, AutoModel
import torch
from gnn_model import EOSDIS_GNN

# Load models
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
text_model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
gnn_model = EOSDIS_GNN.from_pretrained("your-username/eosdis-gnn")

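# Process query
def get_embedding(text):
    """Return the 768-dimensional first-token embedding of `text`."""
    inputs = tokenizer(text, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)
    # Inference only: skip gradient tracking.
    with torch.no_grad():
        outputs = text_model(**inputs)
    # Shape: (batch_size, 768); position 0 is the first token of each sequence.
    return outputs.last_hidden_state[:, 0, :]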

Semantic Search Example

from semantic_search import SemanticSearch

# Initialize searcher
searcher = SemanticSearch()

# Perform search
results = searcher.search(
    query="atmospheric carbon dioxide measurements",
    top_k=5,
    node_type="Dataset"  # Optional: filter by node type
)
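
The internals of `SemanticSearch` are not shown in this card. As a rough mental model, a search of this kind can be reduced to ranking stored node embeddings by cosine similarity to the query embedding, with an optional node-type filter. The standalone `search` function, the index layout, and the toy 2-dimensional vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def search(query_emb, index, top_k=5, node_type=None):
    """Rank indexed nodes by cosine similarity to the query embedding.

    `index` is a list of (node_id, node_type, embedding) tuples.
    """
    candidates = [
        (cosine(query_emb, emb), node_id)
        for node_id, n_type, emb in index
        if node_type is None or n_type == node_type
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [(node_id, score) for score, node_id in candidates[:top_k]]


# Toy 2-dimensional embeddings; real embeddings would be 768-dimensional.
index = [
    ("ds-co2", "Dataset", [0.9, 0.1]),
    ("ds-sst", "Dataset", [0.1, 0.9]),
    ("pub-co2", "Publication", [1.0, 0.0]),
]
results = search([1.0, 0.0], index, top_k=2, node_type="Dataset")
print(results)
```

A brute-force scan like this is fine for small graphs; larger indexes would typically need an approximate nearest-neighbor structure.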

Evaluation Metrics


Performance

Metric                         | Value | Notes
-------------------------------|-------|------------------------------------------------------------------------
Top‑5 Accuracy                 | 87.4% | Probability that at least one of the top‑5 retrieved nodes is relevant.
Mean Reciprocal Rank (MRR)     | 0.73  | Measures ranking quality.
Link Prediction ROC‑AUC        | 0.91  | Ability to predict whether a given edge exists.
Node Classification F1 (macro) | 0.84  | Balanced accuracy across node types.
Triple Classification Accuracy | 88.6% | Accuracy in classifying valid vs. invalid triples.
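
For reference, the two retrieval metrics above are conventionally computed as in the sketch below. This is not the evaluation script behind the reported numbers; the function names and the toy rankings are hypothetical.

```python
def top_k_accuracy(rankings, relevant, k=5):
    """Fraction of queries with at least one relevant item in the top k."""
    hits = sum(
        1
        for ranked, rel in zip(rankings, relevant)
        if any(item in rel for item in ranked[:k])
    )
    return hits / len(rankings)


def mean_reciprocal_rank(rankings, relevant):
    """Average of 1 / rank of the first relevant item (0 when none is retrieved)."""
    total = 0.0
    for ranked, rel in zip(rankings, relevant):
        for rank, item in enumerate(ranked, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Three toy queries: relevant item at rank 1, at rank 2, and not retrieved.
rankings = [["a", "b", "c"], ["d", "e", "f"], ["g", "h", "i"]]
relevant = [{"a"}, {"e"}, {"z"}]
print(top_k_accuracy(rankings, relevant, k=2))   # 2 of 3 queries hit
print(mean_reciprocal_rank(rankings, relevant))  # (1 + 1/2 + 0) / 3 = 0.5
```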

Evaluation Notes:

  • Dataset: held‑out portion of NASA EOSDIS Knowledge Graph
  • Search task: queries derived from publication abstracts
  • Link prediction: 80/10/10 train/val/test splits
  • Numbers from offline evaluation; may vary on different graph snapshots
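
An 80/10/10 edge split like the one mentioned for link prediction can be produced with a seeded shuffle, as sketched below. The `split_edges` helper, the seed, and the toy edge list are hypothetical; the card does not describe how the actual split or any negative sampling was done.

```python
import random


def split_edges(edges, train=0.8, val=0.1, seed=0):
    """Shuffle edges and split them into train/val/test lists; test takes the remainder."""
    shuffled = list(edges)
    # A dedicated Random instance keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


# Toy (dataset, keyword) edges with placeholder ids.
edges = [(f"ds-{i}", f"kw-{i % 7}") for i in range(100)]
train_edges, val_edges, test_edges = split_edges(edges)
print(len(train_edges), len(val_edges), len(test_edges))  # 80 10 10
```

Fixing the seed makes it possible to compare results across runs on the same graph snapshot.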

Version Control

  • Model versions tracked on Hugging Face Hub
  • Regular updates for improved performance

Citation

@misc{armin_mehrabian_2025,
    author       = { Armin Mehrabian },
    title        = { nasa-eosdis-heterogeneous-gnn (Revision 7e71e62) },
    year         = 2025,
    url          = { https://huggingface.co/arminmehrabian/nasa-eosdis-heterogeneous-gnn },
    doi          = { 10.57967/hf/6071 },
    publisher    = { Hugging Face }
}

Contact Information
