---
license: apache-2.0
language:
- en
pipeline_tag: graph-ml
tags:
- gnn
- earth
- nasa
- 1.0.3
datasets:
- nasa-gesdisc/nasa-eo-knowledge-graph
metrics:
- accuracy
- f1
- roc_auc
base_model:
- nasa-impact/nasa-smd-ibm-st-v2
---

# EOSDIS Graph Neural Network Model Card

## Model Overview

- **Model Name**: EOSDIS-GNN
- **Version**: 1.0.3
- **Type**: Heterogeneous Graph Neural Network
- **Framework**: PyTorch + PyTorch Geometric
- **Base Language Model**: nasa-impact/nasa-smd-ibm-st-v2

### Core Components

- **Base Text Encoder**: NASA-SMD-IBM Language Model (768-dimensional embeddings)
- **Graph Neural Network**: Heterogeneous GNN with multiple layers
- **Node Types**: Dataset, Publication, Instrument, Platform, ScienceKeyword
- **Edge Types**: Multiple relationship types between nodes

### Technical Specifications

- **Input Dimensions**: 768 (NASA-SMD-IBM embeddings)
- **Hidden Dimensions**: Configurable (default: 256)
- **Output Dimensions**: 768 (aligned with the NASA-SMD-IBM embedding space)
- **Number of Layers**: Configurable (default: 3)
- **Activation Function**: ReLU
- **Dropout**: Applied between layers
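The released implementation is not reproduced here; as a rough sketch only, a heterogeneous GNN with the dimensions listed above could be assembled in PyTorch Geometric as follows. The edge-type tuples, the choice of `SAGEConv`, and the dropout rate are illustrative assumptions, not the exact released architecture.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import HeteroConv, SAGEConv

# Hypothetical relation names; the real knowledge graph defines its own edge types.
# Forward and reverse edges are included so every node type receives messages.
EDGE_TYPES = [
    ("Publication", "cites", "Dataset"), ("Dataset", "cited_by", "Publication"),
    ("Dataset", "uses", "Instrument"), ("Instrument", "used_by", "Dataset"),
    ("Instrument", "mounted_on", "Platform"), ("Platform", "carries", "Instrument"),
    ("Dataset", "has_keyword", "ScienceKeyword"), ("ScienceKeyword", "keyword_of", "Dataset"),
]


class HeteroGNNSketch(nn.Module):
    """Illustrative heterogeneous GNN matching the specs above:
    768-d inputs, 256-d hidden layers, 768-d outputs, 3 layers, ReLU, dropout."""

    def __init__(self, edge_types=EDGE_TYPES, in_dim=768, hidden_dim=256,
                 out_dim=768, num_layers=3, dropout=0.2):
        super().__init__()
        self.convs = nn.ModuleList()
        for i in range(num_layers):
            d_in = in_dim if i == 0 else hidden_dim
            d_out = out_dim if i == num_layers - 1 else hidden_dim
            # One relation-specific convolution per edge type,
            # aggregated ("sum") per destination node type.
            self.convs.append(HeteroConv(
                {et: SAGEConv((d_in, d_in), d_out) for et in edge_types},
                aggr="sum",
            ))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x_dict, edge_index_dict):
        for i, conv in enumerate(self.convs):
            x_dict = conv(x_dict, edge_index_dict)
            if i < len(self.convs) - 1:  # ReLU + dropout between layers
                x_dict = {k: self.dropout(torch.relu(v)) for k, v in x_dict.items()}
        return x_dict  # 768-d embeddings per node type, aligned with the text space
```

In the released model, the 768-dimensional input features come from NASA-SMD-IBM embeddings of each node's metadata, which is what keeps the GNN output comparable with query embeddings from the text encoder.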
## Training Details

### Training Data

- **Source**: NASA EOSDIS Knowledge Graph
- **Node Types**:
  - Datasets: Earth science datasets from NASA DAACs
  - Publications: Related scientific papers
  - Instruments: Earth observation instruments
  - Platforms: Satellites and other observation platforms
  - Science Keywords: NASA Earth Science taxonomy

### Training Process

- **Optimizer**: Adam
- **Loss Function**: Contrastive loss for semantic alignment
- **Training Strategy**:
  - Initial node embedding generation
  - Message passing through the graph structure
  - Contrastive learning against NASA-SMD-IBM embeddings

---

## Intended Use

**Designed for:** research, data discovery, and semantic search in Earth science

**Not intended for:** safety-critical systems or unrelated domains without fine-tuning

---

### Strengths

1. **Semantic Understanding**:
   - Strong performance in finding semantically related content
   - Effective cross-modal relationships between text and graph structure
2. **Domain Specificity**:
   - Specialized for Earth science terminology
   - Understands relationships between instruments, platforms, and datasets
3. **Multi-modal Integration**:
   - Combines text-based and graph-based features
   - Preserves domain-specific relationships

### Limitations

1. **Data Coverage**:
   - Performance depends on training data coverage
   - May have gaps in newer or less-documented areas
2. **Computational Requirements**:
   - Requires significant memory for full-graph processing
   - Graph operations can be computationally intensive
3. **Domain Constraints**:
   - Optimized for the Earth science domain
   - May not generalize well to other domains

## Usage Guide

### Installation Requirements

```bash
pip install torch torch-geometric transformers huggingface-hub
```

### Basic Usage

```python
from transformers import AutoTokenizer, AutoModel
import torch

from gnn_model import EOSDIS_GNN

# Load the text encoder and the GNN
tokenizer = AutoTokenizer.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
text_model = AutoModel.from_pretrained("nasa-impact/nasa-smd-ibm-st-v2")
gnn_model = EOSDIS_GNN.from_pretrained("your-username/eosdis-gnn")

# Encode a query into the shared 768-dimensional embedding space
def get_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=512,
                       truncation=True, padding=True)
    with torch.no_grad():
        outputs = text_model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # [CLS] token embedding
```

### Semantic Search Example

```python
from semantic_search import SemanticSearch

# Initialize searcher
searcher = SemanticSearch()

# Perform search
results = searcher.search(
    query="atmospheric carbon dioxide measurements",
    top_k=5,
    node_type="Dataset"  # Optional: filter by node type
)
```

## Evaluation Metrics

| Metric | Value | Notes |
|--------|-------|-------|
| **Top-5 Accuracy** | 87.4% | Probability that at least one of the top-5 retrieved nodes is relevant. |
| **Mean Reciprocal Rank (MRR)** | 0.73 | Measures ranking quality. |
| **Link Prediction ROC-AUC** | 0.91 | Ability to predict whether a given edge exists. |
| **Node Classification F1 (macro)** | 0.84 | Balanced accuracy across node types. |
| **Triple Classification Accuracy** | 88.6% | Accuracy in classifying valid vs. invalid triples. |

**Evaluation Notes:**

- Dataset: held-out portion of the NASA EOSDIS Knowledge Graph
- Search task: queries derived from publication abstracts
- Link prediction: 80/10/10 train/val/test splits
- Numbers come from offline evaluation and may vary on different graph snapshots

### Version Control

- Model versions tracked on the Hugging Face Hub
- Regular updates for improved performance

### Citation

```bibtex
@misc{armin_mehrabian_2025,
  author    = {Armin Mehrabian},
  title     = {nasa-eosdis-heterogeneous-gnn (Revision 7e71e62)},
  year      = 2025,
  url       = {https://huggingface.co/arminmehrabian/nasa-eosdis-heterogeneous-gnn},
  doi       = {10.57967/hf/6071},
  publisher = {Hugging Face}
}
```

## Contact Information

- **Maintainer**: Armin Mehrabian
- **Email**: armin.mehrabian@nasa.gov
- **Organization**: NASA
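As a point of reference for the Top-5 accuracy and MRR values reported under Evaluation Metrics, the sketch below shows one common way such retrieval metrics are computed from a query-candidate similarity matrix. The function name and input layout are illustrative and are not taken from the released evaluation code.

```python
import numpy as np

def retrieval_metrics(similarities, relevant, k=5):
    """Compute Top-k accuracy and Mean Reciprocal Rank (MRR).

    similarities: (num_queries, num_candidates) array of query-candidate scores
    relevant:     list of sets; relevant[i] holds candidate indices relevant to query i
    """
    hits, reciprocal_ranks = [], []
    for i, scores in enumerate(similarities):
        ranking = np.argsort(-scores)  # candidate indices, best score first
        hits.append(any(c in relevant[i] for c in ranking[:k]))
        # 1-based rank of the first relevant candidate (contributes 0 if none found)
        first = next((r + 1 for r, c in enumerate(ranking) if c in relevant[i]), None)
        reciprocal_ranks.append(0.0 if first is None else 1.0 / first)
    return float(np.mean(hits)), float(np.mean(reciprocal_ranks))
```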