MoML-CA: Molecular Machine Learning for Coarse-grained Applications

This repository contains the DJMGNN (Dense Jump Multi-Graph Neural Network) models from the MoML-CA project, designed for molecular property prediction and coarse-grained molecular modeling applications.

🚀 Models Available

1. Base Model (`base_model/`)

DJMGNN model pre-trained on multiple molecular datasets
Datasets: QM9, SPICE, PFAS
Task: General molecular property prediction
Use case: Starting point for transfer learning or direct molecular property prediction

2. Fine-tuned Model (`finetuned_model/`)

DJMGNN model fine-tuned for PFAS molecular properties
Base: Fine-tuned from the base model (`base_model/`)
Specialization: Per- and polyfluoroalkyl substances (PFAS)
Use case: Optimized for PFAS molecular property prediction

🏗️ Architecture

DJMGNN (Dense Jump Multi-Graph Neural Network) features:

Multi-task learning: Simultaneous node-level and graph-level predictions
Jump connections: Enhanced information flow between layers
Dense blocks: Improved gradient flow and feature reuse
Supernode aggregation: Global graph representation
RBF features: Radial basis function encoding for distance information
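
To make the RBF item above concrete, here is a minimal, dependency-free sketch of a Gaussian radial basis expansion of an interatomic distance. Only the number of basis functions (32, matching `rbf_K=32` in the usage example) comes from this README. The function name `rbf_expand`, the Gaussian form, the 0 to 5 distance range, and the width heuristic are illustrative assumptions and may differ from what DJMGNN actually implements.

```python
import math


def rbf_expand(distance, num_centers=32, d_min=0.0, d_max=5.0, gamma=None):
    """Encode a scalar distance as a vector of Gaussian radial basis values.

    Hypothetical helper: the centers, range, and width are assumptions,
    not values taken from the MoML-CA code base.
    """
    if num_centers < 2:
        raise ValueError("num_centers must be at least 2")
    # Evenly spaced centers between d_min and d_max (inclusive).
    step = (d_max - d_min) / (num_centers - 1)
    centers = [d_min + i * step for i in range(num_centers)]
    # Default width: neighboring Gaussians overlap at roughly one step.
    if gamma is None:
        gamma = 1.0 / (step * step)
    return [math.exp(-gamma * (distance - c) ** 2) for c in centers]


# Encode a 1.5 (assumed Angstrom) bond length into 32 smooth features.
features = rbf_expand(1.5)
```

Each feature is largest when the distance is near that feature's center, so the network sees a smooth, localized encoding instead of a single raw number. In a PyTorch pipeline the same computation would normally be vectorized over all edges at once.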

Architecture Details

Hidden Dimensions: 128
Number of Blocks: 3-4
Layers per Block: 6
Input Node Dimensions: 11-29 (depending on featurization)
Node Output Dimensions: 3 (forces/properties per atom)
Graph Output Dimensions: 19 (molecular descriptors)
Energy Output Dimensions: 1 (total energy)

📊 Training Details

Datasets

QM9: ~130k small organic molecules with quantum mechanical properties
SPICE: Molecular dynamics trajectories with forces and energies
PFAS: Per- and polyfluoroalkyl substances dataset with specialized descriptors

Training Configuration

Optimizer: Adam
Learning Rate: 3e-5 (fine-tuning), 1e-3 (base training)
Batch Size: 4-8 (node tasks), 8-32 (graph tasks)
Loss Functions: MSE for regression, weighted multi-task loss
Regularization: Dropout (0.2), gradient clipping
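
The loss and regularization settings above can be illustrated with a small, dependency-free sketch: per-task MSE values are combined into one weighted objective, and gradients are rescaled when their global norm exceeds a threshold. The task weights, the clipping threshold, and all function names here are hypothetical, since this README does not state them. In a PyTorch training loop the equivalent building blocks would be `torch.nn.functional.mse_loss` and `torch.nn.utils.clip_grad_norm_`.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def weighted_multitask_loss(task_losses, task_weights):
    """Weighted sum of per-task losses, keyed by task name."""
    return sum(task_weights[name] * loss for name, loss in task_losses.items())


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so that their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total == 0.0 or total <= max_norm:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


# Toy values standing in for node, graph, and energy predictions.
losses = {
    "node": mse([0.1, -0.2, 0.3], [0.0, 0.0, 0.0]),
    "graph": mse([1.0, 2.0], [1.5, 1.5]),
    "energy": mse([-10.0], [-10.2]),
}
# Assumed weights; the actual values used for training are not given here.
weights = {"node": 1.0, "graph": 0.5, "energy": 0.1}
total_loss = weighted_multitask_loss(losses, weights)

# A gradient of norm 5 is scaled down to norm 1.
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Weighting keeps one task (for example, forces with many per-atom targets) from dominating the others, and clipping bounds the size of any single update step.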

🔧 Usage

Loading the Base Model

import torch
from moml.models.mgnn.djmgnn import DJMGNN

# Initialize model architecture
model = DJMGNN(
    in_node_dim=29,  # Adjust based on your featurization
    in_edge_dim=0,
    hidden_dim=128,
    n_blocks=4,
    layers_per_block=6,
    node_output_dims=3,
    graph_output_dims=19,
    energy_output_dims=1,
    jk_mode="attention",
    dropout=0.2,
    use_supernode=True,
    use_rbf=True,
    rbf_K=32
)

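# Load base model checkpoint (torch.hub caches the file after the first download)
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/base_model/pytorch_model.pt",
    map_location="cpu",                 # load on CPU regardless of the device it was saved from
    file_name="moml_ca_base_model.pt",  # distinct cache name: both checkpoints are named pytorch_model.pt
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # disable dropout for inference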

Loading the Fine-tuned Model

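# Same architecture setup as above, then:
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/finetuned_model/pytorch_model.pt",
    map_location="cpu",                      # load on CPU regardless of the device it was saved from
    file_name="moml_ca_finetuned_model.pt",  # torch.hub caches by file name; avoid reusing a cached base checkpoint
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # disable dropout for inference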

Making Predictions

# Assuming you have a molecular graph 'data' (torch_geometric.data.Data),
# ideally a batch from a PyG DataLoader: data.batch is None on a single un-batched graph
with torch.no_grad():
    output = model(
        x=data.x,
        edge_index=data.edge_index,
        edge_attr=data.edge_attr,
        batch=data.batch
    )
    
    # Extract predictions
    node_predictions = output["node_pred"]      # Per-atom properties/forces
    graph_predictions = output["graph_pred"]    # Molecular descriptors
    energy_predictions = output["energy_pred"]  # Total energy
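
When several molecules are batched together, per-atom predictions come back stacked in one array, and the PyTorch Geometric `batch` vector records which molecule each atom belongs to. The helper below is a minimal sketch for regrouping per-atom rows by molecule. The name `split_by_graph` is hypothetical, and the assumption that `node_pred` has one row of three values per atom is inferred from the "Node Output Dimensions: 3" entry, not confirmed against the model code.

```python
def split_by_graph(node_values, batch):
    """Group per-atom rows by molecule using a batch index vector.

    node_values: one row per atom, across all molecules in the batch.
    batch: for each atom, the index of the molecule it belongs to.
    """
    if len(node_values) != len(batch):
        raise ValueError("node_values and batch must have the same length")
    num_graphs = max(batch) + 1 if batch else 0
    groups = [[] for _ in range(num_graphs)]
    for row, graph_index in zip(node_values, batch):
        groups[graph_index].append(row)
    return groups


# Toy batch: a 3-atom molecule followed by a 2-atom molecule.
node_rows = [
    [0.1, 0.0, 0.0],
    [0.2, 0.0, 0.0],
    [0.3, 0.0, 0.0],
    [0.4, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
batch = [0, 0, 0, 1, 1]
per_molecule = split_by_graph(node_rows, batch)
```

With real model output, the inputs could be obtained as `output["node_pred"].tolist()` and `data.batch.tolist()`, assuming both are tensors with one entry per atom.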

📈 Performance

Base Model

Trained on diverse molecular datasets for robust generalization
Multi-task learning across node and graph-level properties
Suitable for transfer learning to specialized domains

Fine-tuned Model

Specialized for PFAS molecular properties
Intended to improve accuracy on fluorinated compounds relative to the base model
Optimized for environmental and toxicological applications

🔬 Applications

Molecular Property Prediction: HOMO/LUMO, dipole moments, polarizability
Force Field Development: Atomic forces and energies for MD simulations
Environmental Chemistry: PFAS behavior and properties
Drug Discovery: Molecular screening and optimization
Materials Science: Polymer and surface properties

📚 Citation

If you use these models in your research, please cite:

@misc{moml_ca_djmgnn,
  title={MoML-CA: Molecular Machine Learning for Coarse-grained Applications},
  author={Saketh Bharadwaj},
  year={2024},
  url={https://github.com/SAKETH11111/MoML-CA},
  note={Hugging Face Model Hub: https://huggingface.co/saketh11/MoML-CA}
}

🔗 Links

GitHub Repository: SAKETH11111/MoML-CA (https://github.com/SAKETH11111/MoML-CA)
Documentation: See repository README and docs/
Issues: Report bugs and request features on GitHub

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👥 Contributing

Contributions are welcome! Please see the contributing guidelines in the GitHub repository.

For questions or support, please open an issue in the GitHub repository.