MoML-CA: Molecular Machine Learning for Coarse-grained Applications

This repository contains the DJMGNN (Dense Jump Multi-Graph Neural Network) models from the MoML-CA project, designed for molecular property prediction and coarse-grained molecular modeling applications.

🚀 Models Available

1. Base Model (base_model/)

  • Pre-trained DJMGNN model trained on multiple molecular datasets
  • Datasets: QM9, SPICE, PFAS
  • Task: General molecular property prediction
  • Use case: Starting point for transfer learning or direct molecular property prediction

2. Fine-tuned Model (finetuned_model/)

  • DJMGNN model fine-tuned for PFAS molecular properties
  • Base: Initialized from the base model (base_model/)
  • Specialization: Per- and polyfluoroalkyl substances (PFAS)
  • Use case: Optimized for PFAS molecular property prediction

🏗️ Architecture

DJMGNN (Dense Jump Multi-Graph Neural Network) features:

  • Multi-task learning: Simultaneous node-level and graph-level predictions
  • Jump connections: Enhanced information flow between layers
  • Dense blocks: Improved gradient flow and feature reuse
  • Supernode aggregation: Global graph representation
  • RBF features: Radial basis function encoding for distance information
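To make the RBF bullet concrete, here is a minimal sketch of a radial basis function encoding of an interatomic distance. It assumes Gaussian basis functions with evenly spaced centers; the function name `rbf_expand`, the 5.0 cutoff, and the width rule are illustrative assumptions, not taken from the MoML-CA code. Only the default of 32 features mirrors the `rbf_K=32` setting used in the usage section.

```python
import math


def rbf_expand(distance, num_centers=32, cutoff=5.0, gamma=None):
    """Encode one scalar distance as `num_centers` Gaussian RBF features.

    Hypothetical helper: the centers, cutoff, and width are assumptions,
    not the values used inside DJMGNN.
    """
    if num_centers < 2:
        raise ValueError("num_centers must be at least 2")
    # Centers are evenly spaced on [0, cutoff].
    step = cutoff / (num_centers - 1)
    if gamma is None:
        # Width chosen so that neighbouring Gaussians overlap.
        gamma = 1.0 / (step * step)
    return [
        math.exp(-gamma * (distance - k * step) ** 2)
        for k in range(num_centers)
    ]


features = rbf_expand(1.0)
print(len(features))
```

Each feature peaks when the distance matches its center, so a single scalar distance becomes a smooth vector that a network can consume more easily than the raw value.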

Architecture Details

  • Hidden Dimensions: 128
  • Number of Blocks: 3-4
  • Layers per Block: 6
  • Input Node Dimensions: 11-29 (depending on featurization)
  • Node Output Dimensions: 3 (forces/properties per atom)
  • Graph Output Dimensions: 19 (molecular descriptors)
  • Energy Output Dimensions: 1 (total energy)
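As a quick sanity check on the output dimensions listed above, the sketch below computes the tensor shapes one would expect for a batch. It assumes one row per atom for node outputs and one row per molecule for graph and energy outputs; that layout and the helper name `expected_output_shapes` are assumptions. The dictionary keys follow the ones used in the prediction example of this README.

```python
def expected_output_shapes(num_atoms, num_graphs,
                           node_dim=3, graph_dim=19, energy_dim=1):
    """Return the assumed (rows, columns) shape of each model output."""
    return {
        "node_pred": (num_atoms, node_dim),        # one row per atom
        "graph_pred": (num_graphs, graph_dim),     # one row per molecule
        "energy_pred": (num_graphs, energy_dim),   # one row per molecule
    }


# Example: a batch of 4 molecules with 37 atoms in total.
shapes = expected_output_shapes(num_atoms=37, num_graphs=4)
print(shapes)
```

Comparing these tuples against `tuple(tensor.shape)` for each output is a cheap way to catch a featurization or batching mismatch early.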

📊 Training Details

Datasets

  • QM9: ~130k small organic molecules with quantum mechanical properties
  • SPICE: Molecular dynamics trajectories with forces and energies
  • PFAS: Per- and polyfluoroalkyl substances dataset with specialized descriptors

Training Configuration

  • Optimizer: Adam
  • Learning Rate: 3e-5 (fine-tuning), 1e-3 (base training)
  • Batch Size: 4-8 (node tasks), 8-32 (graph tasks)
  • Loss Functions: MSE for regression, weighted multi-task loss
  • Regularization: Dropout (0.2), gradient clipping
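The weighted multi-task loss mentioned above can be sketched as a weighted sum of per-task MSE terms. This is a plain-Python illustration under stated assumptions: the task names, the weights, and the helper names `mse` and `multitask_loss` are hypothetical, and the actual weighting scheme used in training is not documented here.

```python
def mse(pred, target):
    """Mean squared error between two equal-length lists of floats."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(preds, targets, weights):
    """Weighted sum of per-task MSE losses (weights are assumptions)."""
    return sum(
        weights[task] * mse(preds[task], targets[task])
        for task in weights
    )


# Hypothetical values for three tasks.
preds = {"node": [0.0, 1.0], "graph": [2.0], "energy": [1.0]}
targets = {"node": [0.0, 3.0], "graph": [2.0], "energy": [0.0]}
weights = {"node": 1.0, "graph": 0.5, "energy": 0.1}

print(multitask_loss(preds, targets, weights))
```

In a PyTorch training loop the same structure would apply to tensors, with `torch.nn.functional.mse_loss` in place of the list-based `mse`.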

🔧 Usage

Loading the Base Model

import torch
from moml.models.mgnn.djmgnn import DJMGNN

# Initialize model architecture
model = DJMGNN(
    in_node_dim=29,  # Adjust based on your featurization
    in_edge_dim=0,
    hidden_dim=128,
    n_blocks=4,
    layers_per_block=6,
    node_output_dims=3,
    graph_output_dims=19,
    energy_output_dims=1,
    jk_mode="attention",
    dropout=0.2,
    use_supernode=True,
    use_rbf=True,
    rbf_K=32
)

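# Load base model checkpoint. torch.hub caches downloads by file name, and both
# checkpoints in this repository are named pytorch_model.pt, so give each one a
# distinct cache file name.
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/base_model/pytorch_model.pt",
    map_location="cpu",  # Load on CPU; move the model to a GPU afterwards if needed
    file_name="moml_ca_base_model.pt",
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # Disable dropout for inference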

Loading the Fine-tuned Model

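# Same architecture setup as above, then load the fine-tuned weights.
# A distinct file_name keeps torch.hub from reusing a cached base-model file
# that shares the name pytorch_model.pt.
checkpoint = torch.hub.load_state_dict_from_url(
    "https://huggingface.co/saketh11/MoML-CA/resolve/main/finetuned_model/pytorch_model.pt",
    map_location="cpu",
    file_name="moml_ca_finetuned_model.pt",
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()  # Disable dropout for inference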
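
Because both checkpoints follow the same URL pattern and differ only in the subdirectory, a small helper can build the URL and a distinct cache file name for either variant. This is a sketch; the helper names `checkpoint_url` and `cache_file_name` are hypothetical and not part of the `moml` package.

```python
REPO_URL = "https://huggingface.co/saketh11/MoML-CA"
VARIANTS = ("base_model", "finetuned_model")


def checkpoint_url(variant):
    """Build the download URL for one of the two published checkpoints."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"{REPO_URL}/resolve/main/{variant}/pytorch_model.pt"


def cache_file_name(variant):
    """Distinct local file name, since both remote files share one name."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant {variant!r}; expected one of {VARIANTS}")
    return f"moml_ca_{variant}.pt"


print(checkpoint_url("finetuned_model"))
print(cache_file_name("finetuned_model"))
```

The two return values can be passed as the URL and the `file_name` argument of `torch.hub.load_state_dict_from_url`, which stores each download under the given file name in its cache directory.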

Making Predictions

# Assuming `data` is a molecular graph (torch_geometric.data.Data), for example
# one batch from a torch_geometric DataLoader so that data.batch is populated
with torch.no_grad():
    output = model(
        x=data.x,
        edge_index=data.edge_index,
        edge_attr=data.edge_attr,
        batch=data.batch
    )
    
    # Extract predictions
    node_predictions = output["node_pred"]      # Per-atom properties/forces
    graph_predictions = output["graph_pred"]    # Molecular descriptors
    energy_predictions = output["energy_pred"]  # Total energy
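
The `data` object used above needs an `edge_index` in the PyTorch Geometric convention: two rows (source and target indices), with each undirected bond stored in both directions. The sketch below builds that structure from a bond list using plain Python lists. The helper name `bonds_to_edge_index` is hypothetical, and the featurization the MoML-CA models actually expect may define edges differently.

```python
def bonds_to_edge_index(bonds):
    """Convert undirected bonds [(i, j), ...] to a 2 x (2 * len(bonds)) edge list."""
    sources, targets = [], []
    for i, j in bonds:
        # Store each bond in both directions, as expected for undirected graphs.
        sources += [i, j]
        targets += [j, i]
    return [sources, targets]


# Water: atom 0 is oxygen, atoms 1 and 2 are hydrogens.
water_bonds = [(0, 1), (0, 2)]
edge_index = bonds_to_edge_index(water_bonds)
print(edge_index)  # [[0, 1, 0, 2], [1, 0, 2, 0]]
```

Wrapping the result with `torch.tensor(edge_index, dtype=torch.long)` gives a tensor of shape `[2, num_edges]` that can be assigned to `data.edge_index`.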

📈 Performance

Base Model

  • Trained on diverse molecular datasets for robust generalization
  • Multi-task learning across node and graph-level properties
  • Suitable for transfer learning to specialized domains

Fine-tuned Model

  • Specialized for PFAS molecular properties
  • Improved accuracy on fluorinated compounds
  • Optimized for environmental and toxicological applications

🔬 Applications

  • Molecular Property Prediction: HOMO/LUMO, dipole moments, polarizability
  • Force Field Development: Atomic forces and energies for MD simulations
  • Environmental Chemistry: PFAS behavior and properties
  • Drug Discovery: Molecular screening and optimization
  • Materials Science: Polymer and surface properties

📚 Citation

If you use these models in your research, please cite:

@misc{moml_ca_djmgnn,
  title={MoML-CA: Molecular Machine Learning for Coarse-grained Applications},
  author={Saketh Bharadwaj},
  year={2024},
  url={https://github.com/SAKETH11111/MoML-CA},
  note={Hugging Face Model Hub: https://huggingface.co/saketh11/MoML-CA}
}

🔗 Links

  • GitHub Repository: SAKETH11111/MoML-CA (https://github.com/SAKETH11111/MoML-CA)
  • Documentation: See repository README and docs/
  • Issues: Report bugs and request features on GitHub

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

👥 Contributing

Contributions are welcome! Please see the contributing guidelines in the GitHub repository.


For questions or support, please open an issue in the GitHub repository.
