metadata

license: mit
tags:
  - chemistry
  - molecular-property-prediction
  - graph-neural-networks
  - attentivefp
  - pytorch-geometric
  - toxicity-prediction
language:
  - en
pipeline_tag: tabular-regression

Pyrosage TMP AttentiveFP Model

Model Description

This is an AttentiveFP (Attention-based Fingerprint) graph neural network trained to predict a toxicity-related molecular property, the TMP endpoint, which represents an experimental toxicity measurement. A SMILES string is converted to a molecular graph, and the model predicts the property value directly from that structure.

Model Details

Model Type: AttentiveFP (Graph Neural Network)
Task: Regression
Input: SMILES strings (molecular representations)
Output: Continuous numerical value
Framework: PyTorch Geometric
Architecture: AttentiveFP with enhanced atom and bond features

Hyperparameters

{
  "name": "larger_model",
  "hidden_channels": 128,
  "num_layers": 3,
  "num_timesteps": 3,
  "dropout": 0.1,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
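
The training loop itself is not documented here. As a rough illustration of how `epochs` and `patience` usually interact, the sketch below assumes that `patience` counts consecutive epochs without an improvement in validation loss, and that training stops early once that count is reached. The function name and the use of a precomputed list of losses are hypothetical stand-ins for a real train-and-validate step.

```python
def run_early_stopping(val_losses, max_epochs=50, patience=10):
    """Simulate early stopping over a sequence of per-epoch validation losses.

    Assumption: `patience` is the number of consecutive epochs without
    improvement that is tolerated before training stops.
    Returns (best_epoch, best_loss, last_epoch_run), all 0-indexed.
    """
    best_loss = float("inf")
    best_epoch = -1
    last_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if epoch >= max_epochs:
            break  # hard cap from the `epochs` hyperparameter
        last_epoch = epoch
        if loss < best_loss:
            # New best validation loss: remember it and reset the counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row

    return best_epoch, best_loss, last_epoch
```

With the values listed above, training would run for at most 50 epochs and stop earlier if 10 epochs in a row fail to improve on the best validation loss.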

Usage

Installation

pip install torch torch-geometric rdkit

Loading the Model

import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data

# Load the model
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']

# Create model with correct architecture
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)

model.load_state_dict(state_dict)
model.eval()
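
The loading code reads four keys from the checkpoint's hyperparameter dictionary. A small check like the one below can give a clearer error if a checkpoint was saved without one of them. This is only a sketch: `REQUIRED_KEYS` and `check_hyperparameters` are hypothetical names, not part of the published checkpoint or of PyTorch Geometric.

```python
# Keys that the model constructor above reads from the checkpoint
REQUIRED_KEYS = ("hidden_channels", "num_layers", "num_timesteps", "dropout")


def check_hyperparameters(hyperparams):
    """Return only the architecture-related entries, or raise if any is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in hyperparams]
    if missing:
        raise KeyError(f"checkpoint hyperparameters missing: {missing}")
    return {key: hyperparams[key] for key in REQUIRED_KEYS}
```

Calling `check_hyperparameters(model_dict['hyperparameters'])` before building the model would surface a missing key up front instead of partway through the constructor call.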

Making Predictions

def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)

    x = torch.tensor(atom_features, dtype=torch.float)

    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        edges_list.extend([[i, j], [j, i]])

        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])

    # Molecules without bonds (e.g. a single atom) yield no edges and are skipped
    if not edges_list:
        return None

    edge_index = torch.tensor(edges_list, dtype=torch.long).t()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)

    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

def predict(model, smiles):
    """Make prediction for a SMILES string"""
    data = smiles_to_data(smiles)
    if data is None:
        return None
    
    # Single-molecule batch: every atom is assigned to graph 0
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
        return output.item()

# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Prediction for {smiles}: {prediction}")
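
Because `predict` returns `None` for inputs it cannot featurize, scoring a list of molecules needs to separate successes from failures. The helper below is one possible way to do that. `predict_many` is a hypothetical name, and it takes the prediction function as an argument so that it does not depend on any particular model object.

```python
def predict_many(predict_fn, smiles_list):
    """Apply `predict_fn` to each SMILES string.

    Returns a dict of successful predictions keyed by SMILES, plus a list
    of the inputs for which `predict_fn` returned None.
    """
    results = {}
    failed = []
    for smiles in smiles_list:
        value = predict_fn(smiles)
        if value is None:
            failed.append(smiles)  # unparsable or unsupported input
        else:
            results[smiles] = value
    return results, failed
```

With the functions defined above, it could be called as `results, failed = predict_many(lambda s: predict(model, s), smiles_list)`. Strings that RDKit cannot parse, and molecules without bonds such as `"C"`, would end up in `failed`.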

Training Data

The model was trained on the TMP dataset from the Pyrosage project, which focuses on molecular toxicity and environmental property prediction.

Model Performance

See training logs for detailed performance metrics.

Limitations

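The model was trained on specific chemical datasets and may not generalize to all classes of molecules.
Performance may degrade for molecules that differ significantly from the training distribution.
Input must be a valid SMILES string. With the code above, strings that RDKit cannot parse and molecules without bonds (for example, single atoms) return None instead of a prediction.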

Citation

If you use this model, please cite the Pyrosage project:

@misc{pyrosagetmp,
  title={Pyrosage TMP AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-tmp-attentivefp}
}

License

MIT License - see LICENSE file for details.