Pyrosage Endocrine_Disruption_NR-ER AttentiveFP Model

Model Description

This is an AttentiveFP (Attention-based Fingerprint) graph neural network trained to predict endocrine disruption via the estrogen receptor (ER). It predicts whether a compound can bind to and activate estrogen receptors, thereby affecting hormonal balance. The model takes SMILES strings as input, converts each one to a molecular graph with atom and bond features, and predicts the property directly from that graph.

Model Details

  • Model Type: AttentiveFP (Graph Neural Network)
  • Task: Binary Classification
  • Input: SMILES strings (molecular representations)
  • Output: Binary classification (0/1)
  • Framework: PyTorch Geometric
  • Architecture: AttentiveFP with enhanced atom and bond features

Hyperparameters

{
  "name": "deeper_model",
  "hidden_channels": 64,
  "num_layers": 4,
  "num_timesteps": 4,
  "dropout": 0.3,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
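
Only some of these keys are needed to rebuild the network. The loading code in the Usage section passes hidden_channels, num_layers, num_timesteps, and dropout to the AttentiveFP constructor; the remaining keys appear to describe the training run. The sketch below separates the two groups. The helper name split_hyperparams and the ARCHITECTURE_KEYS tuple are illustrative and not part of the Pyrosage code.

```python
# Hyperparameters copied from the block above
hyperparams = {
    "name": "deeper_model",
    "hidden_channels": 64,
    "num_layers": 4,
    "num_timesteps": 4,
    "dropout": 0.3,
    "learning_rate": 0.0005,
    "weight_decay": 0.0001,
    "batch_size": 32,
    "epochs": 50,
    "patience": 10,
}

# Keys that the loading code passes to the AttentiveFP constructor
ARCHITECTURE_KEYS = ("hidden_channels", "num_layers", "num_timesteps", "dropout")


def split_hyperparams(params):
    """Hypothetical helper: split constructor arguments from all other keys."""
    missing = [key for key in ARCHITECTURE_KEYS if key not in params]
    if missing:
        raise KeyError(f"missing architecture hyperparameters: {missing}")
    architecture = {key: params[key] for key in ARCHITECTURE_KEYS}
    other = {key: value for key, value in params.items() if key not in ARCHITECTURE_KEYS}
    return architecture, other


architecture, other = split_hyperparams(hyperparams)
print("constructor arguments:", architecture)
print("other settings:", other)
```

Failing early on a missing key gives a clearer error than a mismatch raised later by load_state_dict.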

Usage

Installation

pip install torch torch-geometric rdkit

Loading the Model

import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data

# Load the model
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']

# Create model with correct architecture
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)

model.load_state_dict(state_dict)
model.eval()

Making Predictions

def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)

    x = torch.tensor(atom_features, dtype=torch.float)

    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        edges_list.extend([[i, j], [j, i]])

        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])

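    # Molecules without bonds (e.g. single atoms or ions) have no edges and are skipped
    if not edges_list:
        return None

    # Shape [2, num_edges], as expected by PyTorch Geometric
    edge_index = torch.tensor(edges_list, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)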

    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)

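def predict(model, smiles):
    """Return the raw model output for a SMILES string, or None if it cannot be featurized"""
    data = smiles_to_data(smiles)
    if data is None:
        return None

    # All atoms belong to a single graph, so the batch vector is all zeros
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
        # AttentiveFP ends in a linear layer, so this is an unbounded score, not a 0/1 label
        return output.item()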

# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Prediction for {smiles}: {prediction}")
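
The value printed above is a single unbounded number rather than a 0/1 label. The sketch below shows one way to turn it into a probability and a class, under two assumptions that this model card does not confirm: that the model was trained with a logit-based loss (so a sigmoid is the right mapping), and that class 1 means ER-active. The function names and the 0.5 threshold are illustrative.

```python
import math


def score_to_probability(score):
    """Map an unbounded score to [0, 1] with a numerically stable sigmoid.

    Assumes the score is a logit, which the model card does not state.
    """
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_score = math.exp(score)
    return exp_score / (1.0 + exp_score)


def score_to_label(score, threshold=0.5):
    """Return 1 if the probability reaches the threshold, else 0.

    The 0.5 threshold is an assumed default, not a value from the authors.
    """
    return int(score_to_probability(score) >= threshold)


# Illustrative scores, not real model outputs
for raw in (-2.0, 0.0, 2.0):
    print(raw, round(score_to_probability(raw), 3), score_to_label(raw))
```

With the real model, the input would be the value returned by predict(model, smiles), after checking that it is not None. A threshold tuned on validation data may work better than 0.5 if the classes are imbalanced.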

Training Data

The model was trained on the Endocrine_Disruption_NR-ER dataset from the Pyrosage project, which focuses on molecular toxicity and environmental property prediction.

Model Performance

See training logs for detailed performance metrics.

Limitations

  • The model is trained on specific chemical datasets and may not generalize to all molecular types
  • Performance may vary for molecules significantly different from the training distribution
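  • Requires valid SMILES input; in the example code, strings that RDKit cannot parse and molecules with no bonds (e.g. single atoms) return None instead of a prediction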
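
Because the predict function in the Usage section returns None for inputs it cannot featurize, it helps to track which molecules were skipped when screening a list. The following is a minimal sketch; screen_smiles and fake_predict are hypothetical names, and fake_predict stands in for the real model so the example runs without the checkpoint.

```python
def screen_smiles(smiles_list, predict_fn):
    """Run predict_fn on each SMILES, separating scored molecules from skipped ones."""
    scored = {}
    skipped = []
    for smiles in smiles_list:
        result = predict_fn(smiles)
        if result is None:
            skipped.append(smiles)
        else:
            scored[smiles] = result
    return scored, skipped


def fake_predict(smiles):
    """Stand-in for `lambda s: predict(model, s)`; the scores are not real predictions."""
    unusable = {"not_a_smiles", "[Na+]"}  # unparsable string, bond-free ion
    return None if smiles in unusable else float(len(smiles))


scored, skipped = screen_smiles(["CCO", "not_a_smiles", "[Na+]"], fake_predict)
print("scored:", scored)
print("skipped:", skipped)
```

Reporting the skipped inputs separately keeps a failed parse from being mistaken for a negative prediction.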

Citation

If you use this model, please cite the Pyrosage project:

@misc{pyrosageendocrine_disruption_nr-er,
  title={Pyrosage Endocrine_Disruption_NR-ER AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-endocrine_disruption_nr-er-attentivefp}
}

License

MIT License - see LICENSE file for details.
