---
license: mit
tags:
- chemistry
- molecular-property-prediction
- graph-neural-networks
- attentivefp
- pytorch-geometric
- toxicity-prediction
language:
- en
pipeline_tag: text-classification
---

# Pyrosage Endocrine_Disruption_NR-ER AttentiveFP Model

## Model Description

This is an AttentiveFP (Attention-based Fingerprint) Graph Neural Network model trained to predict endocrine disruption via the estrogen receptor (ER). It predicts whether a compound can bind to and activate estrogen receptors, affecting hormonal balance. The model takes SMILES strings as input and predicts the property directly from the molecular graph.

## Model Details

- **Model Type**: AttentiveFP (Graph Neural Network)
- **Task**: Binary Classification
- **Input**: SMILES strings (molecular representations)
- **Output**: Binary classification (0/1)
- **Framework**: PyTorch Geometric
- **Architecture**: AttentiveFP with enhanced atom and bond features

### Hyperparameters

```json
{
  "name": "deeper_model",
  "hidden_channels": 64,
  "num_layers": 4,
  "num_timesteps": 4,
  "dropout": 0.3,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
```

## Usage

### Installation

```bash
pip install torch torch-geometric rdkit-pypi
```

### Loading the Model

```python
import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data

# Load the checkpoint (weights plus the hyperparameters used for training)
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']

# Create model with correct architecture
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)
model.load_state_dict(state_dict)
model.eval()  # Disable dropout for inference
```

### Making Predictions

```python
def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object (None if it cannot be featurized)"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)

    x = torch.tensor(atom_features, dtype=torch.float)

    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        # Store each bond in both directions (undirected graph)
        edges_list.extend([[i, j], [j, i]])

        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])

    # Molecules without bonds (e.g. single atoms) are not supported
    if not edges_list:
        return None

    edge_index = torch.tensor(edges_list, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)

    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)


def predict(model, smiles):
    """Return the raw model output for a SMILES string (None if it cannot be featurized)"""
    data = smiles_to_data(smiles)
    if data is None:
        return None

    # Single molecule: every atom belongs to graph 0
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
        return output.item()


# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Prediction for {smiles}: {prediction}")
```

## Training Data

The model was trained on the Endocrine_Disruption_NR-ER dataset from the Pyrosage project, which focuses on molecular toxicity and environmental property prediction.

## Model Performance

See training logs for detailed performance metrics.

## Limitations

- The model is trained on specific chemical datasets and may not generalize to all molecular types
- Performance may vary for molecules significantly different from the training distribution
- Input must be a valid SMILES string; strings that RDKit cannot parse, and molecules with no bonds, yield no prediction (`None`)

## Citation

If you use this model, please cite the Pyrosage project:

```bibtex
@misc{pyrosageendocrine_disruption_nr-er,
  title={Pyrosage Endocrine_Disruption_NR-ER AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-endocrine_disruption_nr-er-attentivefp}
}
```

## License

MIT License - see LICENSE file for details.
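
## Interpreting the Raw Output

The `predict` function in the Usage section returns a single float, while the model details list the output as a binary 0/1 label. This card does not state whether that float is a logit or a probability. The sketch below assumes it is a logit (as it would be if the model were trained with a logits-based loss such as `BCEWithLogitsLoss`) and shows one way to map it to a probability and a label. The helper names `logit_to_probability` and `classify` and the 0.5 threshold are hypothetical and not part of the released code; if the checkpoint already emits probabilities, skip the sigmoid.

```python
import math


def logit_to_probability(logit: float) -> float:
    """Numerically stable logistic sigmoid.

    ASSUMPTION: the model output is a raw logit. The model card does not
    confirm this; check the training code before relying on it.
    """
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    # For negative inputs, exponentiate the logit itself to avoid overflow
    z = math.exp(logit)
    return z / (1.0 + z)


def classify(logit: float, threshold: float = 0.5) -> int:
    """Map a logit to a 0/1 label.

    ASSUMPTION: 0.5 is an illustrative threshold, not a value taken from
    the Pyrosage project.
    """
    return int(logit_to_probability(logit) >= threshold)


if __name__ == "__main__":
    # Example with made-up logits standing in for values from predict()
    for raw in (-2.0, 0.0, 2.0):
        print(raw, round(logit_to_probability(raw), 3), classify(raw))
```

In practice the decision threshold is usually tuned on a validation set, since toxicity datasets are often imbalanced and 0.5 may not give the desired trade-off between false positives and false negatives.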