---
license: mit
tags:
- chemistry
- molecular-property-prediction
- graph-neural-networks
- attentivefp
- pytorch-geometric
- toxicity-prediction
language:
- en
pipeline_tag: tabular-regression
---

# Pyrosage TMP AttentiveFP Model

## Model Description

This is an AttentiveFP (Attention-based Fingerprint) Graph Neural Network model trained to predict a toxicity-related molecular property. The TMP endpoint represents an experimental toxicity measurement. The model takes SMILES strings as input and uses a graph neural network to predict the property directly from the molecular structure.

## Model Details

- **Model Type**: AttentiveFP (Graph Neural Network)
- **Task**: Regression
- **Input**: SMILES strings (molecular representations)
- **Output**: Continuous numerical value
- **Framework**: PyTorch Geometric
- **Architecture**: AttentiveFP with enhanced atom and bond features

### Hyperparameters

```json
{
  "name": "larger_model",
  "hidden_channels": 128,
  "num_layers": 3,
  "num_timesteps": 3,
  "dropout": 0.1,
  "learning_rate": 0.0005,
  "weight_decay": 0.0001,
  "batch_size": 32,
  "epochs": 50,
  "patience": 10
}
```

## Usage

### Installation

```bash
pip install torch torch-geometric rdkit
```

### Loading the Model

```python
import torch
from torch_geometric.nn import AttentiveFP
from rdkit import Chem
from torch_geometric.data import Data

# Load the checkpoint (weights plus the hyperparameters used for training)
model_dict = torch.load('pytorch_model.pt', map_location='cpu')
state_dict = model_dict['model_state_dict']
hyperparams = model_dict['hyperparameters']

# Create the model with the same architecture used for training
model = AttentiveFP(
    in_channels=10,  # Enhanced atom features
    hidden_channels=hyperparams["hidden_channels"],
    out_channels=1,
    edge_dim=6,  # Enhanced bond features
    num_layers=hyperparams["num_layers"],
    num_timesteps=hyperparams["num_timesteps"],
    dropout=hyperparams["dropout"],
)

model.load_state_dict(state_dict)
model.eval()  # Disable dropout for inference
```

### Making Predictions

```python
def smiles_to_data(smiles):
    """Convert SMILES string to PyG Data object"""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None

    # Enhanced atom features (10 dimensions)
    atom_features = []
    for atom in mol.GetAtoms():
        features = [
            atom.GetAtomicNum(),
            atom.GetTotalDegree(),
            atom.GetFormalCharge(),
            atom.GetTotalNumHs(),
            atom.GetNumRadicalElectrons(),
            int(atom.GetIsAromatic()),
            int(atom.IsInRing()),
            # Hybridization as one-hot (3 dimensions)
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP2),
            int(atom.GetHybridization() == Chem.rdchem.HybridizationType.SP3)
        ]
        atom_features.append(features)

    x = torch.tensor(atom_features, dtype=torch.float)

    # Enhanced bond features (6 dimensions)
    edges_list = []
    edge_features = []
    for bond in mol.GetBonds():
        i = bond.GetBeginAtomIdx()
        j = bond.GetEndAtomIdx()
        # Add both directions so the graph is undirected
        edges_list.extend([[i, j], [j, i]])

        features = [
            # Bond type as one-hot (4 dimensions)
            int(bond.GetBondType() == Chem.rdchem.BondType.SINGLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.DOUBLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.TRIPLE),
            int(bond.GetBondType() == Chem.rdchem.BondType.AROMATIC),
            # Additional features (2 dimensions)
            int(bond.GetIsConjugated()),
            int(bond.IsInRing())
        ]
        edge_features.extend([features, features])

    # Molecules without bonds (e.g. single atoms) are not supported
    if not edges_list:
        return None

    edge_index = torch.tensor(edges_list, dtype=torch.long).t().contiguous()
    edge_attr = torch.tensor(edge_features, dtype=torch.float)

    return Data(x=x, edge_index=edge_index, edge_attr=edge_attr)


def predict(model, smiles):
    """Make prediction for a SMILES string"""
    data = smiles_to_data(smiles)
    if data is None:
        return None

    # Single-molecule batch: every node belongs to graph 0
    batch = torch.zeros(data.num_nodes, dtype=torch.long)
    with torch.no_grad():
        output = model(data.x, data.edge_index, data.edge_attr, batch)
        return output.item()


# Example usage
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"  # Aspirin
prediction = predict(model, smiles)
print(f"Prediction for {smiles}: {prediction}")
```

## Training Data

The model was trained on the TMP dataset from the Pyrosage project, which focuses on molecular toxicity and environmental property prediction.

## Model Performance

See training logs for detailed performance metrics.

## Limitations

- The model is trained on specific chemical datasets and may not generalize to all molecular types
- Performance may vary for molecules significantly different from the training distribution
- Input must be a valid SMILES string; strings that RDKit cannot parse, and molecules without bonds, return `None`

## Citation

If you use this model, please cite the Pyrosage project:

```bibtex
@misc{pyrosagetmp,
  title={Pyrosage TMP AttentiveFP Model},
  author={UPCI NTUA},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/upci-ntua/pyrosage-tmp-attentivefp}
}
```

## License

MIT License - see LICENSE file for details.
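
## Feature Layout Check

The `in_channels=10` and `edge_dim=6` arguments in the loading code must match the feature vectors built in `smiles_to_data`. As a minimal sketch of that layout, the snippet below rebuilds the same ordering from plain Python values, with no RDKit dependency. The helper names (`one_hot`, `atom_feature_vector`, `bond_feature_vector`) and the string labels for hybridization and bond type are illustrative assumptions, not part of the Pyrosage code.

```python
# Illustrative sketch only: mirrors the feature ordering used in smiles_to_data.
# Helper names and string labels are hypothetical, not part of Pyrosage.

HYBRIDIZATIONS = ("SP", "SP2", "SP3")
BOND_TYPES = ("SINGLE", "DOUBLE", "TRIPLE", "AROMATIC")


def one_hot(value, choices):
    """Return a 0/1 list with a 1 at the position of `value` (all zeros if absent)."""
    return [int(value == choice) for choice in choices]


def atom_feature_vector(atomic_num, degree, formal_charge, num_hs,
                        num_radicals, is_aromatic, in_ring, hybridization):
    """Seven scalar features followed by a 3-way hybridization one-hot."""
    scalars = [
        atomic_num, degree, formal_charge, num_hs, num_radicals,
        int(is_aromatic), int(in_ring),
    ]
    return scalars + one_hot(hybridization, HYBRIDIZATIONS)


def bond_feature_vector(bond_type, is_conjugated, in_ring):
    """4-way bond-type one-hot followed by conjugation and ring flags."""
    return one_hot(bond_type, BOND_TYPES) + [int(is_conjugated), int(in_ring)]


# An aromatic ring carbon carrying one hydrogen, as in benzene
carbon = atom_feature_vector(6, 3, 0, 1, 0, True, True, "SP2")
aromatic_bond = bond_feature_vector("AROMATIC", True, True)

assert len(carbon) == 10        # must equal in_channels
assert len(aromatic_bond) == 6  # must equal edge_dim

print(carbon)         # [6, 3, 0, 1, 0, 1, 1, 0, 1, 0]
print(aromatic_bond)  # [0, 0, 0, 1, 1, 1]
```

Hybridization states other than SP, SP2, and SP3 (for example SP3D) encode as all zeros in this scheme, which matches the three equality checks in `smiles_to_data`.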