PhytoAI Discovery Models
Overview
Pre-trained machine learning models for therapeutic compound discovery and bioactivity prediction, trained on the PhytoAI MEGA Dataset containing 1.4M molecules.
Available Models
Bioactivity Prediction
- bioactivity_classifier: Multi-class therapeutic activity prediction
- potency_regressor: Continuous bioactivity score prediction
- safety_classifier: Toxicity and safety assessment
Molecular Property Prediction
- admet_predictor: ADMET properties (Absorption, Distribution, Metabolism, Excretion, Toxicity)
- lipinski_classifier: Drug-likeness assessment
- solubility_regressor: Aqueous solubility prediction
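The name lipinski_classifier points to Lipinski's rule of five, the usual drug-likeness heuristic: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with one violation tolerated. The sketch below applies that rule to precomputed descriptors. It is not the model's implementation, which may learn a different decision boundary; the Descriptors class, the function names, and the approximate ibuprofen values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float      # molecular weight in daltons
    logp: float            # octanol-water partition coefficient
    h_bond_donors: int     # count of N-H and O-H groups
    h_bond_acceptors: int  # count of N and O atoms


def lipinski_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria that the compound breaks."""
    violations = []
    if d.mol_weight > 500:
        violations.append("mol_weight > 500")
    if d.logp > 5:
        violations.append("logP > 5")
    if d.h_bond_donors > 5:
        violations.append("H-bond donors > 5")
    if d.h_bond_acceptors > 10:
        violations.append("H-bond acceptors > 10")
    return violations


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Lipinski's rule tolerates at most one violation."""
    return len(lipinski_violations(d)) <= max_violations


# Approximate values for ibuprofen, used only as an example.
ibuprofen = Descriptors(mol_weight=206.3, logp=3.5, h_bond_donors=1, h_bond_acceptors=2)
print(is_drug_like(ibuprofen), lipinski_violations(ibuprofen))
```

In practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand.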
Target Prediction
- target_interaction: Protein-target interaction prediction
- multi_target_classifier: Multi-target activity prediction
- mechanism_predictor: Mechanism of action classification
Usage
Quick Start
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the bioactivity prediction model and its tokenizer
model_name = "Gatescrispy/phytoai-discovery-models"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize a SMILES string (ibuprofen)
smiles = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
inputs = tokenizer(smiles, return_tensors="pt", padding=True, truncation=True)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# The maximum softmax value is the probability of the most likely class
print(f"Predicted bioactivity score: {predictions.max().item():.3f}")
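The softmax call in the quick start converts raw logits into class probabilities, so the printed value is the probability of the most likely class rather than a potency measurement. A plain-Python sketch of that step follows. The logits and label names are invented for illustration, since the model's actual label set is not listed here.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_class(logits: list[float], labels: list[str]) -> tuple[str, float]:
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits and labels, not taken from the model.
labels = ["anti-inflammatory", "antimicrobial", "inactive"]
label, prob = top_class([2.0, 0.5, -1.0], labels)
print(f"{label}: {prob:.3f}")
```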
Advanced Usage
from phytoai_models import PhytoAIPredictor

# Initialize comprehensive predictor
predictor = PhytoAIPredictor()

# Analyze a compound
compound_data = predictor.analyze_compound(
    smiles="CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    include_properties=True,
    include_targets=True,
    include_safety=True,
)

print("Compound Analysis:")
print(f"- Bioactivity Score: {compound_data['bioactivity_score']:.3f}")
print(f"- Safety Index: {compound_data['safety_index']:.3f}")
print(f"- Predicted Targets: {compound_data['targets']}")
print(f"- Mechanism: {compound_data['mechanism']}")
Model Performance
Bioactivity Prediction
- Accuracy: 87.3% (test set)
- AUC-ROC: 0.891
- F1-Score: 0.846
- Precision: 0.852
- Recall: 0.841
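F1 is defined as the harmonic mean of precision and recall. The short check below confirms that the figures in the list above agree with that definition. Whether they are per-class, micro-averaged, or macro-averaged is not stated, and for macro averages the identity need not hold exactly, so treat this as a consistency check only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the list above.
print(f"{f1_score(0.852, 0.841):.3f}")  # prints 0.846
```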
Molecular Property Prediction
- ADMET R²: 0.78 (vs 0.65 baseline)
- Solubility MAE: 0.43 log units
- LogP MAE: 0.31 units
- Bioavailability AUC: 0.83
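The regression figures above use mean absolute error (MAE) and the coefficient of determination (R²). A minimal sketch of both definitions follows, using invented toy values rather than PhytoAI data.

```python
def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy log-solubility values, invented for illustration only.
measured = [-3.0, -2.0, -4.5, -1.0]
predicted = [-2.5, -2.5, -4.0, -1.5]
print(mean_absolute_error(measured, predicted), r_squared(measured, predicted))
```

An MAE of 0.43 log units therefore means predictions are off by 0.43 on the log scale on average, which is a factor of roughly 2.7 in concentration.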
Target Prediction
- Target Interaction AUC: 0.912
- Multi-target F1: 0.789
- Mechanism Accuracy: 74.2%
Training Details
Dataset
- Training Set: 1.1M molecules (PhytoAI MEGA Dataset)
- Validation Set: 150K molecules
- Test Set: 150K molecules
- Features: SMILES, molecular descriptors, traditional use data
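The three set sizes add up to the 1.4M molecules of the full dataset. How the split was drawn is not stated here; the sketch below assumes a seeded random split, although scaffold-based splits are also common for molecular data. The function name and the scaled-down placeholder data are illustrative.

```python
import random


def split_dataset(items: list, n_val: int, n_test: int, seed: int = 0):
    """Shuffle a copy of items and cut it into train, validation, and test lists."""
    if n_val + n_test > len(items):
        raise ValueError("validation and test sizes exceed the dataset size")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# 1,400 placeholder IDs stand in for the 1.4M molecules (same 1100/150/150 ratio).
train, val, test = split_dataset(list(range(1400)), n_val=150, n_test=150)
print(len(train), len(val), len(test))  # prints 1100 150 150
```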
Architecture
- Base Model: ChemBERTa-v2 (10M parameters)
- Fine-tuning: Task-specific heads
- Optimization: AdamW with learning rate scheduling
- Regularization: Dropout (0.1), weight decay (0.01)
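The list above mentions learning rate scheduling without naming the schedule. As an assumption, the sketch below shows linear warmup followed by linear decay, a common pairing with AdamW when fine-tuning transformer models. The peak learning rate, warmup length, and step count are placeholders, not values reported for these models.

```python
def linear_warmup_decay(step: int, peak_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at a given step: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Placeholder hyperparameters, not values reported for these models.
for step in (0, 500, 1000, 5500, 10000):
    lr = linear_warmup_decay(step, peak_lr=5e-5, warmup_steps=1000, total_steps=10000)
    print(step, lr)
```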
Training Infrastructure
- GPUs: 4x NVIDIA A100 (40GB)
- Training Time: 72 hours total
- Framework: PyTorch + Transformers
- Distributed: DeepSpeed ZeRO-2
Applications
Drug Discovery
# Screen compound library
candidates = predictor.screen_library(
    smiles_list=compound_library,
    target_activity="anti-inflammatory",
    min_bioactivity=0.7,
    max_toxicity=0.3,
)
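The two thresholds in the call above suggest a filter that keeps compounds with a high enough bioactivity score and a low enough toxicity score. How screen_library does this internally is not shown, so the following is only a sketch of that filtering logic on scores that have already been computed. ScoredCompound, screen, and all score values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCompound:
    """One compound with model scores already attached (hypothetical structure)."""
    smiles: str
    bioactivity: float  # higher is better, in [0, 1]
    toxicity: float     # lower is better, in [0, 1]


def screen(compounds, min_bioactivity=0.7, max_toxicity=0.3):
    """Keep compounds that pass both thresholds, most active first."""
    hits = [
        c for c in compounds
        if c.bioactivity >= min_bioactivity and c.toxicity <= max_toxicity
    ]
    return sorted(hits, key=lambda c: c.bioactivity, reverse=True)


# Scores below are invented for illustration only.
library = [
    ScoredCompound("CC(C)CC1=CC=C(C=C1)C(C)C(=O)O", 0.82, 0.15),
    ScoredCompound("CCO", 0.10, 0.05),
    ScoredCompound("c1ccccc1", 0.75, 0.60),
]
for hit in screen(library):
    print(hit.smiles, hit.bioactivity, hit.toxicity)
```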
Research Acceleration
# Prioritize compounds for synthesis
priorities = predictor.prioritize_synthesis(
    candidates=candidate_molecules,
    criteria=["novelty", "bioactivity", "druggability"],
    budget_constraints=True,
)
Hypothesis Generation
# Discover new mechanisms
mechanisms = predictor.discover_mechanisms(
    compound_set=active_compounds,
    known_targets=["COX-2", "5-LOX"],
    novel_predictions=True,
)
Citation
@article{phytoai_models_2025,
  title={PhytoAI Discovery Models: AI-Powered Therapeutic Compound Prediction},
  author={Tantcheu, Cedric},
  journal={arXiv preprint},
  year={2025},
  url={https://huggingface.co/Gatescrispy/phytoai-discovery-models}
}
Related Resources
- Dataset: PhytoAI MEGA 1.4M Dataset
- Interactive Tool: PhytoAI Assistant
- Documentation: Complete API Reference
- Tutorials: Jupyter Notebooks Collection
Democratizing AI-powered drug discovery
State-of-the-art models for the global research community