
Protein Location Predictor

A GUI application for predicting protein subcellular localization. It uses SVM and Random Forest classifiers trained on embeddings from state-of-the-art protein language models, including PROST-T5 and ESM-C.

Features

  • Multiple Model Support: Choose from three different prediction models:

    • PROST-T5: Transformer-based protein language model
    • ESM-C 300M: Evolutionary Scale Modeling (300M parameters)
    • ESM-C 600M: Evolutionary Scale Modeling (600M parameters)
  • User-Friendly GUI: Simple Tkinter-based interface with progress tracking (screenshot: doc/screenshots/gui_example.png)

  • Sequential Processing: Process multiple protein sequences from FASTA files

  • Flexible Output: Save predictions with confidence scores in text (CSV) format

  • Error Handling: Comprehensive error handling and user feedback

Supported Python Version

This project has been tested on Python 3.10+.

Requirements

Dependencies (Full environment.yml)

The complete environment definition is located in environment.yml. This file includes all necessary packages for PyTorch, Transformers, ESM models, and GUI operation. Here is a brief excerpt:

name: tesisEnv
channels:
  - bioconda
  - anaconda
  - conda-forge
  - defaults

# Python version and major packages
dependencies:
  - python=3.10.16
  - pytorch=2.6.0
  - torchvision=0.21.0
  - torchtext=0.18.0
  - transformers=4.46.3
  - scikit-learn=1.6.1
  - biopython=1.85
  - esm=3.1.4
  - numpy=1.26.4
  - joblib=1.4.2
  - tk
  # plus many others (see full file for complete list)

To ensure exact reproducibility, use:

conda env create -f environment.yml
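After creating the environment, a quick sanity check is to compare installed versions against the pins. The sketch below is not part of the repository. Its `PINS` dict is copied by hand from the excerpt above. It assumes the PyPI distribution names (conda's `pytorch` registers as `torch`) and uses only the standard library's `importlib.metadata`.

```python
"""Compare installed package versions with the pins in environment.yml (sketch)."""
from importlib.metadata import PackageNotFoundError, version

# Hand-copied subset of the environment.yml excerpt; distribution names are assumptions.
PINS = {
    "torch": "2.6.0",
    "transformers": "4.46.3",
    "scikit-learn": "1.6.1",
    "biopython": "1.85",
    "esm": "3.1.4",
    "numpy": "1.26.4",
    "joblib": "1.4.2",
}


def check_pins(pins: dict[str, str]) -> dict[str, str]:
    """Return "ok", "missing", or "found <version>" for each pinned package."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Builds may carry a local suffix such as "+cu124"; compare the public part only.
        public = installed.split("+", 1)[0]
        report[name] = "ok" if public == wanted else f"found {installed}"
    return report


if __name__ == "__main__":
    for pkg, status in check_pins(PINS).items():
        print(f"{pkg:>14}: {status}")
```

Run it inside the activated `tesisEnv` environment. Any "found" or "missing" line points to a package that drifted from the pinned version.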

Hardware Requirements

  • Minimum: 8 GB RAM, CPU-only execution
  • Recommended: 16 GB+ RAM, NVIDIA GPU with 8 GB+ VRAM
  • Storage: ~5 GB for model weights and cache
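The parameter counts listed under Features give a rough lower bound on the memory needed just to hold the ESM-C weights: parameters × bytes per parameter. The sketch below does that arithmetic. It ignores activations, the classifier, and framework overhead, and it does not know which precision the application actually loads the models in. Treat the numbers as estimates, not measurements.

```python
"""Weights-only memory estimate for the ESM-C backbones (a lower bound)."""

# Bytes per parameter for common floating-point storage formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(n_params: int, dtype: str = "float32") -> float:
    """Return the weight footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n in [("ESM-C 300M", 300_000_000), ("ESM-C 600M", 600_000_000)]:
        fp32 = weight_memory_gb(n)
        fp16 = weight_memory_gb(n, "float16")
        print(f"{name}: {fp32:.1f} GB (float32), {fp16:.1f} GB (float16)")
```

At float32 this gives 1.2 GB for ESM-C 300M and 2.4 GB for ESM-C 600M. That is well within the recommended 8 GB of VRAM before activations are counted.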

Installation

  1. Clone the repository (with Git LFS for large model files):

    git lfs install
    git clone https://huggingface.co/jpuglia/ProteinLocationPredictor
    

    If you prefer to skip downloading model weights initially:

    GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/jpuglia/ProteinLocationPredictor
    
  2. Navigate into the project directory:

    cd ProteinLocationPredictor
    
  3. Create and activate the Conda environment:

    conda env create -f environment.yml
    conda activate tesisEnv
    
  4. (Only if you skipped the weights above) Download the model weights. Model files live in the Models/ directory; if you cloned with GIT_LFS_SKIP_SMUDGE=1, run:

    git lfs pull
    

Usage

GUI Mode

  1. Launch the application:

    python gui.py
    
  2. In the menu, click File → Load FASTA and select your input file (.fasta, .fa, or .fas).

  3. Choose one of the prediction models (PROST-T5, ESM-C 300M, or ESM-C 600M).

  4. Click Run Prediction and monitor the progress bar.

  5. When complete, you will be prompted to choose an output directory and filename.
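Behind the Run Prediction step, each sequence ends up as one CSV row: its classes sorted by probability, each written as "Label (0.1234)". The sketch below reproduces that layout from the example output. It assumes `probs` is one row of a fitted scikit-learn classifier's `predict_proba()` output and `classes` is the matching `classes_` attribute. It is an illustration, not the code in gui.py or src/my_utils.py.

```python
"""Produce rows in the README's output CSV layout (illustrative sketch)."""
import csv
import io


def rank_predictions(classes, probs, top_k=6):
    """Sort classes by probability, highest first, formatted as 'Label (0.1234)'."""
    ranked = sorted(zip(classes, probs), key=lambda pair: pair[1], reverse=True)
    return [f"{label} ({p:.4f})" for label, p in ranked[:top_k]]


def write_rows(out, rows, top_k=6):
    """Write (sequence_id, classes, probs) tuples with a Sequence_ID header."""
    writer = csv.writer(out)
    writer.writerow(["Sequence_ID"] + [f"Prediction {i}" for i in range(1, top_k + 1)])
    for seq_id, classes, probs in rows:
        writer.writerow([seq_id] + rank_predictions(classes, probs, top_k))


# Usage with made-up probabilities (not real model output):
buf = io.StringIO()
write_rows(buf, [("demo", ["Cytoplasmic", "Extracellular"], [0.3, 0.7])], top_k=2)
print(buf.getvalue())
```

When writing to a real file, open it with `open(path, "w", newline="")` so the csv module controls line endings.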

Example Input & Output

Input FASTA (example/input.fasta):

>protein_1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>protein_2
MKTIIALSYIFCLVFAHATAKASEQTDNLQWDLAAIDNSGGHNAVDIKQNLQFQCQNNLHGCF
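To make the input format explicit, here is a minimal standard-library FASTA reader. Each record is a `>` header line followed by one or more sequence lines; the ID is taken as the first whitespace-delimited token of the header. Biopython, already a dependency, offers `Bio.SeqIO.parse(path, "fasta")` for this job. The reader below is a sketch for illustration, not the parser gui.py uses.

```python
"""Minimal FASTA reader illustrating the expected input format."""
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence_id, sequence) pairs; multi-line sequences are joined."""
    seq_id = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            chunks = []
        elif seq_id is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if seq_id is not None:
        yield seq_id, "".join(chunks)


EXAMPLE = """>protein_1
MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG
>protein_2
MKTIIALSYIFCLVFAHATAKASEQTDNLQWDLAAIDNSGGHNAVDIKQNLQFQCQNNLHGCF
"""

records = list(read_fasta(EXAMPLE.splitlines()))
for seq_id, seq in records:
    print(seq_id, len(seq))
```

A file handle works too: `with open("example/input.fasta") as fh: records = list(read_fasta(fh))`.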

Output CSV (example/output.csv):

Sequence_ID,Prediction 1,Prediction 2,Prediction 3,Prediction 4,Prediction 5,Prediction 6
protein_1,Cytoplasmic (0.9860),CytoplasmicMembrane (0.0081),Periplasmic (0.0029),Extracellular (0.0019),OuterMembrane (0.0007),Cellwall (0.0003)
protein_2,SignalPeptide (0.7523),Extracellular (0.1234),CytoplasmicMembrane (0.0645),Cellwall (0.0345),Periplasmic (0.0201),OuterMembrane (0.0052)
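For downstream analysis, the output CSV can be read back into (label, probability) pairs. The sketch below assumes every Prediction cell is well formed with the "Label (probability)" shape shown above. It raises `ValueError` on anything else rather than guessing.

```python
"""Parse the prediction CSV into ranked (label, probability) lists (sketch)."""
import csv
import io
import re

# One prediction cell, e.g. "Cytoplasmic (0.9860)".
CELL = re.compile(r"^(?P<label>\S+) \((?P<prob>[0-9.]+)\)$")


def parse_predictions(text):
    """Map each Sequence_ID to its (label, probability) pairs, best first."""
    results = {}
    for row in csv.DictReader(io.StringIO(text)):
        seq_id = row.pop("Sequence_ID")
        ranked = []
        # Order columns by their number: "Prediction 1", "Prediction 2", ...
        for column in sorted(row, key=lambda c: int(c.split()[-1])):
            match = CELL.match(row[column].strip())
            if match is None:
                raise ValueError(f"unexpected cell {row[column]!r} in {column}")
            ranked.append((match["label"], float(match["prob"])))
        results[seq_id] = ranked
    return results


EXAMPLE_CSV = (
    "Sequence_ID,Prediction 1,Prediction 2,Prediction 3,"
    "Prediction 4,Prediction 5,Prediction 6\n"
    "protein_1,Cytoplasmic (0.9860),CytoplasmicMembrane (0.0081),"
    "Periplasmic (0.0029),Extracellular (0.0019),"
    "OuterMembrane (0.0007),Cellwall (0.0003)\n"
)

preds = parse_predictions(EXAMPLE_CSV)
top_labels = {seq_id: ranked[0][0] for seq_id, ranked in preds.items()}
print(top_labels)
```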

Project Structure

ProteinLocationPredictor/
├── gui.py
├── src/
│   └── my_utils.py
├── Models/
│   ├── ProstT5_svm.joblib
│   ├── ESMC-300m_svm.joblib
│   ├── ESMC-600m_svm.joblib
│   └── ...
├── environment.yml
├── README.md
└── doc/
    └── screenshots/
        └── gui_example.png
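The classifiers in Models/ are stored as joblib files. The sketch below shows one way to load them outside the GUI. The `MODEL_FILES` mapping from GUI labels to file names is a hypothetical reading of the tree above and covers only the SVM files; check gui.py and src/my_utils.py for the real lookup. `joblib.load` unpickles arbitrary objects, so only load files you trust. Loading also generally needs the same scikit-learn version the models were saved with (1.6.1 in environment.yml).

```python
"""Locate and load a classifier from Models/ by GUI label (hypothetical mapping)."""
from pathlib import Path

# Hypothetical mapping: GUI model label -> file name in Models/ (SVM variants only).
MODEL_FILES = {
    "PROST-T5": "ProstT5_svm.joblib",
    "ESM-C 300M": "ESMC-300m_svm.joblib",
    "ESM-C 600M": "ESMC-600m_svm.joblib",
}


def model_path(label, models_dir="Models"):
    """Return the expected file path for a GUI model label."""
    if label not in MODEL_FILES:
        raise KeyError(f"unknown model {label!r}; choose from {sorted(MODEL_FILES)}")
    return Path(models_dir) / MODEL_FILES[label]


def load_model(label, models_dir="Models"):
    """Deserialize the classifier with joblib (only for files you trust)."""
    import joblib  # imported lazily so model_path works without joblib installed

    path = model_path(label, models_dir)
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; did `git lfs pull` run?")
    return joblib.load(path)
```

If the loaded object is a scikit-learn estimator fitted with probability estimates, its `predict_proba()` and `classes_` supply the ranked output described in Usage.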

Contributing

  1. Fork the repository

  2. Create a feature branch:

    git checkout -b feature/amazing-feature
    
  3. Commit your changes:

    git commit -m "Add amazing feature"
    
  4. Push to your branch:

    git push origin feature/amazing-feature
    
  5. Open a Pull Request, or start a discussion in the repository's Discussions tab.
