EGFR-AP : Epidermal Growth Factor Receptor-Activity Predictor

How the App works

To start this Python app, users should download all the files in this repository and keep them in a single folder on their local machine (see Download below for how).

Download

You can download the repository using either SSH or HTTPS. Choose one of the following commands based on your preference.

# Clone over SSH ...
git clone git@github.com:amarinderthind/EGFR-ap.git
# ... or over HTTPS (run only one of the two commands)
git clone https://github.com/amarinderthind/EGFR-ap.git

Check the model file. If the model file (ML_model_EGFR.pkl) was not downloaded properly, you may need to download it separately.

Download the model file if necessary:

You can use the following command to download it. Keep the file in the same folder as the rest of the repository files unless you change the configuration.

wget -O ML_model_EGFR.pkl "https://github.com/amarinderthind/EGFR-ap/raw/refs/heads/main/ML_model_EGFR.pkl?download="
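A quick way to tell whether the model file arrived intact is to look at its first byte. The sketch below is a heuristic, not part of the app: it assumes the model was saved as an uncompressed binary pickle (protocol 2 or later), which always starts with the byte 0x80. A model saved with compression would not match this check. The function name looks_like_binary_pickle is hypothetical.

```python
import os
import pickle
import tempfile


def looks_like_binary_pickle(path):
    """Heuristic: True if `path` exists and starts like a binary pickle.

    A failed download usually leaves a missing file, an empty file, or a
    text/HTML error page, none of which start with byte 0x80.
    """
    if not os.path.isfile(path):
        return False
    with open(path, "rb") as handle:
        first_byte = handle.read(1)
    # Pickle protocol 2 and later begin with the PROTO opcode, byte 0x80.
    return first_byte == b"\x80"


# Demonstration on throwaway files; for the real check, call
# looks_like_binary_pickle("ML_model_EGFR.pkl") in the repository folder.
with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "model.pkl")
    with open(good_path, "wb") as handle:
        pickle.dump({"demo": 1}, handle)

    bad_path = os.path.join(tmp, "error_page.pkl")
    with open(bad_path, "w") as handle:
        handle.write("<html>Not Found</html>")

    good_result = looks_like_binary_pickle(good_path)
    bad_result = looks_like_binary_pickle(bad_path)
    missing_result = looks_like_binary_pickle(os.path.join(tmp, "absent.pkl"))

print(good_result, bad_result, missing_result)  # True False False
```

If the check returns False for ML_model_EGFR.pkl, downloading the file again with the wget command above is the first thing to try.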

After downloading, navigate to the repository directory:

cd EGFR-ap

The user can then run the script 'EGFR_app.py' with the following command:

streamlit run EGFR_app.py

If Streamlit is not installed, install it with:

pip install streamlit
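Before launching the app, it can help to confirm that the required packages are importable. The following is a minimal sketch using only the standard library. The helper name missing_packages is hypothetical, and the package list is an assumption based on the two packages this README names (streamlit and padelpy); EGFR_app.py may import others.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list: only the packages mentioned in this README.
# Extend it with whatever else EGFR_app.py imports.
missing = missing_packages(["streamlit", "padelpy"])
if missing:
    print("Install first: pip install " + " ".join(missing))
else:
    print("All listed packages are importable.")
```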

The streamlit run command opens the app in a browser on the user's local machine. The user can then upload the input file in the format specified below and click the 'Predict' button. The tool then runs and returns the predicted activities along with the descriptor tables described in the next section.

Steps of the pipeline

The required input is a text file containing the SMILES notation of the small molecules whose activity against EGFR will be predicted. A single file can hold the SMILES notation of more than one molecule, so molecules can be submitted as a batch.

1. The app computes 2D molecular descriptors for the input molecules using the PaDELPy module (padelpy). The user can view the computed descriptors and download them in .csv format.
2. The app filters the computed descriptors and keeps only the critical ones, which were found during model building to drive the inhibitory activity of EGFR inhibitors. These selected descriptors are shown in a separate table that the user can view and download in .csv format.
3. The app uses the selected descriptors to predict the pIC50 of each input molecule with an extra-trees regressor. The user can view and download the predicted activities.

In short, the app provides a range of 2D molecular descriptors for the molecules supplied by the user, along with their predicted activities (pIC50) against EGFR. All output files can be viewed or downloaded as needed.
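The descriptor-filtering step and the meaning of the predicted pIC50 can be illustrated with a short sketch. It does not reproduce the app's code. The function names, the descriptor names, and the values are hypothetical, and the descriptors the trained model actually uses are not listed here. The only fixed fact is the definition pIC50 = -log10(IC50 in mol/L), so a pIC50 of 6 corresponds to an IC50 of 1000 nM.

```python
def select_descriptors(rows, keep):
    """Keep only the descriptor columns named in `keep`, in that order.

    `rows` is a list of dicts, one per molecule, mapping descriptor name
    to value. A descriptor missing from a row raises KeyError instead of
    being silently filled in.
    """
    return [{name: row[name] for name in keep} for row in rows]


def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (-log10 of IC50 in mol/L) to IC50 in nanomolar."""
    return 10 ** (9 - pic50)


# Hypothetical descriptor table for one molecule; the numbers are made up.
all_descriptors = [{"Name": "mol_1", "nAtom": 31, "ALogP": 2.4, "TopoPSA": 68.7}]

# Hypothetical selection; the real list comes from model building.
selected = select_descriptors(all_descriptors, ["ALogP", "TopoPSA"])
print(selected)                # [{'ALogP': 2.4, 'TopoPSA': 68.7}]
print(pic50_to_ic50_nm(6.0))   # 1000.0
```

Higher pIC50 values mean stronger predicted inhibition: each additional unit corresponds to a tenfold lower IC50.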

Input file requirements

The small molecules whose EGFR activity the user wants to predict should be uploaded to the app as a text file. This text file should contain the SMILES notation of the molecules. The file can contain many molecules, but all of them must be in SMILES format. Check the example file (Input_file_example.txt) for details.
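The sketch below shows one way such a file could be read and sanity-checked before uploading. The layout is an assumption: one molecule per line, with the SMILES string first and an optional name after whitespace. The authoritative layout is the one in Input_file_example.txt, and the function name parse_smiles_lines is hypothetical.

```python
def parse_smiles_lines(lines):
    """Return (smiles, name) pairs from an iterable of text lines.

    Assumes one molecule per line, SMILES first, optionally followed by
    whitespace and a molecule name. Blank lines are skipped. Molecules
    without a name get a placeholder based on their line number.
    """
    molecules = []
    for index, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue
        smiles = fields[0]
        name = fields[1] if len(fields) > 1 else f"mol_{index}"
        molecules.append((smiles, name))
    return molecules


example = ["CCO ethanol\n", "\n", "c1ccccc1\n"]
parsed = parse_smiles_lines(example)
print(parsed)  # [('CCO', 'ethanol'), ('c1ccccc1', 'mol_3')]
```

To check a real file, pass an open file handle, for example parse_smiles_lines(open("Input_file_example.txt")). This only checks the layout; it does not validate that each string is chemically valid SMILES.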