The model eos30gr (deepherg) has been chosen for model validation.
The repository is organised as follows:
- `/notebooks` contains the Jupyter notebooks where most of the work is being developed.
- `/data` contains all the .csv files. Model predictions are obtained outside this repository and saved in this folder.
- `/src` contains important functions I will re-use throughout the repository, to avoid typing them each time.
- `/plots` contains the plots I have produced during the model validation process.
- `requirements.txt` lists all the packages required to run the notebooks in this repository. Where possible, I also specify the version of the package I am using.
- First, refer to the Ersilia documentation to set up a virtual environment and the Ersilia Model Hub.
- Enter the virtual environment and run the following command:
    pip install -r requirements.txt
- Install `ipykernel` in this environment and register the environment as a Jupyter kernel:
    conda install ipykernel
    python -m ipykernel install --user --name ersilia --display-name "ersilia"
- Run the `.ipynb` notebooks in the `/notebooks` directory.
- The model's predictions are skewed towards non-hERG blockers: most of the compounds are predicted not to be hERG blockers. This may depend on how common hERG blockers are in reality, in which case our random dataset can be assumed to reflect that distribution.
- There are two possible explanations for this result. Our dataset may have ended up with mostly non-hERG blockers purely by chance. Alternatively, most compounds in nature may simply not be hERG blockers, which would explain the result.
- According to our dataset, every single molecule should be a hERG blocker.
- However, we are not able to reproduce the labels in our dataset with the model. It is possible that the model has not been trained correctly, which would explain the result.
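
The skew described above can be checked by counting how many predictions fall on each side of a decision threshold. The following is a minimal sketch, not the code used in the notebooks: the column names (`smiles`, `probability`), the 0.5 cut-off, and the sample rows are all made up for illustration, and in practice the rows would come from a predictions file in `/data`.

```python
import csv
import io

# Made-up example rows. The column names and values are assumptions,
# not actual model output from this repository.
SAMPLE_CSV = """smiles,probability
CCO,0.12
c1ccccc1,0.08
CCN(CC)CC,0.71
CC(=O)O,0.05
"""


def class_balance(rows, score_column="probability", threshold=0.5):
    """Return (n_blockers, n_non_blockers) for a list of prediction rows.

    A row counts as a predicted hERG blocker when its score is at or
    above the threshold (0.5 here is an assumed cut-off).
    """
    blockers = sum(1 for row in rows if float(row[score_column]) >= threshold)
    return blockers, len(rows) - blockers


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
blockers, non_blockers = class_balance(rows)
print(f"predicted hERG blockers: {blockers} / {len(rows)}")
print(f"predicted non-blockers:  {non_blockers} / {len(rows)}")
```

Running the same count on a dataset where every molecule is labelled as a hERG blocker gives a quick way to see how far the predictions are from the expected labels.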
- Read Outreachy's contribution tasks
- Read Ersilia's documentation
- Get inspiration from Ersilia's work, for example this repository for data processing
- Use Slack to ask the mentors and the other interns for help!
All the code in this repository is licensed under a GPLv3 License.