Let me briefly introduce the field I work in: artificial intelligence-assisted drug discovery and design. This material was originally written in Chinese rather than the more usual English, to give readers who face a language barrier a more intuitive learning path, so that everyone can grasp the core concepts and get started quickly.
Artificial intelligence-assisted drug discovery and design is a method that utilizes artificial intelligence (AI) technology to accelerate and optimize the drug development process. Traditional drug research and development cycles are long, costly, and have low success rates, while AI can significantly enhance the efficiency and accuracy of drug design through big data processing, model prediction, and automated analysis.
- Accelerate the R&D cycle: AI can rapidly screen large numbers of compounds, reducing the experimental screening workload.
- Improve accuracy: models trained on large-scale data can make more accurate predictions during drug design.
- Reduce cost: fewer repeated and failed experiments lower the cost of drug development.
- Personalize drug design: AI can help design drugs that target specific patient or disease characteristics to optimize therapeutic effect.
The application of artificial intelligence in drug discovery and design makes the drug development process more intelligent and automated. Through key steps such as compound screening, target prediction, drug generation, and ADMET analysis, AI can effectively accelerate the process of drug research and development while improving the accuracy and efficiency of design.
- First, I would like to thank Professor Lei at Sun Yat-sen University Medical School for introducing me to this field. I arrived on campus knowing nothing, unsure of my direction and quite confused, so I am especially grateful to my mentor. I would also like to thank Professor Yu at Huaiyin Institute of Technology for his help in writing this, and Zhang Shengyu for his strong support in completing my studies.
- Second, thank you to Guo Weitao, a computational chemistry expert and student at Nanjing University of Technology, who has kept helping me and explained the pharmaceutical manufacturing process to me.
- Third, I want to thank my good friend Zhou Yiping, a student at Nanjing Medical University, a clever, beautiful, and kind girl who has helped me greatly.
- Fourth, thank you to the senior members of my group for all their guidance.
- Last but not least, I would like to thank the members of the public and my peers for their criticism and corrections, which make this document and this code more complete, and make our motherland stronger and better. Thank you, motherland; thank you, Party.
- Thank you to everyone who helped me complete my studies!
Abstract: Drug discovery and development affects various aspects of human health and dramatically impacts the pharmaceutical market. However, investments in a new drug often go unrewarded due to the long and complex process of drug research and development (R&D). With the advancement of experimental technology and computer hardware, artificial intelligence (AI) has recently emerged as a leading tool in analyzing abundant and high-dimensional data. Explosive growth in the size of biomedical data provides advantages in applying AI in all stages of drug R&D. Driven by big data in biomedicine, AI has led to a revolution in drug R&D, due to its ability to discover new drugs more efficiently and at lower cost. This review begins with a brief overview of common AI models in the field of drug discovery; then, it summarizes and discusses in depth their specific applications in various stages of drug R&D, such as target discovery, drug discovery and design, preclinical research, automated drug synthesis, and influences in the pharmaceutical market. Finally, the major limitations of AI in drug R&D are fully discussed and possible solutions are proposed.
Keywords: Artificial intelligence; Machine learning; Deep learning; Target identification; Target discovery; Drug design; Drug discovery
- 👉🏻👉🏻👉🏻👉🏻👉🏻 The table of contents is currently divided into three parts, covering material both for students without a foundation and for students with one.
- Application of Artificial Intelligence in Medicine
- Python Basics
- Numpy, Pandas
- Matplotlib
- Machine Learning and Scikit-Learn
- Deep Learning
- CADD
- Graph Neural Network
- Python and Basics of Machine Learning
- Molecular text representation: SMILES
- Molecular Vector Representation: Descriptors and Fingerprints
- RDKit Introduction
- Classical machine learning models: linear regression, random forest, support vector machine
- Exploration of Publicly Available Small Molecule Datasets
- Biologically active molecules ChEMBL database
- ZINC Database
- PubChem Database
- Exploratory Data Analysis (EDA)
- Quantitative Structure-Activity Relationship (QSAR) and Virtual Screening (VS)
- Graph Neural Network
- Neural Network Architecture and Training
- Molecular graphs and atom featurization
- Message Passing Neural Network
- Graph Convolutional Neural Network
- Explainability: Grad-CAM
- Molecular Docking
- Molecular data formats: SMI, SDF, MOL2, PDB
- Force fields
- Protein folding
- Using AutoDock Vina, Smina, QuickVina for molecular docking
- Interaction fingerprints
- Pharmacophores
- Deep generative model
- Autoencoder
- Recurrent Neural Network
- SMILES Generator: ReLeaSE and REINVENT
- Graph-based Generative Model: JT-VAE
- Molecular Characteristic Optimization: Reinforcement Learning and Bayesian Optimization
- Protein Deep Learning
- Simplified protein representation
- Voxel grid representation
- Mesh representation for encoding protein surfaces
- 3D Convolutional Neural Network
- Uncertainty Prediction
- Aleatoric and epistemic uncertainty
- Conformal Prediction
- Basic Knowledge
- Cheminformatics: Introduction to RDKit
- Application of Pandas in Chemical Informatics
- SMILES Tutorial
- SMARTS Tutorial
- Basic reaction enumeration
- Enumerating stereoisomers and diastereomers
- Utilizing Datamol and Molfeat to streamline chemical informatics workflows
- Data processing, descriptors, and clustering
- Clustering
- K-Means Clustering
- Taylor-Butina clustering
- Complex chemical informatics analysis
- Systematic analysis of ChEMBL
- Analysis of drug data in the ChEMBL database
- Analysis based on patent data from BindingDB
- SAR Analysis
- Scaffold identification
- R-group analysis
- Positional analogue scanning
- Free-Wilson analysis
- Matched molecular pairs
- Matched molecular series
- Machine Learning Model
- Construct and test a QSAR model
- Building and Comparing Classification Models
- Building and Comparing Regression Models
- Active Learning
- Active Classification
- Active Regression
- Active Shape Search
- Neural Network Potential
- Predicting isomers' energy using Auto3D
- 01_Compound data collection from ChEMBL
- 02_Compound data acquisition from PubChem
- 03_Data acquisition from KLIFS
- Complete Project: "Predicting Bioactivity Based on Machine Learning"
- 04_Querying online API web services
- 05_Molecular filtering: ADMET and lead-likeness criteria
- Complete Project: "Molecular ADMET Prediction Based on Machine Learning and Deep Learning"
- Complete Project: "Molecular Toxicity Prediction Based on GNN"
- 06_Molecular filtering: Unwanted substructures
- 06_1 Complete Project: "Ligand Screening Based on ADMET and RO5 for Molecular Screening and Compound Similarity"
- 07_Molecular Representation
- 08_Ligand-based screening: Compound similarity
- 09_Compound clustering
- 10_Maximum common substructure
- 11_Ligand-Based Pharmacophore
- 12_Binding site similarity and off-target prediction
- 13_Protein Data Acquisition: Protein Data Bank (PDB)
- 14_Binding site detection
- 15_Protein-ligand docking
- Predicting the retrosynthetic accessibility of bioactive molecules
- 16_Protein-ligand interactions
- 17_NGLview Advanced Usage
- 18_Molecular Dynamics Simulation
- 19_Analysis of Molecular Dynamics Simulation
- 20_Lead Compound Optimization Automation Process
- 21_Ligand-based Screening: Machine Learning
- 22_Ligand-based screening: Neural network
- 23_Molecular Property Prediction Based on RNN
- 24_Molecular Property Prediction Based on GNN
- 25_Molecular Property Prediction Based on Transformer
- 26_Uncertainty Estimation
- 27_1 RNA Aptamer Data Source
- Data source: RNAapt3D (https://rnaapt3d.medals.jp/)
- 27_2 Data Cleaning and Preprocessing
- 28_1 Primary Structure Prediction
- 28_2 Structural Visualization and Analysis
- 29_1 Secondary Structure Prediction
- 29_2 Energy minimization and folding stability analysis
- ΔG (Free Energy) and Folding Stability Diagram
- 30_1 RNA Aptamer Binding Site Prediction to Target
- 30_2 Calculation and Optimization of Binding Energy
- 30_3 Drug Design Based on RNA Aptamers
- 30_4 Drug Chemistry and Virtual Screening
- 30_5 Molecular Dynamics Simulation
- 30_6 Experimental Verification
- 31_Kinase similarity: Sequence
- 32_Kinase Similarity: Kinase Pocket (KiSSim Fingerprint)
- 33_Kinase Similarity: Interaction Fingerprint
- 34_Kinase Similarity: Ligand Profile
- 35_Kinase Similarity: Comparison of Different Views
- 36_Designing kinase inhibitors based on kinase fragment libraries
- 37_Protein-ligand interaction prediction
- Complete Project: "Project Practice: Prediction of Chemical Reaction Yields Using Transformer-Based Deep Learning"
- Complete Project: "Project Practice: Mapping the Space of Chemical Reactions Using Attention-Based Neural Networks"
- Complete Project: "Project Practice: A Graph to Graphs Framework for Retrosynthesis Prediction of Small Molecule Compounds"
- Complete Project: "Project Practice: Generative Language Modeling for Antibody Design"
- 38_Running 3D Dynamics Based on KLIFS Data
- Consensus-based Integrated Structure Virtual Screening (Protein Preparation, Docking, Binding Site Selection, Rescore and Sort)
- 40_One-Hot Encoding
- 41_Using Code to Draw Molecular Graphs
- Research objective: Explore the relationship between genes and immune infiltration
- Background Introduction: The Importance of Immune Infiltration and Its Impact on Gene Expression
- Software tools required for installation and configuration (such as R, Python, Bioconductor)
- Download gene expression data from GEO, TCGA, and other databases
- Data standardization and cleaning
- Annotate gene expression data using annotation files (such as GTF or GFF files)
- Tools: `biomaRt`, `org.Hs.eg.db`
- Apply SVA to correct batch effects
- Code implementation: the `ComBat` or `svaseq` method in the `sva` package
- Use tools such as `limma` and `DESeq2` for differential gene analysis
- List of differentially expressed genes (DEGs)
- Plot volcano plots using the results of the differential analysis
- Code implementation: the `ggplot2` and `EnhancedVolcano` packages
- 08. Metascape
- Conduct pathway and functional enrichment analysis on the Metascape website
- 09. Gene Ontology (GO)
- Use GO for gene function annotation and enrichment analysis
- Tools: `clusterProfiler`, `topGO`
- 10. KEGG Enrichment Analysis
- Use the KEGG database for signaling pathway enrichment analysis
- Tools: `clusterProfiler`, `KEGGREST`
- 11. Protein-Protein Interaction (PPI)
- Construct protein-protein interaction networks
- Tools: `STRING` database, `Cytoscape`
- Construct a random forest model and filter important genes
- Draw an expression heat map of the differentially expressed genes
- Calculate a gene score from the differentially expressed genes
- Code implementation: comprehensive scoring method based on gene expression values
- Predict gene expression using neural network models
- Tool: the `neuralnet` package
- Plot the ROC curve and evaluate model accuracy
- Tool: the `pROC` package
- Evaluate the accuracy and feasibility of the gene scoring model
- Evaluate the predictive ability of the neural network models
- Validate the ROC curve using the test dataset
- Use CIBERSORT to estimate the proportion of immune cells from gene expression data
- Create a histogram of immune cell proportions
- Tool: `ggplot2`
- Create a diagram showing the relationship between gene expression and immune cell infiltration
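|-- Al-drug-design-reference.Data <- reference (bibliography) folder
|-- README.md <- detailed introduction
|-- img <- images used by the Markdown files
|-- docs <- documentation
|-- Al-drug-design-reference.enl <- reference library file
|-- list <- project structure listing
|-- Al-drug-design.yml <- environment configuration
| |-- 00_ai in_medicine <- Python basics (❤️ if you already have a Python foundation, or a foundation in Python and drug design, you can skip this chapter and start directly from 01)
| |-- 03_03_Al Drug Discovery & Design <- compound data collection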
Install Anaconda and Git. Please refer to the Anaconda website and the Git website for download.
❤️ You must read the paper.
Where is AIDrugDesign.yml?
Clone this repository first; it contains AIDrugDesign.yml.
conda env create -f AIDrugDesign.yml
If you are abroad or have a VPN (🪜), open AIDrugDesign.yml and delete the following entries:
- pytorch
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
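The manual deletion above can also be scripted. The following is only a minimal sketch: it assumes the three entries sit in the top-level `channels:` block of the file, which this README does not state explicitly, and the sample text is a made-up stand-in rather than the real contents of AIDrugDesign.yml. The function name `strip_mirror_channels` is likewise just for illustration.

```python
# Entries this README says to delete (copied from the list above).
ENTRIES_TO_DELETE = {
    "- pytorch",
    "- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main",
    "- https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free",
}


def strip_mirror_channels(yml_text: str) -> str:
    """Return yml_text without the listed entries.

    Assumption: the entries live in the top-level `channels:` block, so the
    same text elsewhere (e.g. a `- pytorch` dependency) is left untouched.
    """
    kept = []
    in_channels = False
    for line in yml_text.splitlines():
        stripped = line.strip()
        is_top_level_key = (
            bool(stripped)
            and not line[0].isspace()
            and not stripped.startswith(("-", "#"))
        )
        if is_top_level_key:
            # Track whether the current top-level block is `channels:`.
            in_channels = stripped == "channels:"
        if in_channels and stripped in ENTRIES_TO_DELETE:
            continue  # drop this entry
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical file contents, for illustration only.
sample = """name: AIDrugDesign
channels:
  - pytorch
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
  - https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
  - conda-forge
dependencies:
  - python
  - pytorch
"""

if __name__ == "__main__":
    print(strip_mirror_channels(sample))
```

To apply it to the real file, read AIDrugDesign.yml as text, pass it through the function, and write the result to a copy first so the original can be compared against it.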
conda env list
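Check whether the AI-drug-design environment appears in the list. The environment name comes from the `name:` field of the .yml file, so use whichever name the list shows when activating.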
conda activate AIDrugDesign
If you want to learn more about Git, please go to [if_operation_git_process.md]
- Go to the official Git download page.
- Select the Windows version for download and follow the prompts to complete the installation.
- During installation, the default options are fine. To customize, choose different settings as needed, such as the editor and environment variables.
After installation, open the terminal (or Git Bash) and configure your Git user information:
git config --global user.name "Your Name"
git config --global user.email "youremail@example.com"
Git will then use this information to identify you each time you commit code.
On Windows, open Git Bash, enter the following command to generate an SSH key:
ssh-keygen -t rsa -b 4096 -C "youremail@example.com"
- After pressing Enter, you are prompted for a location to store the key; press Enter again to accept the default.
- You are then asked for a passphrase. It can be left empty, but setting one is recommended.
- Use the following command to display the public key you generated:
cat ~/.ssh/id_rsa.pub
- Copy the public key and log in to GitHub.
- Go to the SSH and GPG keys page on GitHub and click New SSH key.
- Paste the public key you just copied into the text box and save.
After completing the SSH key configuration, you can use the following command to clone the code repository:
git clone git@github.com:itWangCode/AI-drug-design.git
After executing the command, Git will pull the code into the local folder. If successful, you will see information similar to the following:
Cloning into 'AI-drug-design'...
At this point, the code has been successfully pulled.
macOS usually comes with Git pre-installed; if not, Git can be installed via Homebrew:
- Open the terminal and enter the following command to install Homebrew (if not already installed):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- After installation, enter the following command to install Git:
brew install git
- Open the terminal and enter the following commands to configure Git:
git config --global user.name "Your Name"
git config --global user.email "youremail@example.com"
- Enter the following command in the terminal to generate an SSH key:
ssh-keygen -t rsa -b 4096 -C "youremail@example.com"
- As on Windows, press Enter to store the key at the default path, then optionally set a passphrase.
- In the terminal, use the following command to view the public key:
cat ~/.ssh/id_rsa.pub
- Copy the printed public key, log in to GitHub, and add it on the GitHub SSH keys page.
Use the following command to clone your repository to your local machine:
git clone git@github.com:itWangCode/AI-drug-design.git
Git downloads the code into an AI-drug-design folder in the current directory; once the command finishes without errors, the pull has succeeded.
Thus, you have successfully configured Git and cloned the code repository.
How to download and install Anaconda-Navigator on Windows and macOS systems and configure your environment
- Go to the Anaconda official website.
- Click the download button, select the version suitable for the Windows system (usually 64-bit).
- Once the download completes, run the installer and follow the prompts to complete the installation.
- Similarly, go to the Anaconda official website.
- Select the macOS version, then download and install it.
During installation, it is recommended to check the option "Add Anaconda to my PATH environment variable" so that the `conda` command can be used directly in the terminal.
After installation:
- On Windows, Anaconda-Navigator can be found in the Start menu and run.
- On macOS, you can find Anaconda-Navigator in the "Applications" folder, and click to launch it.
Before configuring the Anaconda environment, first clone the GitHub repository to your local machine.
- Open the terminal (macOS) or Git Bash (Windows).
- Run the following command to clone the repository:
git clone git@github.com:itWangCode/AI-drug-design.git
This downloads the repository, which contains the `AIDrugDesign.yml` file, to your local machine.
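This README refers to the environment file under slightly different names in different places, so it can help to check what the clone actually contains before running `conda env create`. A minimal sketch, assuming the repository was cloned into an `AI-drug-design` folder in the current directory; `find_env_files` is a helper name made up for this example:

```python
from pathlib import Path


def find_env_files(repo_dir: str) -> list[str]:
    """Return the names of .yml/.yaml files directly inside repo_dir, sorted."""
    root = Path(repo_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in {".yml", ".yaml"}
    )


if __name__ == "__main__":
    # Assumed location: the folder `git clone` creates in the current directory.
    repo = Path("AI-drug-design")
    if repo.is_dir():
        print(find_env_files(str(repo)))
    else:
        print(f"{repo} not found; run git clone first")
```

Pass whichever file name it prints to `conda env create -f`.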
Navigate to the cloned project folder and create a new Conda environment from the `.yml` file:
- In the terminal or Git Bash, switch to the directory of the cloned repository:
cd AI-drug-design
- Run the following command to create a Conda environment based on the `AIDrugDesign.yml` file:
conda env create -f AIDrugDesign.yml
❤️ This can take more than 15 minutes, so please wait and stay connected to Wi-Fi!
- Conda automatically installs the required dependency packages listed in the `.yml` file and creates the environment.
After installation, use the following command to list all environments and check that the `AI-drug-design` environment has been created:
conda env list
You should see output similar to the following, which includes `AI-drug-design`:
# conda environments:
#
base * /path/to/anaconda3
AI-drug-design /path/to/anaconda3/envs/AI-drug-design
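If you prefer to check this from a script, the listing can be parsed with a few lines of Python. This is a rough sketch that works on the sample output shown above; `conda_env_names` is a helper name invented for this example, and it simply takes the first whitespace-separated field of each non-comment line as the environment name.

```python
def conda_env_names(listing: str) -> list[str]:
    """Extract environment names from the text printed by `conda env list`."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and the `#` header lines.
        if not fields or fields[0].startswith("#"):
            continue
        names.append(fields[0])
    return names


# Sample output copied from this README (paths are placeholders).
sample_listing = """# conda environments:
#
base * /path/to/anaconda3
AI-drug-design /path/to/anaconda3/envs/AI-drug-design
"""

if __name__ == "__main__":
    print("AI-drug-design" in conda_env_names(sample_listing))
```

For a live check, the same function can be fed the output of `subprocess.run(["conda", "env", "list"], capture_output=True, text=True).stdout`, provided `conda` is on your PATH.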
Finally, use the following command to activate the `AI-drug-design` environment:
conda activate AI-drug-design
You have now successfully configured and activated the `AI-drug-design` environment and can start developing in it.
I sometimes worry especially about classmates who use macOS.
For more operations, please refer to: https://www.exception.site/article/1762