Exploring the representation of high-entropy alloys for screening electrocatalysis of oxygen reduction reaction via feature engineering
This is the repository of high-entropy alloy (HEA) experiments for my graduation project, "Exploring the Application of High-Entropy Alloy Representations in Oxygen Reduction Electrocatalyst Screening via Feature Engineering".
A regression model is proposed to predict the *OH adsorption energy of HEAs (high-entropy alloys). It handles the problem of "input disorder" and performs well: the mean absolute error is within 0.038 eV compared with conventional DFT calculations. Moreover, feature engineering is used for data augmentation, and Shapley values are used to analyse the features selected by the genetic algorithm. It is worth noting that the adsorbed atoms' molar mass and the coordination numbers of the atoms constituting the HEAs contribute most to the model's predictions. Finally, WGAN-GP (Wasserstein GAN with gradient penalty) is used to generate HEA environments and compositions.
Beyond predicting the adsorption energy of HEAs, this method can also be applied to other multi-atomic systems that are similarly constrained by dataset shortages.
The prominent packages are:
- SHAP
- numpy
- pandas
- seaborn
- matplotlib
- scikit-learn
- pytorch 1.8.1
To install all the dependencies quickly and easily, use pip with the provided requirements file:
pip install -r requirements.txt
I built up my dataset based on neural-network-design-of-HEA; you can refer to that repository for more information.
Because of dataset ownership, this repository does not provide the HEA dataset, so you have to collect your own data!
The data structure is shown below.
| | Atom | Ru | Rh | Pd | Ir | Pt |
|---|---|---|---|---|---|---|
| A | Period | 5 | 5 | 5 | 6 | 6 |
| | Group | 8 | 9 | 10 | 9 | 10 |
| B | Radius | 1.338 | 1.345 | 1.375 | 1.357 | 1.387 |
| C | CN | | | | | |
| D | AtSite | | | | | |
| E | Pauling electronegativity | 2.20 | 2.28 | 2.20 | 2.20 | 2.28 |
| | VEC | 8 | 9 | 10 | 9 | 10 |
| F | M | 101.07 | 102.906 | 106.42 | 192.2 | 195.08 |
| | Atomic number | 44 | 45 | 46 | 77 | 78 |
where CN is the coordination number, AtSite is the active site, and M is the molar mass. The remaining features are the descriptors we desired; they are denoted 'A, B, C, D, E, F' in the table above.
You have to follow `coord_nums` to fill in the blanks (the CN and AtSite rows).
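Purely as an illustration (not code from this repository), the per-element properties above could be collected in a pandas DataFrame, with the structure-dependent rows (CN, AtSite) filled in afterwards from `coord_nums`; all names below are assumptions:

```python
import pandas as pd

# Per-element descriptors from the table above. CN and AtSite depend on the
# local structure, so they are left to be filled in from coord_nums.
elements = ["Ru", "Rh", "Pd", "Ir", "Pt"]
features = pd.DataFrame(
    {
        "Period":            [5, 5, 5, 6, 6],
        "Group":             [8, 9, 10, 9, 10],
        "Radius":            [1.338, 1.345, 1.375, 1.357, 1.387],
        "Electronegativity": [2.20, 2.28, 2.20, 2.20, 2.28],
        "VEC":               [8, 9, 10, 9, 10],
        "M":                 [101.07, 102.906, 106.42, 192.2, 195.08],
        "AtomicNumber":      [44, 45, 46, 77, 78],
    },
    index=elements,
)
print(features)
```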
If you use the dataset from neural-network-design-of-HEA, you should follow the steps below:
After building up the dataset with 9 features, use the Pearson correlation coefficient to drop highly correlated features and reduce the computation cost; run the following code:
cd utils
python PearsonSelection.py
PearsonSelection.py uses the Pearson correlation coefficient to drop highly correlated features.
The result will show which features are highly correlated and can therefore be dropped.
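If you want to reproduce the idea outside PearsonSelection.py, a minimal sketch of Pearson-based filtering might look like the following (the DataFrame argument and the 0.9 threshold are assumptions, not the script's actual settings):

```python
import pandas as pd

def drop_correlated_features(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from each pair whose |Pearson r| exceeds the threshold."""
    corr = df.corr(method="pearson").abs()
    cols = corr.columns
    to_drop = set()
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if corr.iloc[i, j] > threshold:
                to_drop.add(cols[j])      # keep the earlier column, drop the later one
    return df.drop(columns=sorted(to_drop))
```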
The model can handle any number of atoms: its input is defined only by the number of features, so there is effectively no limitation on the input dimension.
To train the model, you can simply use the following command, and you will get a checkpoint:
# training a model for downstream tasks
python K_fold.py
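K_fold.py is the repository's actual training script; the snippet below is only a generic sketch of k-fold training to show the idea (the `build_model` factory, data arrays, and hyperparameters are placeholders, not the repository's code):

```python
import numpy as np
import torch
from sklearn.model_selection import KFold

def k_fold_train(X, y, build_model, k=5, epochs=500):
    """Train one model per fold and report the mean validation MAE."""
    maes = []
    for fold, (tr, va) in enumerate(KFold(n_splits=k, shuffle=True, random_state=0).split(X)):
        model = build_model()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        Xtr, ytr = torch.tensor(X[tr], dtype=torch.float32), torch.tensor(y[tr], dtype=torch.float32)
        Xva, yva = torch.tensor(X[va], dtype=torch.float32), torch.tensor(y[va], dtype=torch.float32)
        for _ in range(epochs):
            opt.zero_grad()
            loss = torch.nn.functional.l1_loss(model(Xtr).squeeze(), ytr)
            loss.backward()
            opt.step()
        mae = torch.nn.functional.l1_loss(model(Xva).squeeze(), yva).item()
        maes.append(mae)
        print(f"fold {fold}: validation MAE = {mae:.4f} eV")
    return float(np.mean(maes))
```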
To obtain the plot of MAE and RMSE compared with the DFT-calculated adsorption energies:
# you need to update the checkpoint path first!
python main.py
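For reference, the two reported metrics are straightforward to compute from the predictions and the DFT values; a small helper (not taken from main.py):

```python
import numpy as np

def mae_rmse(y_dft, y_pred):
    """Mean absolute error and root-mean-square error against DFT adsorption energies (eV)."""
    err = np.asarray(y_pred) - np.asarray(y_dft)
    return float(np.mean(np.abs(err))), float(np.sqrt(np.mean(err ** 2)))
```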
You can also simply use the checkpoint provided in checkpoint/6_500epochs_5_model.pth.
To visualize the data and the features processed by the model, run:
python t_SNE.py
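t_SNE.py does this in the repository; a generic sketch of the technique with scikit-learn (the feature matrix `X` and the colouring property are placeholders) is:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(X, color, perplexity=30.0):
    """Project high-dimensional features to 2-D with t-SNE and colour points by a property."""
    emb = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X)
    plt.scatter(emb[:, 0], emb[:, 1], c=color, cmap="viridis", s=10)
    plt.colorbar(label="*OH adsorption energy (eV)")
    plt.tight_layout()
    plt.show()
```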
To augment the data with feature engineering, run:
python data_augment.py
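The actual augmentation scheme lives in data_augment.py; purely as a hypothetical example of feature-level augmentation for multi-atom inputs, one could permute the order of equivalent neighbouring atoms so every ordering of the same environment is seen:

```python
import itertools
import numpy as np

def permute_augment(sample):
    """Hypothetical augmentation: return all row permutations of a (n_atoms, n_features) sample.

    Each permutation describes the same physical environment, which also
    encourages robustness to 'input disorder'.
    """
    n_atoms = sample.shape[0]
    return np.stack([sample[list(p)] for p in itertools.permutations(range(n_atoms))])
```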
For genetic-algorithm feature selection and Shapley-value analysis, use:
python Feature_selection.py
python SHAP.py
After running the code, you will get a best_result.csv file that tells you the best combination of the 90 features.
The Shapley-value analysis will tell you which features affect the model's prediction of the *OH adsorption energy the most.
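SHAP.py performs this analysis; a generic sketch with the shap package (the explainer type, background-sample size, and function names are assumptions) is:

```python
import shap

def explain(predict_fn, X, feature_names):
    """Rank feature importance with Shapley values using a model-agnostic explainer."""
    background = shap.sample(X, 50)                      # small background set keeps KernelExplainer tractable
    explainer = shap.KernelExplainer(predict_fn, background)
    shap_values = explainer.shap_values(X)
    shap.summary_plot(shap_values, X, feature_names=feature_names)  # ranks features by mean |SHAP value|
```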
To train the WGAN-GP that generates HEA environments and compositions, run the command below; you can switch the mode to choose whether the regression model is also trained. The loss plot shows that the GAN training did not go well :(
python Joint_training.py
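Joint_training.py holds the actual implementation; as a reminder of what the WGAN-GP gradient-penalty term looks like, here is a generic sketch (not the repository's code):

```python
import torch

def gradient_penalty(critic, real, fake, device="cpu"):
    """WGAN-GP penalty: push the critic's gradient norm towards 1 on interpolated samples."""
    eps = torch.rand(real.size(0), 1, device=device)          # one mixing coefficient per sample
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores, inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True, retain_graph=True,
    )[0]
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()
```

The critic loss is then the fake score minus the real score plus λ times this penalty, with λ commonly set to 10.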