Open in Constructor Research Platform (a cloud service for scientific computations)
- Environment setup: docs/ENVIRONMENT.md
- Data download: docs/DATA.md
- Reproducing the paper locally: docs/SPARSE-PAPER.md
- Using the pre-trained models
In the paper, we propose a sparse representation as a way to reduce the computational cost and improve the accuracy of machine learning the properties of defects in 2D materials. The code in this project implements the method and a rigorous comparison of its performance with a set of baselines.
Two-dimensional materials offer a promising platform for the next generation of (opto-)electronic devices and other high-technology applications. One of the most exciting characteristics of 2D crystals is the ability to tune their properties via the controllable introduction of defects. However, the search space for such structures is enormous, and ab initio computations are prohibitively expensive. We propose a machine learning approach for rapid estimation of the properties of a 2D material given its lattice structure and defect configuration. The method suggests a way to represent the configuration of 2D materials with defects that allows a neural network to train quickly and accurately. We compare our methodology with state-of-the-art approaches and demonstrate a reduction in energy prediction error by a factor of at least 3.7. Our approach is also an order of magnitude more resource-efficient than its contenders in both training and inference.
The main idea of our method is to use a point cloud of defects as the input to the predictive model, as opposed to the usual point cloud of all atoms or an expert-crafted feature vector.
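To make the idea concrete, here is a minimal sketch of how a point cloud of defects could be extracted by comparing a pristine lattice with a defective one, site by site. It is an illustration only and not the code used in this repository: the `DefectPoint` type, the `defect_point_cloud` function, and the toy coordinates are all hypothetical, and it assumes the unrelaxed defective structure keeps the same site ordering as the pristine one, with `None` marking a vacancy.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class DefectPoint(NamedTuple):
    """One point of the sparse representation (hypothetical structure)."""
    position: Tuple[float, float, float]  # site coordinates in the pristine lattice
    original: str                         # species on the pristine site
    replaced_by: Optional[str]            # new species, or None for a vacancy


def defect_point_cloud(
    positions: Sequence[Sequence[float]],
    pristine_species: Sequence[str],
    defective_species: Sequence[Optional[str]],
) -> List[DefectPoint]:
    """Keep only the sites whose species differ from the pristine lattice."""
    if not (len(positions) == len(pristine_species) == len(defective_species)):
        raise ValueError("all inputs must describe the same number of sites")
    return [
        DefectPoint(tuple(pos), old, new)
        for pos, old, new in zip(positions, pristine_species, defective_species)
        if old != new  # unchanged sites are dropped from the representation
    ]


# Toy example (made-up coordinates): one substitution and one vacancy.
positions = [(0.0, 0.0, 0.0), (1.6, 0.9, 1.6), (1.6, 0.9, -1.6), (3.2, 0.0, 0.0)]
pristine = ["Mo", "S", "S", "Mo"]
defective = ["W", "S", None, "Mo"]

cloud = defect_point_cloud(positions, pristine, defective)
for point in cloud:
    print(point)
```

In this toy case only two of the four sites survive, so the model input grows with the number of defects rather than with the size of the supercell.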
We compare our approach with state-of-the-art generic structure-property prediction algorithms: GemNet, SchNet, MEGNet, and matminer+CatBoost.
As the dataset, we use 2DMD. It covers the most popular 2D materials: MoS2, WSe2, h-BN, GaSe, InSe, and black phosphorus (BP), with point defect densities in the range of 2.5% to 12.5%. We use DFT to relax the structures and compute the defect formation energy and the HOMO-LUMO gap. The ML algorithms predict those quantities, taking the unrelaxed structures as input.
Use the library https://github.com/HSE-LAMBDA/MEGNetSparse/
- Clone the repository
- Set up the environment
- Download the weights and data:
dvc pull datasets/checkpoints/combined_mixed_all_train/formation_energy_per_site/megnet_pytorch/sparse/05-12-2022_19-50-53/d6b7ce45/0.pth.dvc datasets/checkpoints/combined_mixed_all_train/homo_lumo_gap_min/megnet_pytorch/sparse/05-12-2022_19-50-53/831cc496/0.pth.dvc csv-cif-low-density-8x8 csv-cif-no-spin-500-data csv-cif-spin-500-data train-only-split
The data are not needed for predictions, and are only used to generate new structures in the example notebook.
- Open the notebook. It contains the prediction code, along with generation of new structures with defects, and example processing of user-uploaded data.
Please cite the following two papers if you use the code or the data:
Kazeev, N., Al-Maeeni, A.R., Romanov, I. et al. Sparse representation for machine learning the properties of defects in 2D materials. npj Comput Mater 9, 113 (2023). https://doi.org/10.1038/s41524-023-01062-z
Huang, P., Lukin, R., Faleev, M. et al. Unveiling the complex structure-property correlation of defects in 2D materials based on high throughput datasets. npj 2D Mater Appl 7, 6 (2023). https://doi.org/10.1038/s41699-023-00369-1