Basis set management? #45

Open
zooks97 opened this issue Feb 10, 2021 · 15 comments

Comments

@zooks97
Contributor

zooks97 commented Feb 10, 2021

As I'm starting to play with local-basis DFT codes (e.g. OpenMX, Siesta, ORCA, Gaussian, etc.), it's become clear that along with pseudopotentials, one has to manage basis sets in a very similar way.

The framework for this would be basically identical to what we do here in aiida-pseudo, and as such, do you think it would be better to extend aiida-pseudo to also support managing basis sets or rather make a parallel aiida-basis plugin explicitly for that purpose?

I need to think it over a bit more, but there could be significant shared code between the efforts, and it may be easier for both efforts to benefit from a shared foundation.

@bosonie
Collaborator

bosonie commented Feb 10, 2021

As a first comment I would say that the task is not as easy as it seems. Optimal basis sets most likely depend on the chemical environment.
SIESTA has almost 20 years of development behind it, and nobody has even dared to create a database of basis sets.
I guess that with the advent of high-throughput we can definitely gather some systematic knowledge, but so far I find it much more useful to have a basis optimizer that runs on the system of interest before doing the calculation.
In any case I would start this massive project separately from aiida-pseudo.

@zooks97
Contributor Author

zooks97 commented Feb 10, 2021

I'd be interested to understand the problem better; maybe we can get in touch and discuss sometime.

Like you mention, I think that the most important application would be high-throughput, but I think it could also make using basis sets from, for example, the BSE easier and more provenance-friendly.

As I've thought about it a bit more, I think I agree that starting a separate plugin, while maybe taking some cues from aiida-pseudo, would be a better path forward.

@bosonie
Collaborator

bosonie commented Feb 10, 2021

Yes sure, we can have a chat in a week or two. Let me know!

@sphuber
Contributor

sphuber commented Feb 11, 2021

You could definitely start by developing this in aiida-basis, even depending directly on aiida-pseudo to reuse bits without copying. Then, when things have settled and seem to work well, and there is still a lot of overlap, we can merge it.

@dev-zero

@zooks97 maybe also take a look at my aiida-gaussian-datatypes plugin?

@zooks97
Contributor Author

zooks97 commented May 17, 2021

@dev-zero Thanks for mentioning it! I saw it around the time I created this issue, but I'll give it another look. Maybe it could be possible to do something similar for numeric orbitals?

@dev-zero

@zooks97 sure. From my point of view, these are the points to consider when designing a data type plugin for basis sets:

  • Do you need or want to work with the data to some extent, i.e. do you want to be able to easily implement queries such as "filter by exponent" or operations such as "get only a subset (shell)"? In that case one may want to consider JSON-based storage in the database; otherwise a SinglefileData should be enough. For the gaussian-datatypes I went for storing in the DB since they are usually small and you might want to mix-and-match/filter.
  • Do you have to be able to reproduce them bit-by-bit? When storing them in the DB as a JSON structure you might then want to either preserve the original file content or store coefficients/values as strings (this hampers the previous case but might still be better than reparsing the file all the time). For the CP2K ones I went with storing them as JSON floats and reformatting them on output to a fixed number of digits (might have to be revised).
  • What kind of metadata do you need: is it optimized for a specific level of theory, pseudopotential, number of valence electrons in a pseudo, RI, etc.? For the gaussian-datatypes I went with a list of tags plus the number of valence electrons.
  • Do you need a mechanism to deprecate a basis set (basically versions with the same name)? For the gaussian-datatypes I went with a version attribute allowing exactly that.
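The four design points above can be illustrated with a minimal plain-Python sketch. It is not the aiida-gaussian-datatypes implementation: the class name `BasisSetRecord`, its fields, the helper functions, and all numerical values are hypothetical placeholders. Exponents and coefficients are kept as strings here (the bit-by-bit option), which means they have to be converted to floats whenever they are queried.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class BasisSetRecord:
    """Hypothetical JSON-friendly basis set record (illustration only)."""

    element: str
    name: str
    version: int = 1                      # newer versions supersede older ones
    tags: List[str] = field(default_factory=list)
    n_el: Optional[int] = None            # valence electrons of the matching pseudo
    # Each shell is a dict: {"l": int, "exponents": [str], "coefficients": [str]}
    shells: List[dict] = field(default_factory=list)

    def select_shells(self, l: int) -> "BasisSetRecord":
        """Return a new record holding only the shells with angular momentum l."""
        kept = [shell for shell in self.shells if shell["l"] == l]
        return BasisSetRecord(
            self.element, self.name, self.version, list(self.tags), self.n_el, kept
        )

    def to_json(self) -> str:
        """Serialize to the nested structure that would go into the database."""
        return json.dumps(asdict(self), sort_keys=True)


def max_exponent(record: BasisSetRecord) -> float:
    """Largest primitive exponent; strings are converted only for the query."""
    return max(float(e) for shell in record.shells for e in shell["exponents"])


def latest(records, element: str, name: str):
    """Pick the highest version among records sharing element and name."""
    matches = [r for r in records if r.element == element and r.name == name]
    return max(matches, key=lambda r: r.version) if matches else None


# Placeholder numbers, not a real basis set.
demo_v1 = BasisSetRecord(
    element="H", name="DEMO", version=1, tags=["placeholder"], n_el=1,
    shells=[
        {"l": 0, "exponents": ["10.0000", "1.0000"], "coefficients": ["0.2500", "0.7500"]},
        {"l": 1, "exponents": ["0.5000"], "coefficients": ["1.0000"]},
    ],
)
demo_v2 = BasisSetRecord(
    element="H", name="DEMO", version=2, tags=["placeholder"], n_el=1,
    shells=demo_v1.shells,
)

s_only = demo_v1.select_shells(0)
newest = latest([demo_v1, demo_v2], "H", "DEMO")
roundtrip = json.loads(demo_v1.to_json())
```

The trade-off named in the second bullet shows up directly: the string values survive a JSON round trip unchanged, but every numerical filter pays for a `float()` conversion.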

@addman2

addman2 commented Sep 2, 2021

Dear all,

I am also interested in this topic. Is there anything new since the last comment was made?

@dev-zero

dev-zero commented Sep 3, 2021

@addman2 what exactly are you interested in? Which types of basis sets?

@addman2

addman2 commented Sep 3, 2021

Dear @dev-zero,

You can find the details in this mailing list thread:

https://groups.google.com/g/aiidausers/c/kdoLb-NO4LI

To summarize: I am writing an AiiDA package for our QMC code. We mostly use PPs from these two databases:

https://pseudopotentiallibrary.org/
http://burkatzki.com/pseudos/index.2.html

I was thinking of putting them as installable "families" inside the aiida-pseudo package. Similarly, it could retrieve the recommended basis for the PP. I am mainly interested in GTO bases, but I don't want to be restricted to them. I was looking at your aiida-gaussian-datatypes package and it has 80% of the functionality I was looking for. I really like the way the Basis and Pseudo data types were made.

The main thing that is missing is an automatic fetcher from the internet. I can contribute to this.

@azadoks

azadoks commented Sep 13, 2021

I've been working on this sporadically here.
I have (mostly) working support for OpenMX PAO bases which are managed as loose files just as done by aiida-pseudo.

For GTO bases, I was working to integrate with the Basis Set Exchange python module.
I generally only have experience with plane wave codes and with OpenMX, so I don't know the best way to handle, e.g., GTO bases in AiiDA (i.e. as files, as done here, or as an AiiDA data type that contains the relevant data + some code for writing that data in different formats).

I'd really appreciate any feedback, maybe over in the aiida-basis repository, @dev-zero and @addman2.

p.s. as you mentioned recommended PPs corresponding to basis sets, this is another open question of mine and why I made this issue here in aiida-pseudo first. OpenMX provides basis-pseudo pairs, and it would make sense to me to provide both with the same AiiDA plugin (although there is a many-to-one correspondence between bases and pseudopotentials respectively).

@addman2

addman2 commented Sep 21, 2021

Hi azadoks,

Sorry for my late response, I've been busy lately. I started playing with aiida-gaussian-datatypes to find out whether it is possible to use it for the GTO bases and ECPs I'm using. I started with PPs. It turned out that the PPs from the original library were not compatible with mine, so I created an abstract class Pseudopotential(Data) and two derived classes: the original one and one that fits my format. I think this worked out well, you can check it here.

The next step I would like to take is the creation of a localized basis format. I was looking at your BasisData format in aiida-basis. One thing that concerns me is that BasisData is a SinglefileData. I plan to make adjustments to the basis set before I use it, so I believe a Dict (or Data) type would be more suitable.
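As a rough illustration of why a dict-like representation makes such adjustments easier than an opaque file, here is a minimal sketch in plain Python. The layout of the `basis` dict, the function name `drop_diffuse`, and all numbers are assumptions made for this example and are not taken from aiida-basis or aiida-gaussian-datatypes. Because stored AiiDA nodes are immutable, the function returns a modified copy instead of changing its input.

```python
import copy


def drop_diffuse(basis: dict, min_exponent: float) -> dict:
    """Return a copy of `basis` without primitives below `min_exponent`.

    Assumed layout: {"element": str, "shells": [{"l": int,
    "exponents": [float], "coefficients": [float]}]}.
    Contraction coefficients are NOT renormalized in this sketch.
    """
    adjusted = copy.deepcopy(basis)  # leave the original untouched
    for shell in adjusted["shells"]:
        kept = [
            (e, c)
            for e, c in zip(shell["exponents"], shell["coefficients"])
            if e >= min_exponent
        ]
        shell["exponents"] = [e for e, _ in kept]
        shell["coefficients"] = [c for _, c in kept]
    # Remove shells that lost all of their primitives.
    adjusted["shells"] = [s for s in adjusted["shells"] if s["exponents"]]
    return adjusted


# Placeholder numbers, not a real basis set.
original = {
    "element": "H",
    "shells": [
        {"l": 0, "exponents": [10.0, 1.0, 0.01], "coefficients": [0.2, 0.5, 0.3]},
        {"l": 1, "exponents": [0.02], "coefficients": [1.0]},
    ],
}
trimmed = drop_diffuse(original, 0.05)
```

With a single-file representation the same adjustment would require parsing and rewriting the file format on every change.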

@dev-zero

@azadoks one of the things on my todo list for the aiida-gaussian-datatypes is also the import from the Basis Set Exchange. At the moment I would most likely implement it as a workflow (given an identifier the workflow fetches it and gets you a basis set object), for the sake of provenance.

@addman2 I think I could pull your changes directly into the plugin: CP2K can also support other types of pseudos (ECP), hence adding the type to the main plugin is definitely something we can and want to do.
Wrt the basis sets: this is one of the reasons I decided to store the basis sets in the aiida-gaussian-datatypes plugin as a nested dict (in the database as a JSON).

@sphuber
Contributor

sphuber commented Sep 21, 2021

"given an identifier the workflow fetches it and gets you a basis set object"

Does "fetching" here mean obtaining it from a URL? In that case, it might suffice to simply store the source in the Data node attributes. That is what they are designed for. It would be a bit overkill to go through a workflow.

@dev-zero

@sphuber it may also consist of converting to the storage format
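The two positions in this exchange are compatible, and one way they might be combined is sketched below in plain Python: a conversion step turns downloaded text into the storage format and records where it came from. Everything here is hypothetical. The three-column text format, the function name `convert_to_storage`, the keys of the returned dict, and the example URL are assumptions for illustration only; this is not the Basis Set Exchange format or API, and no network access is performed.

```python
import hashlib


def convert_to_storage(raw_text: str, source: str) -> dict:
    """Parse a made-up 'l exponent coefficient' text format into a nested dict.

    The source identifier and a checksum of the original text are kept next
    to the parsed data so that the origin stays traceable.
    """
    shells = {}
    for line in raw_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        l, exponent, coefficient = line.split()
        shell = shells.setdefault(
            int(l), {"l": int(l), "exponents": [], "coefficients": []}
        )
        # Values stay as strings so the original digits are preserved.
        shell["exponents"].append(exponent)
        shell["coefficients"].append(coefficient)
    return {
        "source": source,
        "sha256": hashlib.sha256(raw_text.encode("utf-8")).hexdigest(),
        "shells": [shells[l] for l in sorted(shells)],
    }


# Placeholder content standing in for a downloaded file.
raw = "# l exponent coefficient\n0 10.0000 0.2500\n0 1.0000 0.7500\n1 0.5000 1.0000\n"
record = convert_to_storage(raw, "https://example.org/demo-basis.txt")
```

Whether such a step runs inside a workflow or as a plain import command, the `source` and `sha256` entries are the kind of information that could be stored as node attributes.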


6 participants