Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PwParser: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented May 23, 2023

Fixes #353

The earliest versions of the `PwParser` exclusively parsed from the
stdout because that was all tha `pw.x` produced. At some point, it
started also writing output in an XML file, however, this format was
deprecated as of QE v6.2 and replaced by an XML with a different schema.
The advantage of the latter is that it comes with an XSD file which can
be used to validate and parse XML output files.

For backwards compatibility reasons, the `PwParser` has never fully
transitioned to parsing exclusively from the XML. This is also because
the XML does not provide all data that was currently parsed from the
stdout.

The time has come to start getting rid of the stdout parsing as it is
intrinsically fragile. The current version of `aiida-quantumespresso`
anyway only supports QE v6.6 and up, all of which have full support for
the schema based XML output. The idea is to migrate the `PwParser` to
parse almost exclusively from the XML file and only fetch that from the
stdout which the XML does not provide.

To prepare this move, the test fixtures, which often contain reference
files generated by old QE versions with the old deprecated XML format,
need to be updated to provide reference output with modern QE versions.
Here we replace the output files with output generated by QE pw.x v6.6.

For the "success" calculations, a new calculation was run and the output
files were used. The same technique was applied for certain "failed"
cases wherever that was possible by setting a specific input. For
example, the out-of-walltime error could easily be simulated by setting
the `max_seconds` input to 1 second, guaranteeing the error to be hit.
There are however a number of errors that are very specific and
difficult to reproduce, notably the BFGS errors for the vc-relax tests.
For these, the XML has been replaced with outputs of a similar
calculation that actually succeeded. These files are then naturally not
commensurate with the stdout file, but this is not important since they
are testing the parsing of the error, which is only done from the
stdout anyway.

Following are notes related to changes for specific tests cases:

* `vcrelax_success_rVV10`: Has been renamed to `scf_success_rVV10`. The
reason being that the calculation had to be reperformed and a `vc-relax`
kept failing with a `ERROR_IONIC_CONVERGENCE_REACHED_FINAL_SCF_FAILED`
Since the relaxation is not the point of the test, an `scf` calculation
was done instead.
@mbercx
Copy link
Member

mbercx commented Dec 20, 2023

@sphuber: @t-reents and I have been working on getting the tests to pass in:

https://github.com/mbercx/aiida-quantumespresso/tree/fix/353/pw-parser-xml-only

Note that in mbercx@45b9183 we decided that for the interrupted runs, we accept that we no longer parse as much information from the stdout as before. If we want to keep stdout parsing as a backup just in case the XML is not written properly for everything, then we have to keep maintaining the stdout parsing and I think this is exactly what we want to avoid here.

Now all tests are passing:

https://github.com/mbercx/aiida-quantumespresso/actions/runs/7279821777

Do you mind if I rebase this PR, add my changes here? I think we still need to do some final polishing of parts of the code, and a good ol' review, but we're close to finalising this!

@sphuber
Copy link
Contributor Author

sphuber commented Dec 20, 2023

Thanks for the help guys. Regarding the stdout parsing: I am afraid we unfortunately anyway have to keep it because all the error parsing comes from there. The XML reports nothing in terms of warnings or errors I believe, so all our error codes rely on the stdout. Or am I missing something?

@t-reents
Copy link

No, that's correct. Moreover, something we were also discussing, a bunch of parameters that are present in the trajectory are only available in the stdout, e.g. magnetization in each step of a relaxation. The same is true for charges etc.

On the other hand, the final parameters are available in the XML as well.

@sphuber
Copy link
Contributor Author

sphuber commented Dec 20, 2023

Ideally, we would collect all data that we parse from the stdout that is not part of the XML and make a comprehensive list. We can then present this to the QE developers and ask whether (part of) that data can be made available in the XML. Especially parsing the various errors from XML would greatly robustify things I think. But this would be a lot of work and would only really bear fruits in quite a long time :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Parsing only the XML output (of PW)
3 participants