`PwParser`: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

sphuber · 2023-05-23T08:14:07Z

Fixes #353

The earliest versions of the `PwParser` exclusively parsed from the stdout because that was all tha `pw.x` produced. At some point, it started also writing output in an XML file, however, this format was deprecated as of QE v6.2 and replaced by an XML with a different schema. The advantage of the latter is that it comes with an XSD file which can be used to validate and parse XML output files. For backwards compatibility reasons, the `PwParser` has never fully transitioned to parsing exclusively from the XML. This is also because the XML does not provide all data that was currently parsed from the stdout. The time has come to start getting rid of the stdout parsing as it is intrinsically fragile. The current version of `aiida-quantumespresso` anyway only supports QE v6.6 and up, all of which have full support for the schema based XML output. The idea is to migrate the `PwParser` to parse almost exclusively from the XML file and only fetch that from the stdout which the XML does not provide. To prepare this move, the test fixtures, which often contain reference files generated by old QE versions with the old deprecated XML format, need to be updated to provide reference output with modern QE versions. Here we replace the output files with output generated by QE pw.x v6.6. For the "success" calculations, a new calculation was run and the output files were used. The same technique was applied for certain "failed" cases wherever that was possible by setting a specific input. For example, the out-of-walltime error could easily be simulated by setting the `max_seconds` input to 1 second, guaranteeing the error to be hit. There are however a number of errors that are very specific and difficult to reproduce, notably the BFGS errors for the vc-relax tests. For these, the XML has been replaced with outputs of a similar calculation that actually succeeded. These files are then naturally not commensurate with the stdout file, but this is not important since they are testing the parsing of the error, which is only done from the stdout anyway. Following are notes related to changes for specific tests cases: * `vcrelax_success_rVV10`: Has been renamed to `scf_success_rVV10`. The reason being that the calculation had to be reperformed and a `vc-relax` kept failing with a `ERROR_IONIC_CONVERGENCE_REACHED_FINAL_SCF_FAILED` Since the relaxation is not the point of the test, an `scf` calculation was done instead.

mbercx · 2023-12-20T19:23:51Z

@sphuber: @t-reents and I have been working on getting the tests to pass in:

https://github.com/mbercx/aiida-quantumespresso/tree/fix/353/pw-parser-xml-only

Note that in mbercx@45b9183 we decided that for the interrupted runs, we accept that we no longer parse as much information from the stdout as before. If we want to keep stdout parsing as a backup just in case the XML is not written properly for everything, then we have to keep maintaining the stdout parsing and I think this is exactly what we want to avoid here.

Now all tests are passing:

https://github.com/mbercx/aiida-quantumespresso/actions/runs/7279821777

Do you mind if I rebase this PR, add my changes here? I think we still need to do some final polishing of parts of the code, and a good ol' review, but we're close to finalising this!

sphuber · 2023-12-20T20:33:30Z

Thanks for the help guys. Regarding the stdout parsing: I am afraid we unfortunately anyway have to keep it because all the error parsing comes from there. The XML reports nothing in terms of warnings or errors I believe, so all our error codes rely on the stdout. Or am I missing something?

t-reents · 2023-12-20T20:40:52Z

No, that's correct. Moreover, something we were also discussing, a bunch of parameters that are present in the trajectory are only available in the stdout, e.g. magnetization in each step of a relaxation. The same is true for charges etc.

On the other hand, the final parameters are available in the XML as well.

sphuber · 2023-12-20T20:43:47Z

Ideally, we would collect all data that we parse from the stdout that is not part of the XML and make a comprehensive list. We can then present this to the QE developers and ask whether (part of) that data can be made available in the XML. Especially parsing the various errors from XML would greatly robustify things I think. But this would be a lot of work and would only really bear fruits in quite a long time :D

sphuber added 2 commits April 18, 2023 12:31

WIP

a531dd5

sphuber requested a review from mbercx May 23, 2023 08:14

sphuber mentioned this pull request May 23, 2023

fix lda_plus_u_calculation in output_parameters #936

Open

mbercx mentioned this pull request Jun 8, 2023

Revisit BaseParser when integrating into PwParser #953

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`PwParser`: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

`PwParser`: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

sphuber commented May 23, 2023

mbercx commented Dec 20, 2023

sphuber commented Dec 20, 2023

t-reents commented Dec 20, 2023

sphuber commented Dec 20, 2023

PwParser: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

Are you sure you want to change the base?

PwParser: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

Conversation

sphuber commented May 23, 2023

mbercx commented Dec 20, 2023

sphuber commented Dec 20, 2023

t-reents commented Dec 20, 2023

sphuber commented Dec 20, 2023

`PwParser`: Refactor to primarily parse from XML and only parse from stdout what is necessary #940

`PwParser`: Refactor to primarily parse from XML and only parse from stdout what is necessary #940