User interface and better solvers for GAP fitting #305

max-veit · 2020-12-02T21:01:13Z

This PR aims to improve the user interface of the potential-fitting step, starting with fitting of SOAP-GAP potentials. In particular, it provides an interface to automate many of the common tasks involved in GAP fitting, allowing non-interactive (batch-style) fitting based on a flexible Python interface and easing the transition from QUIP.

Edit 24.06.2021: Incorporate the RKHS solver from #354 - description below:

Implement an RKHS solver for the sparse GPR problem.

Solving the sparse GPR problem via the "normal" equations is very ill-conditioned.
This implements an alternative solver that is much better behaved.
It might be good to also offer a choice between solve and lstsq;
let's see where this goes. In any case, this already works.
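
To illustrate the conditioning argument, here is a minimal NumPy sketch (not the code in this PR). It assumes the usual subset-of-regressors form of sparse GPR, where the normal equations read (K_MN K_NM + sigma^2 K_MM) w = K_MN y. The function names, the `rbf` helper, and the eigenvalue cutoff are made up for this example. The RKHS variant rewrites the weights as w = K_MM^{-1/2} v and solves a ridge least-squares problem for v, so K_NM is never squared.

```python
import numpy as np


def rbf(a, b, length=1.0):
    """Gaussian kernel between two sets of points (illustrative helper)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length**2)


def solve_normal(K_MM, K_NM, y, sigma):
    """Solve the normal equations directly; this squares K_NM."""
    A = K_NM.T @ K_NM + sigma**2 * K_MM
    return np.linalg.solve(A, K_NM.T @ y)


def solve_rkhs_qr(K_MM, K_NM, y, sigma, rel_cutoff=1e-12):
    """Solve the same problem in the RKHS basis via least squares."""
    evals, evecs = np.linalg.eigh(K_MM)
    keep = evals > rel_cutoff * evals.max()  # drop numerically null directions
    P = evecs[:, keep] / np.sqrt(evals[keep])  # K_MM^{-1/2} on the kept subspace
    Phi = K_NM @ P  # features in the RKHS basis
    n_feat = Phi.shape[1]
    # Stack the ridge penalty under the data: min ||Phi v - y||^2 + sigma^2 ||v||^2
    A = np.vstack([Phi, sigma * np.eye(n_feat)])
    b = np.concatenate([y, np.zeros(n_feat)])
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return P @ v  # back to weights on the sparse points


if __name__ == "__main__":
    x = np.linspace(0.0, 5.0, 40)[:, None]
    z = np.linspace(0.0, 5.0, 6)[:, None]
    y = np.sin(x[:, 0])
    K_MM, K_NM = rbf(z, z), rbf(x, z)
    w_normal = solve_normal(K_MM, K_NM, y, sigma=0.1)
    w_rkhs = solve_rkhs_qr(K_MM, K_NM, y, sigma=0.1)
    print(np.abs(w_normal - w_rkhs).max())
```

On a well-conditioned toy problem like this one the two routes should agree; the difference is expected to show up only when K_MM or K_NM becomes nearly singular.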

Main changes:

- Simplify the train_gap_model function (now gaptools.fit_gap_simple()):
  - Improve modularity, remove dependence on StructureManagers
  - Move kernel computation out of main function
  - Make regularizers more predictable by removing implicit variance normalization (the user should almost always specify the variance / function scale explicitly)
- Add a script that reads a parameter file to control fit options
  - Example parameter file included
- Move the solver of the sparse GPR "normal" equations out of the fitting functions
  - Add RKHS and QR solvers as options of the solver class

In progress or not yet implemented (wishlist):

- Feature sparsification
- Training with virials
- Automatic variance scaling if requested

(Probably for future PRs:)

- Multi-kernel (multi-SOAP) fitting
- Automate uncertainty estimation (fitting with different data subsets)
- Interface to automate regularizer optimization (including saving kernels)

Tasks before review:

I was initially intending this to work just by changing the working directory, since there are a few other files written that would be a pain to give the option to specify names for. But I think it's worth at least leaving the potential filename flexible.

felixmusil · 2020-12-07T10:35:53Z

A few comments/questions on this draft:

- I feel it would be beneficial to separate file saving and computation functionality, e.g. calculate_and_sparsify.
- Does the WORKDIR parameter work as intended atm?
- Do you plan to integrate training with virials into gaptools?
- Do you plan on handling partial reference data, e.g. some structures with energies but no forces and vice versa?
- Do you plan to handle computation of the kernel on multiple processors?
- Is this the right place to put an ase.Atoms sanitizer? (The fully periodic and non-periodic cases are already covered, so only partial periodicity really needs to be implemented.)
- Putting parameters in a JSON (or YAML) file is the right way, I think.
- Do you plan to integrate cross-validation in the executable?
- Does the dedicated i-PI interface belong in this PR?

I guess many of these suggestions can be offloaded to the wishlist of the next PR too.

max-veit · 2020-12-07T11:00:21Z

One at a time:

> I feel it would be beneficial to separate file saving and computation functionality, e.g. `calculate_and_sparsify`.

Sure, although some of that was intended as diagnostic information or just to have a few "save points" in case the fit fails e.g. due to lack of time or memory. If you're referring specifically to this line:
https://github.com/cosmo-epfl/librascal/blob/9ae737649005fb1773a431b73db452060f8169cf/bindings/rascal/models/gaptools.py#L98
then yes, I suppose as long as that's included and easily accessible in the final model, then writing the sparse points is no longer necessary. But some of these files (e.g. writing out kernel files) will be useful for planned functionality, like regularizer optimization. Let's discuss this further once the PR is closer to being ready.

> Does the WORKDIR parameter work as intended atm?

It's read by the fit_gap.py script from the json parameter file, not from environment variables (although I guess that could also be made a possibility).
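
For reference, a sketch of how such a lookup could work, including the optional environment-variable fallback mentioned above. This is not the code in fit_gap.py: the function name `resolve_workdir`, the key name `"WORKDIR"`, and the fallback order are assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def resolve_workdir(param_file, env=None, key="WORKDIR"):
    """Return the working directory for a fit.

    Order of precedence (an assumption for this sketch):
    1. the entry in the JSON parameter file,
    2. an environment variable of the same name,
    3. the current directory.
    """
    env = os.environ if env is None else env
    params = json.loads(Path(param_file).read_text())
    return Path(params.get(key) or env.get(key) or ".")
```

Keeping the parameter file first means a fit stays reproducible from the file alone, with the environment only filling in when the file is silent.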

> Do you plan to integrate training with virials into gaptools?

Yes, I'll just go ahead and add that to the wish list.

> Do you plan on handling partial reference data, e.g. some structures with energies but no forces and vice versa?

I suppose so, though I don't quite see the use case - usually you'll want to fit on a dataset where everything is computed at the same level of theory, and usually that also means having all structures with the same type of data (e.g. all with forces or all without). In any case, I don't think it would be too hard to implement.

> Do you plan to handle computation of the kernel on multiple processors?

That may indeed be useful, but I think it'll have to wait for another PR.

> Is this the right place to put an ase.Atoms sanitizer? (The fully periodic and non-periodic cases are already covered, so only partial periodicity really needs to be implemented.)

Not really, but I don't see an easy way to automatically turn on atoms wrapping (which I think should be the default option tbh), so it was easier at the time to put in 5 lines of Python and just do it myself. For the non-periodic case we can use the rascal.neighbourlist.structure_manager.sanitize_non_periodic_structure function.
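
As a rough idea of what those few lines amount to, here is a NumPy-only sketch of wrapping positions into the unit cell, with a per-axis `pbc` mask so that partially periodic structures are handled too. It is not the code in gaptools; the function name `wrap_positions` is hypothetical, and it assumes the ASE convention that the rows of `cell` are the lattice vectors. With ASE itself, `Atoms.wrap()` does the same job.

```python
import numpy as np


def wrap_positions(positions, cell, pbc=(True, True, True)):
    """Wrap Cartesian positions into the cell along the periodic axes only."""
    positions = np.asarray(positions, dtype=float)
    cell = np.asarray(cell, dtype=float)
    mask = np.asarray(pbc, dtype=bool)
    # Cartesian -> fractional coordinates (rows of `cell` are lattice vectors)
    frac = positions @ np.linalg.inv(cell)
    # Fold periodic directions back into [0, 1); leave the others untouched
    frac[:, mask] = frac[:, mask] % 1.0
    return frac @ cell
```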

> Putting parameters in a JSON (or YAML) file is the right way, I think.

Yep, this was deliberately done for compatibility with the rest of librascal, and especially the representation parameters are very clear when stored in this format.

> Do you plan to integrate cross-validation in the executable?

This would be part of the regularizer optimization, which might be better off in its own PR.

> Does the dedicated i-PI interface belong in this PR?

That was a mistake; development has been moved to its own branch.

- temporarily exclude kvec_generator from tests (kvec_generator.cc is currently incompatible with gcc11)
- unpin jinja2 version, since older versions fail with latest markupsafe (pallets/markupsafe#304) and it is now compatible with latest nbsphinx (spatialaudio/nbsphinx#563)

max-veit · 2022-07-11T09:57:32Z

Ok, this is almost ready for review. It has all the functionality I was planning to add here and now just needs some testing.

The main change since the last update is a new Kernel class that works with feature matrices explicitly, rather than with rascal StructureManagers. This is of course less efficient than using the native rascal Kernel class, but it also means we can work with representations that aren't currently computed in librascal (like LODE).
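
To make the idea concrete, here is a small sketch of a kernel computed directly from feature matrices. It is not the class added in this PR: the function names, the dot-product kernel with exponent `zeta`, and the `atom_to_structure` index array are assumptions chosen for illustration. Any representation that can be written as a per-atom feature matrix could be plugged in the same way.

```python
import numpy as np


def atomic_kernel(X, X_sparse, zeta=2):
    """Polynomial dot-product kernel between atomic features and sparse points."""
    return (X @ X_sparse.T) ** zeta


def structure_kernel(X, X_sparse, atom_to_structure, n_structures, zeta=2):
    """Sum the atomic kernel rows over the atoms of each structure."""
    K_atoms = atomic_kernel(X, X_sparse, zeta)
    K = np.zeros((n_structures, X_sparse.shape[0]))
    # Accumulate each atom's row into the row of the structure it belongs to
    np.add.at(K, np.asarray(atom_to_structure), K_atoms)
    return K
```

The structure-wise sum is what an energy fit would use, since total energies are modelled as sums of atomic contributions.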

max-veit and others added 22 commits May 6, 2020 15:43

Add gaptools for easier non-interactive fitting of GAPs

f9421c0

(Gaussian Approximation Potentials)

Add fit_gap script and example parameter file

2e10da3

Fix energy baseline not serializing properly

6787706

and add a sensible default to baseline values in KRR class (in case we encounter species that weren't in the training set; this is better than failing with a KeyError)

Fix wrong datatype in atom baseline energy dict

3715777

Merge branch 'master' into feat/gaptools

1b9dfaa

(Update to version of CURFilter and FPSFilter from PR #265)

Merge remote-tracking branch 'origin/master' into feat/gaptools

3698c35

Get up to speed with the latest PRs (and eventually feat/gap_pred as well)

Update gaptools sparsification function

ac8eb1f

Account for name change and add capability to do FPS as well as CUR

Add select-subset option to gap fit script

d2514b9

(mainly useful for test runs on smaller machines)

Merge remote-tracking branch 'origin/master' into feat/gaptools

12e1b3a

Reformat Python files. It's... not terrible.

994f2eb

God damn it #294

f5eccae

Make fit_gap.py executable as script

73f98ac

Add option for fit_gap.py to fit without gradients

154ee47

And turn off computing gradients while sparsifying (they should be turned on again when computing the gradient kernel)

Nevermind, you do need gradients for the source points (eventually)

3297de1

Created an empty (and broken) i-PI calculator

f68c5a1

Merge branch 'feat/gaptools' of github.com:cosmo-epfl/librascal into …

26e8eef

…feat/gaptools

Added instructions for making a ipi calculator

2923afe

Revert "Added instructions for making a ipi calculator"

574fac1

This reverts commit 2923afe.

Revert "Revert "Added instructions for making a ipi calculator""

d1cc4d4

This reverts commit 574fac1.

Revert "Revert "Revert "Added instructions for making a ipi calculato…

f59392f

…r""" This reverts commit d1cc4d4.

Revert "Created an empty (and broken) i-PI calculator"

9ae7376

This reverts commit f68c5a1.

max-veit added 3 commits December 11, 2020 13:51

Add annotations (with basic units) to gap model output file

604535f

Remove another loop-inducing __init__ import

a9c1f3c

Fix broken srftime() call

768527b

max-veit mentioned this pull request Mar 25, 2021

i-PI integration #307

Merged

11 tasks

max-veit added 2 commits March 25, 2021 12:37

Merge branch 'master' into feat/gaptools

14ed599

Includes some important bugfixes

Make gap_fit respect working directory, now save final model outside it

2fb933e

ceriottm and others added 3 commits September 5, 2021 16:09

Pretty...

fa65a5d

Steps towards sklearn-ifiying SparseGPRSolver

796a122

Merge remote-tracking branch 'origin/master' into feat/gaptools

776b7ca

agoscinski mentioned this pull request Dec 6, 2021

Package name rascal is already taken on PyPI :( #362

Open

max-veit added 8 commits January 20, 2022 19:03

Eliminate duplicate function and update fit_gap script

ac2329e

Merge remote-tracking branch 'origin/master' into feat/gaptools

1feb82d

Especially to get the updated sparsification utils and zundel notebook parameters for smoother automatic testing

Fix zundel example notebook

3fd08e4

Also swap labels to resolve #380

Start a minimal KRR wrapper independent of librascal reps

3d4506a

Start implementation of direct matrix kernel emulating the old interface

abd1740

Begin making Kernel emulator compatible with KRR class

33a8e7b

Stresses (i.e. cell gradients) are going to be tricky...

Resolve variable name ambiguity in fit_gap_simple()

5dd9ff7

Add capability for structure-wise kernels to KernelDirect

23aa30c

Also add a method to store features into lists of ASE Atoms, so that these can be used instead of rascal's internal AtomsLists. Finally, selection methods now work with lists of ASE Atoms.

max-veit mentioned this pull request Jun 1, 2022

Is it possible to train and predict a per-atom property? [question] #407

Closed

max-veit and others added 12 commits June 3, 2022 17:57

Trial implementation of gradient kernel with explicit features

e8bb6d2

Merge branch 'feat/gaptools' of https://github.com/cosmo-epfl/librascal…

59974d3

… into feat/gaptools

Enable using Filter utilities on (wrapped) ASE Atoms lists

646177e

Temp-fix compilation on gcc 11

e2377da

(basically, this just removes kvec_generator.cc from compilation, because it was throwing mysterious bounds-checking errors for Eigen arrays. Will need to resolve properly once feat/reciprocal_space_soap is merged in)

...and fix the Python formatter, which also broke somehow?

ea9d108

Some reformatting due to newer Black version

93ec13d

Remove -march=native from default flags

0246b94

Users can still specify it if they need it, but this is broken in multiple ways right now.

Update Filter test with new minimum number of features

95aedf9

Apparently this was updated in skcosmo recently...

Another reformat due to Black version bump

d6f384b

Update example notebook with parameter name change

aac9643

Enable gradient kernel with explicit features (and their gradients)

767b57b

This was previously implemented but execution never reached there. This should therefore be considered an experimental feature until tests can be made.

max-veit marked this pull request as ready for review July 11, 2022 09:55

max-veit requested a review from PicoCentauri August 26, 2022 15:11

max-veit added 2 commits August 26, 2022 17:45

Complete feature-sparsification function for plain feature matrices

5a8a5a6

Feature-matrix selection utility now working

06679aa

Also fix an outdated import and format code
