Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompositionOnly featurizer preset does not use oxidation states #46

Open
ml-evs opened this issue Apr 13, 2021 · 6 comments
Open

CompositionOnly featurizer preset does not use oxidation states #46

ml-evs opened this issue Apr 13, 2021 · 6 comments

Comments

@ml-evs
Copy link
Collaborator

ml-evs commented Apr 13, 2021

As above. We should probably add the oxidation state featurizers used in the DeBreuck2020Featurizer to the CompositionOnly featurizer too.

@ancarnevali
Copy link

Hi! Was this implemented or is it still pending?

@ppdebreuck
Copy link
Owner

Hi! Thanks for pulling this up. This hasn't been implemented but can easily be added manually.
Small example:

from modnet.featurizers.presets import DeBreuck2020Featurizer
from modnet.preprocessing import MODData
from pymatgen.core import Composition

materials = [Composition("SiO2"),Composition("Li20")]
class CompositionOxidFeaturizer(DeBreuck2020Featurizer):
    def __init__(self):
        super().__init__()
        self.structure_featurizers = ()
        self.site_featurizers = ()

data = MODData(materials=materials, featurizer=CompositionOxidFeaturizer())
data.featurize()
print(data.df_featurized)

This will add 9 extra features compared to the current default one.
Let me know if this helps. I might try this with some of the matbench tasks and implement this (maybe default).

@ml-evs
Copy link
Collaborator Author

ml-evs commented Mar 22, 2022

I will just add one extra bit to @ppdebreuck's suggestion, which is to add the fast_oxid attribute to the featurizer, otherwise pymatgen (the underlying library matminer uses for oxidation states) will try to allow for every site in a structure to have a different oxidation state which scales horribly with the number of sites. Doing something like:

class CompositionOxidFeaturizer(DeBreuck2020Featurizer):
    def __init__(self):
        super().__init__()
        self.structure_featurizers = ()
        self.site_featurizers = ()
        self.fast_oxid = True

will use more sensible defaults for oxidation state calculations that you can then test:

if self.oxid_composition_featurizers:
LOG.info("Applying oxidation state featurizers...")
if getattr(self, "fast_oxid", False):
df = CompositionToOxidComposition(
all_oxi_states=False, max_sites=-1
).featurize_dataframe(df, "composition")
else:
df = CompositionToOxidComposition().featurize_dataframe(
df, "composition"
)
)

(also see the relevant code in matminer/pymatgen)

@ancarnevali
Copy link

Thank you for your inputs!
so if I understand correctly, with this class we just add features, and the "base" featurizer is launched with data.featurize() later on.
By the way, since this would be a composition-only task, why are there self.structure_featurizers = () and self.site_featurizers = ()? Is it just needed to define them as empty?

@ml-evs
Copy link
Collaborator Author

ml-evs commented Mar 22, 2022

Thank you for your inputs! so if I understand correctly, with this class we just add features, and the "base" featurizer is launched with data.featurize() later on.

Yep, the base featurizer just knows to take each listed matminer featurizer in turn and apply it to the relevant sites, structure or composition.

By the way, since this would be a composition-only task, why are there self.structure_featurizers = () and self.site_featurizers = ()? Is it just needed to define them as empty?

Yeah, this is just convenience/laziness. You could instead inherit from the base featurizer and list out all of the matminer featurizer classes you wanted to use, but as we have them all already in the DeBreuck2020 preset it is easier base on that and disable the ones we don't want.

@ml-evs
Copy link
Collaborator Author

ml-evs commented Jan 10, 2023

At some point the IonProperty featurizer in matminer started using oxidation states (maybe it always did). We never ran into an issue with this as the poor scaling was probably being masked by the time taken for other featurizers, or we were only using it on simple compositions. As shown in #126, the featurizers basically hang if a sufficiently complex composition is attempted.

Possible workarounds:

  • Move IonProperty into the oxid_composition_featurizer list instead (so it will no longer be used unless there is a structure).
  • or allow for the selection of fast_oxid for the composition-only featurisers, as suggested above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants