Refactor ZairaChem modules to Ersilia Model Hub components #30

Open
8 tasks
miquelduranfrigola opened this issue Dec 5, 2023 · 2 comments

@miquelduranfrigola
Member

miquelduranfrigola commented Dec 5, 2023

Motivation

ZairaChem currently has many dependencies, as is apparent from the install.sh file. Importantly, installation creates several Conda environments, which makes the tool difficult, if not impossible, to maintain. To make ZairaChem more sustainable, we need to migrate most of its code to Ersilia Model Hub artefacts.

Types of ZairaChem elements

There are two types of modules we want to migrate:

  • Static: These are the easy ones. For example, ZairaChem uses MELLODDY-Tuner to convert a list of SMILES into a normalized form, with some extra columns. In principle, we could create a new Ersilia model (called, for example, melloddy-tuner) in which this is done in an isolated container/environment. This would spare us from installing MELLODDY and its dependencies.
  • Trainable: ZairaChem uses AutoML frameworks such as FLAML, AutoGluon, Keras Tuner and others to automatically train models based on descriptors. Ideally, we want to migrate these trainers into Ersilia Model Hub artefacts. The main challenge 😟 is that, at the moment, Ersilia does not accept fit instructions, so we would need to figure this out first. At a high level, we'd like to have fitting capabilities at training time, accompanied by some persistence of the AI models so that they can be used at prediction time.
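
To make the trainable case more concrete, here is a minimal sketch of a fit / persist / predict lifecycle. Everything in it is an assumption for illustration: `FittableModel`, `fit_then_predict` and the JSON checkpoint are hypothetical names, not part of the current Ersilia API, and the "training" is a trivial mean predictor standing in for an AutoML search.

```python
import json
import tempfile
from pathlib import Path
from statistics import mean


class FittableModel:
    """Hypothetical fit/predict contract (not an existing Ersilia class)."""

    def __init__(self):
        self.baseline = None  # learned state, set by fit() or load()

    def fit(self, descriptors, labels):
        # Stand-in for an AutoML search: simply learn the mean label.
        if len(descriptors) != len(labels):
            raise ValueError("descriptors and labels must have the same length")
        self.baseline = mean(labels)
        return self

    def predict(self, descriptors):
        if self.baseline is None:
            raise RuntimeError("model must be fitted or loaded before predicting")
        return [self.baseline for _ in descriptors]

    def save(self, path):
        # Persist the learned state so prediction can run in a later session.
        Path(path).write_text(json.dumps({"baseline": self.baseline}))

    @classmethod
    def load(cls, path):
        model = cls()
        model.baseline = json.loads(Path(path).read_text())["baseline"]
        return model


def fit_then_predict(train_x, train_y, new_x):
    """Train, write a checkpoint, then predict from a freshly loaded model."""
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "model.json"
        FittableModel().fit(train_x, train_y).save(checkpoint)
        restored = FittableModel.load(checkpoint)
        return restored.predict(new_x)
```

The point of the sketch is the separation of phases: `fit` and `save` would run at training time, while `load` and `predict` would run at prediction time, possibly in a different container. How Ersilia would expose such a contract remains an open design question.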

Roadmap

We should start with static migration, while we figure out the approach for trainable models. I suggest the following order (subject to change):

  • Ersilia Compound Embeddings (eos2gw4)
  • MELLODDY-Tuner
  • Modify Ersilia codebase to enable fit commands. Probably, we should work on this in a separate issue.
  • Create a fittable FLAML Ersilia model.
  • Create a fittable Keras Tuner Ersilia model.
  • Create a fittable AutoGluon Ersilia model.
  • Create a fittable MolMap Ersilia model.
  • Replace existing code in ZairaChem with ErsiliaModel Python API calls.
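
As a rough illustration of the last roadmap item, the sketch below shows a pipeline step that delegates to an injected model client instead of importing the underlying library. The names `ModelClient`, `FakeStandardizer` and `standardize_step`, as well as the `run` signature and the `standard_smiles` column, are hypothetical; the real ErsiliaModel interface may differ, and the fake client only strips whitespace rather than doing real standardization.

```python
from typing import Protocol


class ModelClient(Protocol):
    """Hypothetical minimal interface that a served model would need to offer."""

    def run(self, smiles: list[str]) -> list[dict]: ...


class FakeStandardizer:
    """Test double standing in for a containerized model (e.g. melloddy-tuner)."""

    def run(self, smiles):
        # Placeholder behaviour: only trims whitespace around each SMILES.
        return [{"input": s, "standard_smiles": s.strip()} for s in smiles]


def standardize_step(smiles, client: ModelClient):
    """Pipeline step that knows nothing about the model's own dependencies."""
    rows = client.run(smiles)
    if len(rows) != len(smiles):
        raise ValueError("model returned a different number of rows than inputs")
    return [row["standard_smiles"] for row in rows]
```

With this shape, ZairaChem code would depend only on the client interface, so swapping the fake for a real Ersilia-backed client would not require MELLODDY or any AutoML framework in the ZairaChem environment.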
@GemmaTuron
Member

We will need to complete the refactoring before fully incorporating #31 @miquelduranfrigola

@miquelduranfrigola
Member Author

Yes, all clear. Tagging @DhanshreeA since we will have to further improve the input/output adapters and, most importantly, figure out a way to make Ersilia models trainable and fine-tunable.

@DhanshreeA DhanshreeA self-assigned this May 16, 2024