120 model collection #122

Merged: jgallowa07 merged 2 commits into main from 120_model_collection on Oct 11, 2023

jgallowa07 (Member) commented Sep 28, 2023

Major Changes:

  • Adds an initial multidms.model_collection module with multidms.fit_models, which fits multiple models across a range of parameter values in parallel using multiprocessing. This is inspired by the polyclonal.fit_models function.
  • Adds the ModelCollection class, which provides a split-apply-combine interface to the mutational dataframes of a collection of models.
  • Adds two altair plotting methods to ModelCollection: (1) mut_param_heatmap, for visualizing aggregated parameter sets across fits, and (2) mut_param_traceplot, for making trace plots across fits with varying lasso coefficient strengths.
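For readers unfamiliar with the pattern, here is a minimal sketch of fitting one model per parameter combination, optionally in parallel. It is not the multidms.fit_models implementation: expand_grid, toy_fit, fit_grid, and the parameter names "lasso" and "iterations" are all hypothetical, and toy_fit only returns a made-up loss instead of fitting anything.

```python
import itertools
from concurrent.futures import ProcessPoolExecutor


def expand_grid(param_space):
    """Turn {"name": [values, ...]} into a list with one dict per combination."""
    names = list(param_space)
    combos = itertools.product(*(param_space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def toy_fit(params):
    """Hypothetical stand-in for fitting a single model (not the multidms API).

    Returns the input parameters plus a made-up "loss" so that the results
    can be tabulated afterwards.
    """
    loss = params["lasso"] * 10 + params["iterations"] * 1e-4
    return {**params, "loss": loss}


def fit_grid(fit_one, param_space, n_procs=1):
    """Call fit_one once per parameter combination.

    With n_procs == 1 the fits run serially; otherwise they are distributed
    over a pool of worker processes. fit_one must be a picklable, top-level
    function for the parallel path to work.
    """
    grid = expand_grid(param_space)
    if n_procs == 1:
        return [fit_one(params) for params in grid]
    with ProcessPoolExecutor(max_workers=n_procs) as pool:
        # Executor.map returns results in the same order as `grid`.
        return list(pool.map(fit_one, grid))
```

Calling fit_grid(toy_fit, {"lasso": [0.0, 1e-5], "iterations": [100, 200]}, n_procs=2) would return four result dicts, one per combination. On platforms that spawn worker processes, the parallel call needs to sit under an if __name__ == "__main__": guard.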

To do:

  • Make a notebook with a descriptive example of the new interface
  • Convert the multidms.plot.mut_shift_plot into a method of ModelCollection for querying and visualizing aggregated groups of the collection
  • Make an altair trace visualization method for the ModelCollection class. This could include both trace "lines" for comparing continuous hyperparameters such as the lasso penalty coefficient, and boxplots for sets of fits with unordered hyperparameters (different non-linearities, perhaps?)
  • Clean up the notebook and integrate it into the documentation.
  • Squash commits and final cleanup

Questions:

  1. The lineplot and heatmap methods throw out the mutational "times seen" information. It could be useful to keep this as a slider attribute in the altair chart. However, several problems arise if we want this: (1) How should times seen be aggregated across datasets (sum, mean)? (2) Should there be a separate times seen slider for each condition (experiment), or should the times seen statistic be aggregated across conditions as well?
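To make the two choices concrete, here is a small sketch with made-up counts. The records, the dataset names "rep1" and "rep2", and the aggregate_times_seen helper are all hypothetical and are not part of multidms. The how argument covers question (1), and per_condition covers question (2).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (dataset, condition, mutation, times_seen)
records = [
    ("rep1", "reference", "M1A", 4),
    ("rep2", "reference", "M1A", 2),
    ("rep1", "delta", "M1A", 1),
    ("rep2", "delta", "M1A", 3),
    ("rep1", "reference", "G2C", 6),
    ("rep2", "reference", "G2C", 0),
]


def aggregate_times_seen(records, how=sum, per_condition=True):
    """Aggregate times seen across datasets.

    how: function applied to the list of counts in each group (e.g. sum, mean).
    per_condition: if True, keep one value per (mutation, condition), which
        would correspond to one slider per condition; if False, pool the
        conditions and keep one value per mutation (a single slider).
    """
    groups = defaultdict(list)
    for _dataset, condition, mutation, times_seen in records:
        key = (mutation, condition) if per_condition else mutation
        groups[key].append(times_seen)
    return {key: how(values) for key, values in groups.items()}


per_condition_sum = aggregate_times_seen(records, how=sum)
per_condition_mean = aggregate_times_seen(records, how=mean)
pooled_mean = aggregate_times_seen(records, how=mean, per_condition=False)
```

In this toy data, G2C is seen 6 times in one dataset and 0 times in the other, so its per-condition sum (6) and mean (3) both hide the fact that one dataset never observed it. That is the kind of case the aggregation choice would need to handle.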

Other Minor changes:

  • Removes the utils module.
  • Cleans up #114 (progress bar creates dependency on ipywidgets).
  • Optionally removes "wts", "sites", and "muts" from the mutations dataframe returned by Model.get_mutations_df. Those were unnecessary, IMO.
  • Changes the naming of columns produced by Model.get_mutations_df(). In particular, the condition name for the predicted func score is now a suffix (as with shift and time_seen) rather than a prefix, e.g. "delta_predicted_func_score" -> "predicted_func_score_delta".
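Downstream code that still refers to the old prefixed column names could translate them with something like the sketch below. The helper move_condition_to_suffix and the condition list are hypothetical and are not part of multidms; only the "delta_predicted_func_score" example comes from the description above.

```python
def move_condition_to_suffix(column, conditions, stat="predicted_func_score"):
    """Map an old-style "<condition>_<stat>" name to "<stat>_<condition>".

    Column names that do not match the old pattern are returned unchanged.
    """
    for condition in conditions:
        if column == f"{condition}_{stat}":
            return f"{stat}_{condition}"
    return column


old_columns = ["mutation", "delta_predicted_func_score", "shift_delta"]
new_columns = [move_condition_to_suffix(c, ["delta"]) for c in old_columns]
```

Here only the predicted func score column is renamed; "mutation" and "shift_delta" pass through untouched because they already follow the suffix convention or carry no condition name.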

jgallowa07 marked this pull request as ready for review October 3, 2023 05:14
jgallowa07 merged commit 014ac85 into main Oct 11, 2023
8 checks passed
jgallowa07 deleted the 120_model_collection branch October 13, 2023 17:42
1 participant