120 model collection #122

jgallowa07 · 2023-09-28T02:27:00Z

Major Changes:

Adds an initial multidms.model_collection module with multidms.fit_models, which fits multiple models across a range of parameter spaces in parallel using multiprocessing. This is inspired by the polyclonal.fit_models function.
Adds the ModelCollection class, a split-apply-combine interface to the mutational dataframes for a collection of models.
Adds two altair plotting methods to ModelCollection: (1) mut_param_heatmap, for visualizing aggregated parameter sets across fits, and (2) mut_param_traceplot, for making trace plots across fits with varying lasso coefficient strengths.
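As a rough illustration of the parallel-fitting idea described above (not the actual multidms.fit_models implementation), the sketch below expands a parameter space into one dictionary per combination and maps a worker over it. The names expand_grid, fit_one, and fit_all, and the parameter names "lasso" and "iterations", are all hypothetical and chosen only for this example.

```python
import itertools
from multiprocessing import Pool


def expand_grid(param_space):
    """Expand {name: [values]} into a list with one dict per combination."""
    names = sorted(param_space)
    combos = itertools.product(*(param_space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def fit_one(params):
    """Hypothetical stand-in for fitting a single model.

    A real worker would build and fit a model here; this toy version
    just returns the parameters plus a made-up score.
    """
    score = params["lasso"] * params["iterations"]
    return {**params, "score": score}


def fit_all(param_space, n_processes=1):
    """Run fit_one on every parameter combination, optionally in parallel."""
    grid = expand_grid(param_space)
    if n_processes == 1:
        # Serial fallback: handy for debugging and in notebooks.
        return [fit_one(params) for params in grid]
    with Pool(processes=n_processes) as pool:
        # pool.map preserves the order of the grid in its results.
        return pool.map(fit_one, grid)
```

With n_processes greater than 1, the call to fit_all should sit under an `if __name__ == "__main__":` guard on platforms that start worker processes by spawning. The list of result dictionaries can then be collected into a single dataframe with one row per fit.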

To do:

Make a notebook with a descriptive example of the new interface
Convert the multidms.plot.mut_shift_plot into a method of ModelCollection for querying and visualizing aggregated groups of the collection
Make an altair trace visualization method for the ModelCollection class. This could be both trace "lines" for comparing continuous hyperparameters such as the lasso penalty coefficient, and boxplots for sets of fits with unordered hyperparameters (different non-linearities, perhaps?)
Cleanup and integrate notebook into documentation.
Squash commits and final cleanup

Questions:

The lineplot and heatmap methods throw out the mutational information about "times seen". It could be useful to keep this as a slider attribute in the altair chart. However, several questions arise if we want this: (1) How should times seen be aggregated across datasets (sum, mean)? (2) Should there be a times seen slider for each condition (experiment), or should the times seen stat be aggregated across conditions as well?
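To make the two options concrete, here is a small pandas sketch on made-up data. The long-format layout and the column names ("dataset", "condition", "mutation", "times_seen") are assumptions for illustration, not the layout ModelCollection actually uses.

```python
import pandas as pd

# Toy long-format table: one row per (dataset, condition, mutation).
# All names and numbers here are made up for illustration.
muts = pd.DataFrame(
    {
        "dataset": ["rep1", "rep1", "rep2", "rep2"],
        "condition": ["a", "b", "a", "b"],
        "mutation": ["M1A", "M1A", "M1A", "M1A"],
        "times_seen": [4, 2, 6, 4],
    }
)

# Option 1: keep one value per condition, aggregated across datasets.
# This would correspond to one slider per condition.
per_condition = (
    muts.groupby(["mutation", "condition"])["times_seen"]
    .agg(["sum", "mean"])
    .reset_index()
)

# Option 2: collapse conditions as well, giving one value per mutation.
# This would correspond to a single slider for the whole chart.
overall = muts.groupby("mutation")["times_seen"].agg(["sum", "mean"]).reset_index()
```

Computing both "sum" and "mean" side by side makes it easy to compare how each choice would filter mutations before committing to one for the slider.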

Other Minor changes:

Removes the utils module.
Cleans up the progress bar, addressing "progress bar creates dependency on ipywidgets" #114
Optionally removes "wts", "sites", and "muts" from the mutations dataframe returned by Model.get_mutations_df. Those were unnecessary IMO.
Changes the naming of columns produced by Model.get_mutations_df(): the condition name for the predicted func score is now a suffix (as with shift and time_seen) rather than a prefix, e.g. "delta_predicted_func_score" -> "predicted_func_score_delta".
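For downstream code that still carries the old column names, the prefix-to-suffix change can be sketched as a small renaming helper. The function move_condition_to_suffix and the condition name "other" are hypothetical and not part of multidms; only the "delta_predicted_func_score" example comes from the description above.

```python
def move_condition_to_suffix(column, conditions, stat="predicted_func_score"):
    """Rename '<condition>_<stat>' to '<stat>_<condition>'; leave others alone."""
    for condition in conditions:
        if column == f"{condition}_{stat}":
            return f"{stat}_{condition}"
    return column


# Old-style names: only the first one has the condition as a prefix.
old_columns = ["delta_predicted_func_score", "shift_delta", "beta"]
new_columns = [
    move_condition_to_suffix(col, conditions=["delta", "other"])
    for col in old_columns
]
```

Since pandas `DataFrame.rename(columns=...)` accepts a callable, a helper like this could be applied to a whole dataframe with a lambda that fixes the list of conditions.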

jgallowa07 marked this pull request as ready for review October 3, 2023 05:14

jgallowa07 force-pushed the 120_model_collection branch from d5d83be to 47d3104 Compare October 11, 2023 18:24

Add ModelCollection interface.

134bb27

jgallowa07 force-pushed the 120_model_collection branch from 8812f73 to 134bb27 Compare October 11, 2023 19:19

update CHANGELOG

16ec04d

jgallowa07 merged commit 014ac85 into main Oct 11, 2023
8 checks passed

This was referenced Oct 11, 2023

Add ModelCollection class #120

Closed

progress bar creates dependency on ipywidgets #114

Closed

jgallowa07 deleted the 120_model_collection branch October 13, 2023 17:42
