Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source-wise interpolation #46

Merged
merged 14 commits into from
Nov 8, 2024
Merged

Source-wise interpolation #46

merged 14 commits into from
Nov 8, 2024

Conversation

hammannr
Copy link
Collaborator

If I have two sources a and b in a model, each with two shape parameters s_a0, s_a1 and s_b0, s_b1, the interpolation of pdfs and expectation values currently happens on a 4D grid (s_a0, s_a1, s_b0, s_b1). This can quickly and unnecessarily blow up the required memory.

This PR adds source-wise interpolation. For this, we keep track of which source listens to which of the shape parameters and can thus interpolate on two 2D grids (s_a0, s_a1) and (s_b0, s_b1).

For now, I added this as an option so the default is unchanged.
So far, everything I tested seemed to work but I'll do more thorough checks in the next days.

@hammannr
Copy link
Collaborator Author

Ok it seems like I broke something -- very good to have unit tests! I'll fix it tomorrow.

@hammannr
Copy link
Collaborator Author

The reason for the now failing tests is that I moved the expected_events property to the source, which I think makes sense. There, I compute the number similar to how it was done before but using the source's config entry of livetime_days and rate_multiplier instead of the model's config entries. I think in any case it should not be recommended changing the config values after init. Instead, I'd understand the source.expected_events property as well as the model.expected_events() method as default values -- this is also how it's treated in the likelihood. If you call it with a different value of livetime_days it is simply scaled by this value.
So my suggestion would be to adapt the unittest to match the new definition. Another solution would be to make source.expected_events a method with arguments livetime_days and rate_multiplier -- this way we could leave the current behaviour. What do you think @JelleAalbers @kdund ?

@hammannr hammannr marked this pull request as ready for review October 11, 2024 12:56
@hammannr
Copy link
Collaborator Author

So far, I mainly focused on the time and memory the likelihood needs to be initialized. I now noticed that running a fit with the source-wise interpolation is unfortunately even slightly slower than the previous implementation. The way I wrote the ps_interpolator I should not be too surprised, I think every python developer would punch me for this code. I tried improving it this afternoon but didn't succeed yet. I will continue trying tomorrow but if anyone has a good suggestion that would be highly appreciated! 😊

@hammannr
Copy link
Collaborator Author

After some more profiling I now think that the likelihood calls are not the bottleneck (I also couldn't find a way to speed this up in a meaningful way). Instead, I suspect minuit just doesn't love having many of these shape parameters, which make the likelihood unsmooth.

for sn, shape_parameters in self.source_shape_parameters.items():
self.source_morphers[sn] = MORPHERS[self.config['morpher']](
self.config.get('morpher_config', {}),
shape_parameters
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

almost certainly overkill, but for readability, perhaps we can make a tiny minimorpher = lambda shape_parameters: MORPHERS[self.config['morpher']]( self.config.get('morpher_config', {}), shape_parameters) ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea -- done! (though I made a proper def to make flake8 happy 😊

@hammannr
Copy link
Collaborator Author

hammannr commented Nov 7, 2024

@kdund, @JelleAalbers I know you're both very busy. We would need to resolve the missing unittest but apart from this, I think this could be merged. Should I try to find someone else to have a look at the changes to the code so that we can proceed?

@JelleAalbers
Copy link
Owner

Hi Robert, for me it would take quite a bit of time to import blueice in my head again, and I can't promise to do it soon. If you know of someone else who is currently using blueice I'm happy to give them access (if they don't have it already).

@kdund
Copy link
Collaborator

kdund commented Nov 7, 2024 via email

@hammannr
Copy link
Collaborator Author

hammannr commented Nov 7, 2024

Thanks a lot @kdund , that would be perfect! 😊

Copy link
Collaborator

@kdund kdund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently failing with

============================================ FAILURES =============================================
___________________________________________ test_rates ____________________________________________

    def test_rates():
        m = Model(conf_for_test(n_sources=1))
        np.testing.assert_array_equal(m.expected_events(), np.array([1000]))

        m.config['livetime_days'] = 2
>       np.testing.assert_array_equal(m.expected_events(), np.array([2000]))

tests/test_model.py:9:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

args = (<built-in function eq>, array([1000.]), array([2000]))
kwds = {'err_msg': '', 'header': 'Arrays are not equal', 'strict': False, 'verbose': True}

    @wraps(func)
    def inner(*args, **kwds):
        with self._recreate_cm():
>           return func(*args, **kwds)
E           AssertionError:
E           Arrays are not equal
E
E           Mismatched elements: 1 / 1 (100%)
E           Max absolute difference: 1000.
E           Max relative difference: 0.5
E            x: array([1000.])
E            y: array([2000])

../../../opt/miniconda3/envs/myenv/lib/python3.8/contextlib.py:75: AssertionError
======================================== warnings summary =========================================
tests/test_inference.py::test_fit_scipy
tests/test_inference.py::test_fit_scipy
tests/test_inference.py::test_fit_scipy
tests/test_inference.py::test_fit_scipy
  /Users/kdund/opt/miniconda3/envs/myenv/lib/python3.8/site-packages/scipy/optimize/_numdiff.py:576: RuntimeWarning: invalid value encountered in subtract
    df = fun(x) - f0

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================== short test summary info =====================================
FAILED tests/test_model.py::test_rates - AssertionError:

self.n_model_events_interpolator = self.morpher.make_interpolator(f=lambda m: m.pmf_grids()[1],
extra_dims=list(self.ps.shape),
anchor_models=self.anchor_models)
if self.source_wise_interpolation:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For readability, might it be better to just have "if self.source_wise_interpolation: raise error" and then have the rest not in an else block but just normal indentation?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes -- I basically wanted to use this as a placeholder if someone is eager to implement this at one point but of course you're right that if the if clause raises an error we don't need the else -- removed! 👍

def _get_shape_indices(self, source_name):
"""Return the indices of the shape parameters used by the source."""
shape_keys = self.source_shape_parameters[source_name].keys()
return [i for i, k in enumerate(self.shape_parameters.keys()) if k in shape_keys]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for arrays should we return it sorted? Or return a set?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I misunderstand your suggestion but I think the return should by this definition be a sorted list, or not? Also entries should be unique.

"""
if s is None:
return np.array([self.expected_events(s) for s in self.sources])
return s.events_per_day * self.config['livetime_days'] * s.fraction_in_range * s.config['rate_multiplier']
return s.expected_events
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we loosing lifetime scaling here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answer: yes, @hammannr had told me but I overlooked it-- what changes is that the expected_events is now per-source and not adapted to change on-the-go (you would do that by changing livetime instead)
I approve of that solution.

@hammannr
Copy link
Collaborator Author

hammannr commented Nov 8, 2024

I now adapted the unittest so that the config of the source is overwritten. This is a bit of an antipattern but I think it's still useful to check that this works.

Copy link
Collaborator

@kdund kdund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Robert-- in my opinion this may be merged now

@hammannr
Copy link
Collaborator Author

hammannr commented Nov 8, 2024

Thanks a lot for the review @kdund! 😊 -- will then merge

@hammannr hammannr merged commit 0206fd9 into master Nov 8, 2024
2 checks passed
@hammannr hammannr deleted the source_wise_interpolation branch November 8, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move shape parameter grids, interpolation to be per source
3 participants