Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add accessors for stats functions #59

Open
bradyrx opened this issue Aug 30, 2019 · 9 comments
Open

Add accessors for stats functions #59

bradyrx opened this issue Aug 30, 2019 · 9 comments
Assignees

Comments

@bradyrx
Copy link
Owner

bradyrx commented Aug 30, 2019

It would be much quicker to have our stats functions registered as accessors as well, as in xskillscore. Then one can just do ds.esm.rm_trend(dim='time') for instance.

@lukegre
Copy link
Contributor

lukegre commented Aug 3, 2020

Happy to jump in here if this hasn't been taken on yet

@bradyrx
Copy link
Owner Author

bradyrx commented Aug 3, 2020

Yes that would be great @luke-gregor! I think I already have the accessor file set up for the convert_lon function, so feel free to add the stats functions there. You can follow the model set up in xskillscore. I'll look out for a PR. Thanks!

@lukegre
Copy link
Contributor

lukegre commented Nov 4, 2020

I've finally got some time to get to this! Just want to check that anyone else hasn't started on this in the meantime?

@lukegre
Copy link
Contributor

lukegre commented Nov 4, 2020

A question regarding usability. I noticed that for the stats regression functions, that x changes from the predicant to the predictor when y is given. Would it not make sense to always have y as the predicant and x as the predictor (if not None)? Seems a little more consistent/transparent.

I can create a new PR or change that in this PR if you agree on this.

esmtools/esmtools/stats.py

Lines 265 to 295 in e93391f

def linear_slope(x, y=None, dim='time', nan_policy='none'):
"""Returns the linear slope with y regressed onto x.
.. note::
This function will try to infer the time freqency of sampling if ``x`` is in
datetime units. The final slope will be returned in the original units per
that frequency (e.g. SST per year). If the frequency cannot be inferred
(e.g. because the sampling is irregular), it will return in the original
units per day (e.g. SST per day).
Args:
x (xarray object): Independent variable (predictor) for linear regression.
If ``y`` is ``None``, treat ``x`` as the dependent variable and remove
slope over ``dim``.
y (xarray object, optional): Dependent variable (predictand) for linear
regression. If ``None``, treat ``x`` as the predictand.
dim (str, optional): Dimension to apply linear regression over.
Defaults to "time".
nan_policy (str, optional): Policy to use when handling nans. Defaults to
"none".
* 'none', 'propagate': If a NaN exists anywhere on the given dimension,
return nans for that whole dimension.
* 'raise': If a NaN exists at all in the datasets, raise an error.
* 'omit', 'drop': If a NaN exists in `x` or `y`, drop that index and
compute the slope without it.
Returns:
xarray object: Slopes computed through a least-squares linear regression.
"""

@aaronspring
Copy link
Collaborator

I wanted to have the current implementation, because I wanted to do linear_slope(ds, dim='time'). I'd be happy to keep that way of using it. maybe having the accessor like ds.esmtools.liner_slope(dim='time')?

@lukegre
Copy link
Contributor

lukegre commented Nov 4, 2020

Changing this shouldn't change the functionality at all. Just means that y will always be the first input argument. Would this make sense? This is a slight API change though, so might break code for people who have already implemented linear_slope(x, y=y, dim='time')

@aaronspring
Copy link
Collaborator

Ah ok. Please go ahead with a PR. Sounds good to me

@bradyrx
Copy link
Owner Author

bradyrx commented Nov 13, 2020

@luke-gregor, my thinking was that in logical order it makes sense to have x first, then y. I know it's sort of weird to have a singular argument revert to the predictand, since it's technically in the slot order of the predictor otherwise (if I understand your point correctly).

I would say it's intuitively off to have (y, x=None) order, although the language is "y regressed onto x" so this could make sense. In the end, my defense is that I modeled it after scipy conventions: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html. I could see it going either way, but I see in the PR you got roadblocked by the thorough testing so it seems like we'll keep the order for now?

@lukegre
Copy link
Contributor

lukegre commented Nov 17, 2020

I could see it going either way, but I see in the PR you got roadblocked by the thorough testing so it seems like we'll keep the order for now?

Yep, exactly. Didn't realise how extensive and impressive the test suite for esmtools is!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants