Feature/implement daily bcsd #28

Merged
jhamman merged 39 commits into pangeo-data:master on Nov 5, 2020

Conversation

Contributor

dgergel commented Jun 25, 2020

Adds the NASA-NEX modification to the classic BCSD bias correction method: quantile mapping is done by day groups rather than by months, using a new custom PaddedDOYGrouper. To fit quantile maps by day group rather than by month, fit the model with time_grouper='daily_nasa-nex', e.g.

model_nasanex = BcsdTemperature(time_grouper='daily_nasa-nex', return_anoms=False).fit(X_train, y_train)

This doesn't currently support leap days; that's a future to-do.

ref #27
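
As a rough illustration of the difference between month groups and padded day groups, here is a minimal pandas sketch. It is not the PR's PaddedDOYGrouper: the helper `padded_day_group`, the synthetic data, and the 15-day offset are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Two non-leap years of synthetic daily data (illustrative values only).
index = pd.date_range("2001-01-01", "2002-12-31", freq="D")
series = pd.Series(np.arange(len(index), dtype=float), index=index)

# Month grouping: one group per calendar month, 12 groups in total.
monthly_groups = series.groupby(series.index.month)


def padded_day_group(s, day_of_year, offset=15, n_days=365):
    """Hypothetical helper: values within +/- offset days of day_of_year.

    The distance is circular, so a window around day 1 also picks up
    the last days of December.
    """
    doy = s.index.dayofyear.to_numpy()
    dist = np.abs(doy - day_of_year)
    dist = np.minimum(dist, n_days - dist)  # wrap around the year boundary
    return s[dist <= offset]


# All samples that would feed the quantile map for January 1.
jan1_group = padded_day_group(series, day_of_year=1)
```

With a 15-day offset, each day group pools 31 days per year, so every day of the year gets its own quantile map fitted on a window centred on that day rather than on a fixed calendar month.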

Contributor Author

dgergel commented Jul 29, 2020

@jhamman - would be awesome to get a review from you on this PR when you get a chance.

Member

jhamman left a comment

@dgergel - thank you for your patience here. This is a great start and I'm generally a fan of how you've done things. I've added some inline comments below.

I'd love to see a few tests that exercise the NEX option and the PaddedDOYGrouper. We'll also want to add the grouper classes to the API docs.



def MONTH_GROUPER(x):
    return x.month


def DAY_GROUPER(x):
    return x.day
Member

let's move the DAY_GROUPER and MONTH_GROUPER functions to the groupers module.

def check_datetime_index(obj, timestep):
    """ helper function to check datetime index for compatibility
    """
    if isinstance(obj, pd.DataFrame):
Member

what should happen when obj is not a DataFrame?

Contributor Author

Actually, I don't think we really need this function. I didn't end up using it in the NASA-NEX daily implementation, and I think it's redundant with the tests that are in place now. Does that seem reasonable, @jhamman?

Member

Right. I don't think we need this anymore.

Contributor Author

Cool - just took this out.

self.df = df
self.offset = offset
self.max = 365
self.days_of_year = np.arange(1, 366)
Member

Which calendars will this support? I'm thinking it will be useful to check that the calendar of df.index is valid for this grouper method.
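
One way such a check could look is sketched below. The helper name `check_noleap_daily_index` is hypothetical and not part of the PR; it assumes the grouper only handles a pandas DatetimeIndex with no Feb 29 entries, which matches the 365-day setup in the snippet above.

```python
import pandas as pd


def check_noleap_daily_index(index):
    """Hypothetical validation helper (an assumption, not the PR's code).

    Rejects anything that is not a pandas DatetimeIndex, and rejects
    indexes containing Feb 29, since a 365-day grouper has no slot for it.
    """
    if not isinstance(index, pd.DatetimeIndex):
        raise TypeError(
            f"expected a pandas.DatetimeIndex, got {type(index).__name__}"
        )
    is_leap_day = (index.month == 2) & (index.day == 29)
    if is_leap_day.any():
        raise ValueError("index contains Feb 29, which a 365-day grouper cannot place")


# A non-leap year passes silently.
check_noleap_daily_index(pd.date_range("2001-01-01", "2001-12-31", freq="D"))
```

Calling a check like this in the grouper's constructor would surface an unsupported calendar as a clear error instead of a misaligned group later on.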

sec_half = self.days_of_year_wrapped[self.n + self.offset : i + total_days]
all_days = np.concatenate((first_half, np.array([self.n]), sec_half), axis=0)

assert len(set(all_days)) == total_days, all_days
Member

Let's convert this to a proper ValueError:

if len(set(all_days)) != total_days:
    raise ValueError('...say something meaningful...')
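
For context, a self-contained sketch of a wrapped day window with this kind of check might look as follows. The function `padded_window` and its modulo-based wrapping are assumptions for illustration; the PR builds the window from `days_of_year_wrapped` slices instead.

```python
import numpy as np


def padded_window(day, offset=15, n_days=365):
    """Hypothetical helper: days of year within +/- offset of `day`.

    Days are 1-based and wrap across the year boundary.
    """
    total_days = 2 * offset + 1
    # Shift to 0-based, wrap with modulo, then shift back to 1-based days.
    all_days = (np.arange(day - offset, day + offset + 1) - 1) % n_days + 1
    n_unique = len(set(all_days.tolist()))
    if n_unique != total_days:
        raise ValueError(
            f"expected {total_days} unique days around day {day}, got {n_unique}"
        )
    return all_days


window = padded_window(1, offset=2)
```

Unlike an assert, the ValueError still fires when Python runs with assertions disabled (`-O`), and the message can tell the user which day and window size caused the problem.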

list_result = []
for key, group in self:
    list_result.append(group.mean().values[0])
result = pd.Series(list_result, index=self.days_of_year)
Member

Shouldn't this be a DataFrame? In pandas, df.groupby(lambda x: x.month).mean() returns a pd.DataFrame if df is a DataFrame.

Contributor Author

I think it should actually be a Series? It seems like if I make it a DataFrame, the time index gets wonky

Contributor Author

dgergel Sep 27, 2020

ahh nm, got this to work along with some other updates for your comment below
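
The pandas behaviour discussed in this thread can be checked with a small example. The column name `tmax` and the synthetic data are made up for illustration.

```python
import numpy as np
import pandas as pd

# One year of synthetic daily data in a single-column DataFrame.
index = pd.date_range("2001-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"tmax": np.arange(len(index), dtype=float)}, index=index)

# Grouping a DataFrame and reducing keeps the DataFrame type:
# one row per group key, one column per input column.
monthly_mean = df.groupby(lambda x: x.month).mean()

# Grouping a single column (a Series) returns a Series instead.
series_mean = df["tmax"].groupby(lambda x: x.month).mean()
```

Matching this convention means a custom grouper's mean() can be swapped in where a regular groupby result is expected.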

Comment on lines 45 to 47
list_result = []
for key, group in self:
    list_result.append(group.mean().values[0])
Member

I think this will be slightly faster if you allocate a numpy array via np.full and populate the values as result[key] = group.mean().values[0]

Contributor Author

Done, thanks!
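
A minimal sketch of the preallocation pattern suggested above, using a plain pandas groupby by day of year in place of the PR's grouper (an assumption, as are the `tmax` column and the synthetic data). Because the group keys here are 1-based days, the array index is `key - 1`.

```python
import numpy as np
import pandas as pd

# One non-leap year of synthetic daily data.
index = pd.date_range("2001-01-01", "2001-12-31", freq="D")
df = pd.DataFrame({"tmax": np.arange(len(index), dtype=float)}, index=index)

days_of_year = np.arange(1, 366)

# Allocate the output once, filled with NaN so missing groups stay visible.
result = np.full(len(days_of_year), np.nan)
for key, group in df.groupby(df.index.dayofyear):
    # Keys are 1-based days of year; shift to a 0-based array position.
    result[key - 1] = group.mean().values[0]

means = pd.DataFrame({"tmax": result}, index=days_of_year)
```

Filling with NaN rather than zeros makes it easy to spot a day that never received a value.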

Member

jhamman commented Sep 24, 2020

@dgergel - anything I can do to help get this finished up?

Contributor Author

dgergel commented Sep 24, 2020

@jhamman great timing - I'm working on these edits now and hoping to tie this up soon as well.

For your first comment about tests, were you thinking along the lines of unit tests for the PaddedDOYGrouper and NASA-NEX option or some figures that compare it to grouping by month..or both?

Member

jhamman commented Sep 25, 2020

> For your first comment about tests, were you thinking along the lines of unit tests for the PaddedDOYGrouper and NASA-NEX option or some figures that compare it to grouping by month..or both?

I was just thinking unit tests at this stage.

Contributor Author

dgergel commented Oct 14, 2020

@jhamman just finally got around to adding some unit tests and addressed your other comments as well. Looking forward to hearing what you think. Also note that I added in support for leap days into the nasa-nex option.

@dgergel dgergel mentioned this pull request Nov 3, 2020
Contributor Author

dgergel commented Nov 3, 2020

@jhamman I think this should be ready to merge unless you see anything else that you think needs additional mods.

@jhamman jhamman merged commit e1981ae into pangeo-data:master Nov 5, 2020
Member

jhamman commented Nov 5, 2020

Thanks @dgergel! Onward!

2 participants