Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making the functions called by get_record() private #113

Open
mnfienen opened this issue Sep 25, 2023 · 2 comments
Open

Consider making the functions called by get_record() private #113

mnfienen opened this issue Sep 25, 2023 · 2 comments

Comments

@mnfienen
Copy link
Contributor

mnfienen commented Sep 25, 2023

As a new user, in a jupyter notebook, I used the tab completion on dataretrieval as imported and found the many get_... functions including get_pmcodes. But then I was surprised that it returned a tuple including metadata.

Looking in the code, I see that get_record is a light wrapper around all these that returns only the df. All good, but it might be nice to push users toward get_record and make all the underlying functions private. Just a refactor from get_pmcodes to _get_pmcodes etc. I'm happy to refactoir, but wanted to see if this is something intentional for reasons I'm not seeing.

Alternatively, could there be a default in all the get functions to suppress returning the metadata unless requested? This would make it easier to know which **kwargs the underlying function needs.

@thodson-usgs
Copy link
Collaborator

I originally used get_record, but other contributors convinced me to deprecate it (it retains its legacy behavior)

Metadata was a later addition. Nobody every liked how it was implemented, but this was fundamentally a limitation of pandas and its lack of a metadata standard.
Ideally, we'd put the metadata in pandas.DataFrame.attrs
but pandas has flagged attrs as experimental and may change without warning...

My plan was to continue to return a tuple until pandas improves its metadata or xarray natively handles ragged arrays.
But we've waited several years already, so it's good to revisit this.

You might also prefer HyRiver, which is another great collection of packages. I frequently use both. dataretrieval has a much simpler creedo, which is to do one thing well.

@mnfienen
Copy link
Contributor Author

Right on - thanks for the info.

I'm pretty stoked on this project being simple and being supported by USGS. There are other packages out there as well but they are just complicated and I like the idea of staying focused on core functionality.

So - using the various specific get functions makes good sense too. Maybe the default idea that allows a request for the metadata but defaults to only returning the dataframe? I may be missing something, but seems like the metadata is more valuable for debugging than general use?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants