As a new user working in a Jupyter notebook, I used tab completion on dataretrieval as imported and found the many get_... functions, including get_pmcodes. I was surprised, though, that it returned a tuple that includes metadata.
Looking in the code, I see that get_record is a light wrapper around all of these that returns only the df. All good, but it might be nice to push users toward get_record and make all the underlying functions private. That would just be a refactor from get_pmcodes to _get_pmcodes, etc. I'm happy to do the refactor, but I wanted to check whether the current design is intentional for reasons I'm not seeing.
Alternatively, could there be a default in all the get functions to suppress returning the metadata unless requested? This would make it easier to know which **kwargs the underlying function needs.
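A minimal sketch of that opt-in default, to make the suggestion concrete. Everything here is hypothetical: `_fetch_pmcodes_stub`, `get_pmcodes_sketch`, and the `return_metadata` flag are illustrative names rather than part of dataretrieval, and the stub returns made-up rows instead of calling any service.

```python
import pandas as pd


def _fetch_pmcodes_stub(parameter_cd):
    """Hypothetical stand-in for a service call; returns (df, metadata)."""
    df = pd.DataFrame(
        {"parameter_cd": [parameter_cd], "parameter_nm": ["example name"]}
    )
    # Placeholder metadata; a real call would record the actual request details.
    metadata = {"url": "https://example.invalid/pmcodes", "query_time": None}
    return df, metadata


def get_pmcodes_sketch(parameter_cd, return_metadata=False):
    """Return only the DataFrame unless the caller asks for metadata."""
    df, metadata = _fetch_pmcodes_stub(parameter_cd)
    if return_metadata:
        return df, metadata
    return df


# Default call: a plain DataFrame.
codes = get_pmcodes_sketch("00060")

# Opt-in call: the (DataFrame, metadata) tuple.
codes_with_meta = get_pmcodes_sketch("00060", return_metadata=True)
```

One trade-off with this pattern is that the return type depends on a flag, which some people find harder to type-check than always returning a tuple.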
I originally used get_record, but other contributors convinced me to deprecate it (it retains its legacy behavior).
Metadata was a later addition. Nobody ever liked how it was implemented, but this was fundamentally a limitation of pandas and its lack of a metadata standard.
Ideally, we'd put the metadata in pandas.DataFrame.attrs, but pandas has flagged attrs as experimental, so it may change without warning.
My plan was to continue to return a tuple until pandas improves its metadata or xarray natively handles ragged arrays.
But we've waited several years already, so it's good to revisit this.
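For reference, a rough sketch of what the attrs approach mentioned above could look like. `attach_metadata` and the metadata keys are hypothetical, not dataretrieval code; the only pandas feature used is `DataFrame.attrs`, which pandas documents as experimental.

```python
import pandas as pd


def attach_metadata(df, metadata):
    """Return a copy of df with metadata stored in the experimental attrs dict."""
    out = df.copy()
    out.attrs.update(metadata)
    return out


# Made-up rows and metadata, for illustration only.
df = pd.DataFrame({"parameter_cd": ["00060"], "parameter_nm": ["example name"]})
df_with_meta = attach_metadata(
    df, {"url": "https://example.invalid/query", "query_time": 0.5}
)

# The caller gets a single object and reads metadata only when it is needed.
print(df_with_meta.attrs["url"])
```

Whether attrs survives operations such as merges or concatenation is not something to rely on while the feature is experimental, which is the concern with depending on it.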
You might also prefer HyRiver, which is another great collection of packages. I frequently use both. dataretrieval has a much simpler credo, which is to do one thing well.
I'm pretty stoked on this project being simple and being supported by USGS. There are other packages out there as well but they are just complicated and I like the idea of staying focused on core functionality.
So using the various specific get functions makes good sense too. Maybe go with the default idea: allow a request for the metadata, but return only the dataframe by default? I may be missing something, but it seems like the metadata is more valuable for debugging than for general use?