How to avoid duplicated data due to bnds when converting nc to a dataframe? #117

JairoVS · 2024-08-09T07:44:04Z

Hello Robert! I am converting subsetted files from netCDF to dataframes and then to csv using nctoolkit. After the conversion, I have duplicated data in the csv. The CSV file has a "bnds" column with two values, 0 and 1, and every row appears once for each of them, which is where the duplicates come from. I read the manual and found out that I could use ds.strip_variables to delete the "bnds" before converting to a dataframe, but when applying the method, I got an error saying that "bnds is not a valid variable!".

The way I am using the method is
ds.strip_variables(["bnds"])

I have also tried
ds.strip_variables(vars="bands")

But they did not work. Could you give me some guidance on how I could solve this issue?

I appreciate your help!

Answered by robertjwilson

robertjwilson (Maintainer) · 2024-08-09T07:52:08Z

You will need to remove these manually using pandas. So something like:

df = ds.to_dataframe().reset_index()
df = df.drop(columns = "bnds").drop_duplicates()

I don't have the data, so I'm not sure what the bnds refer to. But for ocean data, this is often the maximum and minimum depth for a particular cell. It is data associated with the coordinates, not a data variable. This kind of information is sometimes useful, so nctoolkit keeps it in the output from to_dataframe. For example, you can calculate the cell height from the bnds and use it later on. Though, if you are getting 0s and 1s then there is probably no meaningful information in it.

However, to_dataframe probably should have an option to remove bnds or perhaps to remove them by default. I'll put that on my to-do list.
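As a minimal sketch of what the pandas step above does, the snippet below builds a small hand-made frame in place of the real `ds.to_dataframe().reset_index()` output. The column names `lat`, `lon` and `value`, the data values, and the `drop_bounds` helper are hypothetical and only for illustration; it assumes the paired rows differ only in the `bnds` index column. If a bounds variable (for example a `time_bnds` column) is also present and differs between the paired rows, it would need to be listed as well for the duplicates to collapse.

```python
import pandas as pd

# Hypothetical stand-in for ds.to_dataframe().reset_index():
# every grid cell appears twice, once for bnds == 0 and once for bnds == 1.
df = pd.DataFrame(
    {
        "time": ["2005-01-01"] * 4,
        "lat": [10.125, 10.125, 10.375, 10.375],
        "lon": [20.125, 20.125, 20.125, 20.125],
        "bnds": [0, 1, 0, 1],
        "value": [0.21, 0.21, 0.34, 0.34],  # hypothetical data variable
    }
)


def drop_bounds(frame, bounds_cols=("bnds",)):
    """Drop bounds-related columns, then collapse the now-identical rows."""
    present = [col for col in bounds_cols if col in frame.columns]
    return frame.drop(columns=present).drop_duplicates().reset_index(drop=True)


cleaned = drop_bounds(df)
print(cleaned)
```

The helper skips any listed column that is not in the frame, so it can be reused on outputs that have no bounds at all.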

1 reply

JairoVS Aug 13, 2024
Author

Thank you Robert for your help! I am attaching one nc file of the dataset I am using. The data is gridded data associated with coordinates, so similarly to the ocean data, I guess the "bnds" come from the coordinates. Your suggestion of dropping the columns and duplicates with pandas worked. I agree it would be useful if you could add an option to remove "bnds" with nctoolkit. Thank you again!

GLDAS_NOAH025_M.A200501.021.zip

robertjwilson (Maintainer) · 2024-08-13T08:08:30Z

In this case the bounds relate to time. There is a time coordinate, and there is also a time_bnds variable associated with it. Typically, these bounds indicate the period each time value actually covers. time itself can only be a single point in time, so in some instances the time bounds are useful. For example, for monthly data they can tell you that the data covers the start to the end of the month, and this can then be used for annual averaging etc.

Though, in this case it looks like it can be ignored, as it looks like daily data and the time is that day.
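To illustrate how time bounds can feed into averaging, here is a small sketch using only the standard library. The records and the `weighted_mean` helper are hypothetical, and it assumes each upper bound is the first day of the following month, so that the difference between the bounds gives the number of days covered. Real files may use a different convention, so check the bounds before relying on this.

```python
from datetime import date

# Hypothetical monthly records: (time, lower bound, upper bound, value).
# The time stamp is the first day of the month; the bounds give the period covered.
records = [
    (date(2005, 1, 1), date(2005, 1, 1), date(2005, 2, 1), 10.0),
    (date(2005, 2, 1), date(2005, 2, 1), date(2005, 3, 1), 20.0),
]


def weighted_mean(rows):
    """Average the values, weighting each by the days its bounds span."""
    total_days = 0
    weighted_sum = 0.0
    for _time, lower, upper, value in rows:
        days = (upper - lower).days  # length of the period this value covers
        total_days += days
        weighted_sum += value * days
    return weighted_sum / total_days


days_covered = [(upper - lower).days for _t, lower, upper, _v in records]
mean_value = weighted_mean(records)
print(days_covered, mean_value)
```

A plain mean of the two values would give 15.0; weighting by the days covered shifts the result slightly towards the longer month.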

1 reply

JairoVS Aug 14, 2024
Author

Thank you Robert! It is actually monthly data but the time is set as the first day of every month.
