rechunk SSEBop MODIS daily data #506
I think the best way to grab the files you will be working on is to navigate into the directory where you will store the data and download them with wget; that is how the data for the year 2000 can be downloaded, for example. There are plenty of other ways to do this as well, so feel free to choose another approach if you would prefer.
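The wget step could be sketched like this. Note that `BASE_URL` and the file-name pattern are assumptions for illustration, not values taken from this thread; substitute the provider's real download path before use.

```shell
# Sketch only: BASE_URL and the file-name pattern below are assumptions;
# replace them with the provider's actual download path and naming scheme.
BASE_URL="https://example.usgs.gov/ssebop/modis/daily"
YEAR=2000
mkdir -p ssebop_modis_daily
for doy in $(seq -w 1 366); do
  url="${BASE_URL}/det${YEAR}${doy}.modisSSEBopETactual.zip"
  echo "$url"                               # preview the URL being fetched
  # wget -nc -P ssebop_modis_daily "$url"   # uncomment to actually download
done
```

The `-nc` (no-clobber) flag makes the loop safe to re-run after an interrupted download.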
And I'm copying the information about setting up your Jupyter environment on Hovenweep here just to keep everything together.
These TIFFs do not have timestamps stored as a dimension when you open each file, so we need to add this information in order to open all of the datasets together.
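One way to attach the missing timestamps is a `preprocess` hook that parses the date out of each file name before the datasets are combined. This is a sketch: the `det<YYYY><DDD>` file-name pattern is an assumption, so adjust the regex to match the real names.

```python
import re
from datetime import datetime, timedelta

import xarray as xr

def parse_time(path):
    """Extract the date from an assumed 'det<YYYY><DDD>...' file name
    (four-digit year followed by three-digit day-of-year)."""
    m = re.search(r"det(\d{4})(\d{3})", path)
    year, doy = int(m.group(1)), int(m.group(2))
    return datetime(year, 1, 1) + timedelta(days=doy - 1)

def add_time(ds):
    """Attach the file's date as a singleton 'time' dimension."""
    return ds.expand_dims(time=[parse_time(ds.encoding["source"])])

# Usage sketch (paths and glob are hypothetical):
# ds = xr.open_mfdataset("ssebop_modis_daily/*.tif", engine="rasterio",
#                        preprocess=add_time, combine="by_coords")
```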
One issue we have noticed with the dataset is that certain years that are not leap years have data available for 366 days of the year.
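A small sanity check for this could look like the following sketch, which flags day-of-year values that exceed the length of the year:

```python
import calendar

def expected_days(year):
    """Number of valid days of data in a year: 366 in leap years, else 365."""
    return 366 if calendar.isleap(year) else 365

def is_spurious(year, doy):
    """True when a file claims a day-of-year beyond the year's length,
    e.g. day 366 of a non-leap year."""
    return doy > expected_days(year)
```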
@pnorton-usgs also noticed some issues related to the scale factor and data types that he can comment on here.
@pnorton-usgs was able to create a zarr with one year of data.
@amsnyder, my understanding is …
What I found is that the documentation (https://earlywarning.usgs.gov/docs/USA_DAILYSSEBopETa_Oct2019.pdf) states that to get the actual value in millimeters the stored value should be divided by 1000; however, the TIFF only reports a scale factor of 1. Because of this, the values will not be properly computed when a file is opened. I believe we can work around this when loading the files.
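The documented 1/1000 scale could be applied manually after loading, since the TIFF's own `scale_factor` of 1 means xarray will not apply it for us. This is a sketch; the `NODATA` fill value is an assumption, not stated in this thread.

```python
import numpy as np
import xarray as xr

NODATA = 9999  # assumed fill value; check the actual product metadata

def apply_ssebop_scale(da, nodata=NODATA):
    """Mask fill values, then apply the documented stored/1000 scale to
    convert to millimeters (the TIFF's internal scale_factor is 1)."""
    return da.where(da != nodata) * 0.001

# Usage sketch: raw = rioxarray.open_rasterio("det2000001....tif")
#               eta_mm = apply_ssebop_scale(raw)
```

Masking before scaling matters: multiplying the fill value by 0.001 first would turn it into a plausible-looking ET value.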
The data provider has updated the dataset to be chunked. The new data has some inconsistencies that complicate the rechunking process: data for the years 2000-2017 and 2018-2024 can each be rechunked successfully within its own grouping, but the two groups produce "non-monotonic" dimensions when combined. Work is continuing to fix this issue.
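A quick way to see which coordinates break after combining the two groups might be a check like this sketch, which assumes the combined dataset is already open as `ds`:

```python
import xarray as xr

def non_monotonic_coords(ds):
    """Return the names of indexed coordinates that are not monotonically
    increasing, e.g. after combining the 2000-2017 and 2018-2024 groups."""
    return [name for name, idx in ds.indexes.items()
            if not idx.is_monotonic_increasing]
```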
@dbrannonUSGS, I have found the same issue in the monthly data. Here is the code snippet for how I rectified it:

```python
import numpy as np
import xarray as xr

# Snap ds1's coordinates onto ds2's grid, then override the tiny residual
# differences so both datasets share identical lat/lon indexes.
ds1 = ds1.sel(lon=ds2["lon"], lat=ds2["lat"], method="nearest", tolerance=1e-10)
ds1, ds2 = xr.align(ds1, ds2, join="override", exclude="time")
print(f"Maximum absolute difference in latitude: {np.abs(ds1.lat - ds2.lat).max()}")
print(f"Maximum absolute difference in longitude: {np.abs(ds1.lon - ds2.lon).max()}")
```

FYI, a difference of even …
@pnorton-usgs, could you review @dbrannonUSGS's work, which has been copied into https://github.com/hytest-org/ssebop-modis-daily-to-zarr? I messed up the creation of the repo and didn't make an empty main branch, so the work is on initial_commit. If you have any suggested changes, you can just create a new branch against it.

@dbrannonUSGS, can you comment here with the location of the zarr you created so @pnorton-usgs can take a look?
@thodson-usgs wrote a pangeo-forge recipe to rechunk this data, but it hasn't been run so we want to reproduce this in a notebook to complete the work. The recipe you will want to duplicate is here, and the data source is here.
You can download the data inputs on Hovenweep here: /caldera/hovenweep/projects/usgs/water/impd/hytest/rechunking/ssebop_modis_daily