
Commit a36d9fb

try to fix broken link
e-marshall committed Feb 4, 2024
1 parent 52ca7f2 commit a36d9fb
Showing 1 changed file with 1 addition and 1 deletion.
asf_local_vrt.ipynb: 2 changes (1 addition & 1 deletion)
@@ -2914,7 +2914,7 @@
"source": [
"### Taking a look at chunking\n",
"\n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix](https://e-marshall.github.io/sentinel1_rtc/asf_local_mf.html#an-example-of-complicated-chunking). In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"\n",
"Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (ie. internal tiling of the data, if your data is stored locally you want chunks to match the storage structure) without having too many chunks (this will cause unnecessary communication among workers) or too few chunks (this will lead to large chunk sizes and slower processing). There are helpful explanations [here](https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size) and [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes).\n",
"When chunking is set to `auto` (the case here), the optimal chunk size will be selected for each dimension (if specified individually) or all dimensions. Read more about chunking [here](https://docs.dask.org/en/stable/array-chunks.html)."
Expand Down
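For comparison, the `xr.open_mfdataset()` route mentioned in the cell looks roughly like this. A minimal sketch, assuming hypothetical per-scene NetCDF files that share coordinates: the `chunks` argument is applied to each file as it is opened, and `combine="by_coords"` stitches the scenes along their shared dimensions.

```python
import xarray as xr

# Hypothetical per-scene files; each contributes one time step.
ds = xr.open_mfdataset(
    "rtc_scenes/*.nc",
    chunks={"x": 5760, "y": 5760},  # keep each time step intact, tile space
    combine="by_coords",            # align scenes on shared coordinates
    parallel=True,                  # open files concurrently with dask
)
print(ds.chunks)  # per-dimension chunk layout of the combined dataset
```

Whether this produces a sane chunk structure depends on how each file is tiled internally, which is exactly the difficulty that motivated the VRT approach in this notebook.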
