
Commit

deploy: a36d9fb
e-marshall committed Feb 4, 2024
1 parent 9bb020d commit 2d2981a
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _sources/asf_local_vrt.ipynb
@@ -2914,7 +2914,7 @@
"source": [
"### Taking a look at chunking\n",
"\n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix](https://e-marshall.github.io/sentinel1_rtc/asf_local_mf.html#an-example-of-complicated-chunking). In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"\n",
"Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (ie. internal tiling of the data, if your data is stored locally you want chunks to match the storage structure) without having too many chunks (this will cause unnecessary communication among workers) or too few chunks (this will lead to large chunk sizes and slower processing). There are helpful explanations [here](https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size) and [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes).\n",
"When chunking is set to `auto` (the case here), the optimal chunk size will be selected for each dimension (if specified individually) or all dimensions. Read more about chunking [here](https://docs.dask.org/en/stable/array-chunks.html)."
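The chunked read that this cell describes might look roughly like the sketch below. This is not code from the notebook: the VRT filename `stack.vrt` is a placeholder, and `rioxarray.open_rasterio()` with a `chunks` argument is shown as one common way to get a dask-backed array.

```python
import rioxarray as rxr

# Hypothetical VRT built from the Sentinel-1 RTC GeoTIFFs; the filename is a
# placeholder, not the notebook's actual path.
da = rxr.open_rasterio("stack.vrt", chunks="auto")

# chunks="auto" lets dask pick a chunk size per dimension; for this stack the
# text reports chunks of (1, 5760, 5760), i.e. one full time step per chunk.
print(da.shape)                # e.g. (103, 13379, 17452)
print(da.chunks)               # block sizes along each dimension
print(da.data.nbytes / 2**30)  # total array size in GiB (~89 for this stack)
```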
2 changes: 1 addition & 1 deletion asf_local_vrt.html
@@ -3029,7 +3029,7 @@ <h2>Extract metadata<a class="headerlink" href="#extract-metadata" title="Permal
</div>
<section id="taking-a-look-at-chunking">
<h3>Taking a look at chunking<a class="headerlink" href="#taking-a-look-at-chunking" title="Permalink to this heading">#</a></h3>
<p>If you take a look at the chunking, you will see that the entire object has a shape of <code class="docutils literal notranslate"><span class="pre">(103,</span> <span class="pre">13379,</span> <span class="pre">17452)</span></code> and that each chunk is <code class="docutils literal notranslate"><span class="pre">(1,</span> <span class="pre">5760,</span> <span class="pre">5760)</span></code>. This breaks the full array (~89 GB) into 1,236 chunks of about 127 MB each. We can also see that chunking keeps each time step intact, which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> led us to use the VRT approach in this notebook, but <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> is still a very useful tool if your data is a good fit.</p>
<p>If you take a look at the chunking, you will see that the entire object has a shape of <code class="docutils literal notranslate"><span class="pre">(103,</span> <span class="pre">13379,</span> <span class="pre">17452)</span></code> and that each chunk is <code class="docutils literal notranslate"><span class="pre">(1,</span> <span class="pre">5760,</span> <span class="pre">5760)</span></code>. This breaks the full array (~89 GB) into 1,236 chunks of about 127 MB each. We can also see that chunking keeps each time step intact, which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the <a class="reference external" href="https://e-marshall.github.io/sentinel1_rtc/asf_local_mf.html#an-example-of-complicated-chunking">appendix</a>. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> led us to use the VRT approach in this notebook, but <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> is still a very useful tool if your data is a good fit.</p>
<p>Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (i.e. the internal tiling of the data; if your data is stored locally, you want chunks to match the storage structure) without having too many chunks (which causes unnecessary communication among workers) or too few chunks (which leads to large chunk sizes and slower processing). There are helpful explanations <a class="reference external" href="https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size">here</a> and <a class="reference external" href="https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes">here</a>.
When chunking is set to <code class="docutils literal notranslate"><span class="pre">auto</span></code> (as it is here), an optimal chunk size will be selected for each dimension (if specified individually) or for all dimensions. Read more about chunking <a class="reference external" href="https://docs.dask.org/en/stable/array-chunks.html">here</a>.</p>
</section>
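As a rough check on the numbers quoted in the changed paragraph, the chunk arithmetic can be sketched with a dask array of the same shape. The <code>float32</code> dtype is an assumption chosen so that the ~89 GB total and ~127 MB per chunk work out; the shapes come directly from the text.

```python
import dask.array as da

# Array shape and chunk shape reported in the text; float32 dtype is assumed.
arr = da.zeros((103, 13379, 17452), chunks=(1, 5760, 5760), dtype="float32")

print(arr.npartitions)                     # 1236 chunks (103 * 3 * 4)
print(arr.nbytes / 2**30)                  # ~89.6 GiB for the full array
print(arr.blocks[0, 0, 0].nbytes / 2**20)  # ~126.6 MiB for a full-size chunk
```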
