
Commit

deploy: a36d9fb
e-marshall committed Feb 4, 2024
1 parent 9bb020d commit 2d2981a
Showing 2 changed files with 2 additions and 2 deletions.
2 changes: 1 addition & 1 deletion _sources/asf_local_vrt.ipynb
@@ -2914,7 +2914,7 @@
"source": [
"### Taking a look at chunking\n",
"\n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"If you take a look at the chunking you will see that the entire object has a shape `(103, 13379, 17452)` and that each chunk is `(1, 5760, 5760)`. This breaks the full array (~ 89 GB) into 1,236 chunks that are about 127 MB each. We can also see that chunking keeps each time step intact which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix](https://e-marshall.github.io/sentinel1_rtc/asf_local_mf.html#an-example-of-complicated-chunking). In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using `xr.open_mfdataset()` led us to use the VRT approach in this notebook but `xr.open_mfdataset()` is still a very useful tool if your data is a good fit. \n",
"\n",
"Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (ie. internal tiling of the data, if your data is stored locally you want chunks to match the storage structure) without having too many chunks (this will cause unnecessary communication among workers) or too few chunks (this will lead to large chunk sizes and slower processing). There are helpful explanations [here](https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size) and [here](https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes).\n",
"When chunking is set to `auto` (the case here), the optimal chunk size will be selected for each dimension (if specified individually) or all dimensions. Read more about chunking [here](https://docs.dask.org/en/stable/array-chunks.html)."
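The chunked read that this cell describes might look roughly like the sketch below. This is not code from the notebook: the VRT filename `stack.vrt` is a placeholder, and `rioxarray.open_rasterio()` with a `chunks` argument is shown as one common way to get a dask-backed array.

```python
import rioxarray as rxr

# Hypothetical VRT built from the Sentinel-1 RTC GeoTIFFs; the filename is a
# placeholder, not the notebook's actual path.
da = rxr.open_rasterio("stack.vrt", chunks="auto")

# chunks="auto" lets dask pick a chunk size per dimension; for this stack the
# text reports chunks of (1, 5760, 5760), i.e. one full time step per chunk.
print(da.shape)                # e.g. (103, 13379, 17452)
print(da.chunks)               # block sizes along each dimension
print(da.data.nbytes / 2**30)  # total array size in GiB (~89 for this stack)
```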
2 changes: 1 addition & 1 deletion asf_local_vrt.html
@@ -3029,7 +3029,7 @@ <h2>Extract metadata<a class="headerlink" href="#extract-metadata" title="Permal
</div>
<section id="taking-a-look-at-chunking">
<h3>Taking a look at chunking<a class="headerlink" href="#taking-a-look-at-chunking" title="Permalink to this heading">#</a></h3>
<p>If you take a look at the chunking, you will see that the entire object has a shape of <code class="docutils literal notranslate"><span class="pre">(103,</span> <span class="pre">13379,</span> <span class="pre">17452)</span></code> and that each chunk is <code class="docutils literal notranslate"><span class="pre">(1,</span> <span class="pre">5760,</span> <span class="pre">5760)</span></code>. This breaks the full array (~89 GB) into 1,236 chunks of about 127 MB each. We can also see that chunking keeps each time step intact, which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the [appendix]. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> led us to use the VRT approach in this notebook, but <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> is still a very useful tool if your data is a good fit.</p>
<p>If you take a look at the chunking, you will see that the entire object has a shape of <code class="docutils literal notranslate"><span class="pre">(103,</span> <span class="pre">13379,</span> <span class="pre">17452)</span></code> and that each chunk is <code class="docutils literal notranslate"><span class="pre">(1,</span> <span class="pre">5760,</span> <span class="pre">5760)</span></code>. This breaks the full array (~89 GB) into 1,236 chunks of about 127 MB each. We can also see that chunking keeps each time step intact, which is optimal for time series data. If you are interested in an example of inefficient chunking, you can check out the example notebook in the <a class="reference external" href="https://e-marshall.github.io/sentinel1_rtc/asf_local_mf.html#an-example-of-complicated-chunking">appendix</a>. In this case, because of the internal structure of the data and the characteristics of the time series stack, various chunking strategies produced either too few (103) or too many (317,240) chunks with complicated structures that led to memory blow-ups when trying to compute. The difficulty we encountered trying to structure the data using <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> led us to use the VRT approach in this notebook, but <code class="docutils literal notranslate"><span class="pre">xr.open_mfdataset()</span></code> is still a very useful tool if your data is a good fit.</p>
<p>Chunking is an important aspect of how dask works. You want the chunking strategy to match the structure of the data (i.e. the internal tiling of the data; if your data is stored locally, you want chunks to match the storage structure) without having too many chunks (which causes unnecessary communication among workers) or too few chunks (which leads to large chunk sizes and slower processing). There are helpful explanations <a class="reference external" href="https://docs.dask.org/en/stable/array-best-practices.html#select-a-good-chunk-size">here</a> and <a class="reference external" href="https://blog.dask.org/2021/11/02/choosing-dask-chunk-sizes">here</a>.
When chunking is set to <code class="docutils literal notranslate"><span class="pre">auto</span></code> (as it is here), an optimal chunk size will be selected for each dimension (if specified individually) or for all dimensions. Read more about chunking <a class="reference external" href="https://docs.dask.org/en/stable/array-chunks.html">here</a>.</p>
</section>
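As a rough check on the numbers quoted in the changed paragraph, the chunk arithmetic can be sketched with a dask array of the same shape. The <code>float32</code> dtype is an assumption chosen so that the ~89 GB total and ~127 MB per chunk work out; the shapes come directly from the text.

```python
import dask.array as da

# Array shape and chunk shape reported in the text; float32 dtype is assumed.
arr = da.zeros((103, 13379, 17452), chunks=(1, 5760, 5760), dtype="float32")

print(arr.npartitions)                     # 1236 chunks (103 * 3 * 4)
print(arr.nbytes / 2**30)                  # ~89.6 GiB for the full array
print(arr.blocks[0, 0, 0].nbytes / 2**20)  # ~126.6 MiB for a full-size chunk
```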
