Commit

doc
floriankrb committed May 28, 2024
1 parent b8ba110 commit 1648ee9
Showing 2 changed files with 13 additions and 13 deletions.
20 changes: 10 additions & 10 deletions docs/cli/copy.rst

@@ -2,21 +2,21 @@ copy
====


Copying a dataset from one location to another can be error-prone and time-consuming.
This command-line script allows for incremental copying.
If the copying process fails, it can be resumed.
It can copy data from a local directory to a remote server, or from a remote server to a local directory, as long as a Zarr backend is available to read and write the data.

The script uses multiple threads to speed up the copy.
However, sending parallel requests to the same server is not always beneficial, for instance when the server itself handles requests with a limited number of internal threads.

The data can optionally be rechunked with ``--rechunk``, which can be useful when the target platform does not support many small files, or many files in the same directory.
However, keep in mind that rechunking has a large impact on performance when reading the data:
the chunk pattern of the source dataset was chosen for good reasons, and changing it is very likely to degrade performance.
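
To make this trade-off concrete, the arithmetic below uses made-up numbers: a data array with 40,000 time steps, split into chunks along the time dimension only, and stored by a Zarr directory-style backend that writes one file per chunk (an assumption about the target store). Grouping more time steps per chunk reduces the number of files, but any read of a single time step then has to fetch and decompress the whole chunk that contains it.

.. code-block:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Number of chunk files along the time dimension (ceiling division),
    # assuming no other dimension is split into several chunks.
    chunk_files() {
        local n_steps=$1 steps_per_chunk=$2
        echo $(( (n_steps + steps_per_chunk - 1) / steps_per_chunk ))
    }

    n_steps=40000   # hypothetical number of time steps in the dataset

    for steps_per_chunk in 1 10 100; do
        printf '%s step(s) per chunk: %s files; reading one step loads %s step(s)\n' \
            "$steps_per_chunk" "$(chunk_files "$n_steps" "$steps_per_chunk")" "$steps_per_chunk"
    done

With these numbers, 100 steps per chunk cuts the file count from 40,000 to 400, at the cost of loading 100 times more data than needed whenever a single time step is read.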

.. warning::

   When resuming the copying process (using ``--resume``), it is recommended to call the script with the same values for ``--block-size`` and ``--rechunk`` as in the original run.
   Using different values for these arguments to resume copying the same dataset may lead to unexpected behavior.
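
To follow this recommendation, it helps to keep the options that affect how data is written in a single place. The script below is a minimal sketch under several assumptions: the source and target are passed as positional arguments after the options (check ``anemoi-datasets copy --help`` for the actual usage), the paths and the block size of 100 are placeholders, and the ``run`` and ``copy_dataset`` helpers and the ``DRY_RUN``/``RESUME`` variables are hypothetical conveniences, not part of ``anemoi-datasets``.

.. code-block:: bash

    #!/usr/bin/env bash
    set -euo pipefail

    # Placeholder locations (hypothetical): replace with your own source and target.
    SOURCE="/data/example-dataset.zarr"
    TARGET="/mnt/remote/example-dataset.zarr"

    # Options that must stay identical between the first run and any resumed run.
    # If the first run used --rechunk, add it here with the same value.
    COPY_OPTS=(--block-size 100)

    # Hypothetical helper: print the command (default, DRY_RUN=1),
    # or execute it when DRY_RUN=0.
    run() {
        if [[ "${DRY_RUN:-1}" == "1" ]]; then
            printf '%s\n' "$*"
        else
            "$@"
        fi
    }

    # Build the copy command; pass "resume" when restarting an interrupted copy.
    # Assumes SOURCE and TARGET are positional arguments placed after the options.
    copy_dataset() {
        local args=(copy)
        if [[ "${1:-}" == "resume" ]]; then
            args+=(--resume)
        fi
        args+=("${COPY_OPTS[@]}" "$SOURCE" "$TARGET")
        run anemoi-datasets "${args[@]}"
    }

    if [[ "${RESUME:-0}" == "1" ]]; then
        copy_dataset resume
    else
        copy_dataset
    fi

Saved as, say, ``copy.sh`` (a hypothetical name), ``bash copy.sh`` prints the first-run command and ``RESUME=1 bash copy.sh`` prints the resume command; prefixing either with ``DRY_RUN=0`` actually runs it. Because both invocations read ``COPY_OPTS``, the block size cannot drift between runs.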


6 changes: 3 additions & 3 deletions docs/cli/introduction.rst

@@ -2,13 +2,13 @@ Introduction
============

When you install the ``anemoi-datasets`` package, this also installs a command-line tool
called ``anemoi-datasets``, which can be used to manage Zarr datasets.

The tool provides help through the ``--help`` option:

.. code-block:: bash

    % anemoi-datasets --help

The commands are:
