add toolbox intro to module 1 (#56)
* Update mkdocs.yml

Changed repo_url to url of the fork

* add toolbox intro documents

* add draft for geopython quickstart tutorial

* add toolbox intro to mkdocs.yml

* update mkdocs.yml

* add tutorials for geopandas, rasterio and (basic) xarray

* add tutorials for geopandas, rasterio and xarray to mkdocs.yml

* revise conda.md

* revise jupyter.ipynb

* add dissolve example, add custom maps example

* extend rasterio tutorial by examples for plotting and multi-band handling

* conclude xarray tutorial

* format link

* add figure illustrating (non-)stationarity

* correct figure alt description

* geopandas dissolve example: provide dissolve column via dictionary to aggfunc

* update STAC endpoint and collection

* update and clean theme 2 main document

* add subheading

* update input data path

* remove single-image part of the tutorial (now part of the toolbox intro)

* Add toolbox figures (snakes, rasterio, xarray)

* Fix small issues (broken links and redundant code)

* Replace figure (snakes)

* Delete caption

* Rename figure

* Update conda.md

* Add Unix commands

* correct link to geopython

* minor edits in toolbox intro

* minor edits in toolbox intro

* move toolbox intro to module 1

* delete deprecated geopython.ipynb file

* move code cell figure and fix its link

* update link to geopandas.ipynb

* update link to geopandas.ipynb

* edit m1 overview for toolbox intro integration into m1

* link from software_python to m1 toolbox intro

* update link in M1T1 xarray notebook

* revise toolbox intro structure: move all parts to same toc level

* Update mkdocs.yml

Change site url from fork to main
a-mayr authored May 29, 2024
1 parent 8074f66 commit 3fe0f00
Showing 15 changed files with 2,789 additions and 10 deletions.
@@ -51,7 +51,7 @@
"source": [
"## Loading many files of an image time series\n",
"\n",
"We take the set of Sentinel-2 images stored locally as GeoTIFFs (after you downloaded them as the folder `s2` from the course data repository). Using [rioxarray](https://corteva.github.io/rioxarray/stable/index.html), we load all these images to a DataArray and construct a DataSet from them. We follow more or less the approaches described [here](https://docs.dea.ga.gov.au/notebooks/Frequently_used_code/Opening_GeoTIFFs_NetCDFs.html#Multiple-GeoTIFFs) and [here](https://medium.com/@bonnefond.virginie/handling-multi-temporal-satellite-images-with-xarray-30d142d3391)."
"We take the set of Sentinel-2 images stored locally as GeoTIFFs (after you downloaded them as the folder `s2` from the course data repository). Using [rioxarray](https://corteva.github.io/rioxarray/stable/index.html), we load all these images to a DataArray and construct a DataSet from them. We follow more or less the approaches described [here](https://knowledge.dea.ga.gov.au/notebooks/How_to_guides/Opening_GeoTIFFs_NetCDFs/#Multiple-GeoTIFFs) and [here](https://medium.com/@bonnefond.virginie/handling-multi-temporal-satellite-images-with-xarray-30d142d3391)."
]
},
{
24 changes: 17 additions & 7 deletions course/module1/module1.md
@@ -44,13 +44,20 @@ The following skills and background knowledge are required for this module:

## Software

For the practical parts of this module (excercises and tutorials), you will need the software listed below. Follow the links to the individual software or tools, for help in setting them up.
For the practical parts of this module (exercises and tutorials), you will need:

* [QGIS](../software/software_qgis.md)
* [Python](../software/software_python.md)
* To install the packages needed for the tutorials and excercises we recommend the package and environment management system [Conda](https://docs.conda.io/en/latest/)
* You can use the `etrainee_m1.yml` file to install the packages listed therein into a fresh Python environment. The yaml file can be downloaded here: <a href=../assets/python_envs/etrainee_m1.yml download>etrainee_m1.yml</a>.
* To reproduce the examples that are using [Google Earth Engine](https://earthengine.google.com/), a registered user account for this service is required (create one [here](https://earthengine.google.com/signup/), if you don't have one).
* [QGIS](../software/software_qgis.md) - In some of the Module 1 exercises, the graphical user interface of QGIS is used for visualization of data or for digitizing polygons (used to label training samples).
* [Google Earth Engine](https://earthengine.google.com/) - For several tutorials and exercises of Module 1, a registered user account for this service is required (create one [here](https://earthengine.google.com/signup/), if you don't have one).
* [Python](../software/software_python.md) - You can use the package and environment management system [Conda](https://docs.conda.io/en/latest/) and the `etrainee_m1.yml` file to install the packages needed for the tutorials and exercises into a fresh Python environment. The yaml file can be downloaded here: <a href=../assets/python_envs/etrainee_m1.yml download>etrainee_m1.yml</a>.


## Toolbox intro

Before you start with [Theme 1](./01_principles_of_remote_sensing_time_series/01_principles_of_remote_sensing_time_series.md) of Module 1, we recommend that you go through the **[Toolbox intro](./toolbox_intro/ETRAINEE_intro_overview.md)**. There, you will learn how to

* set up your working environment with **Python** and all required packages
* create, modify and run interactive **Jupyter Notebooks** containing Python code
* use Python for basic processing steps and visualization of **geospatial data**


## Practical parts of this module (overview)
@@ -100,4 +107,7 @@ Copernicus Sentinel data courtesy of the [European Space Agency - ESA](https://w


### Start the module
... by proceeding to the first theme on [Principles of remote sensing time series](01_principles_of_remote_sensing_time_series/01_principles_of_remote_sensing_time_series.md).

... by proceeding to the [Toolbox intro](./toolbox_intro/ETRAINEE_intro_overview.md) or (if you are already familiar with Conda, Jupyter Notebooks and GeoPython) skip this and

... go directly to the first theme on [Principles of remote sensing time series](01_principles_of_remote_sensing_time_series/01_principles_of_remote_sensing_time_series.md).
35 changes: 35 additions & 0 deletions course/module1/toolbox_intro/ETRAINEE_intro_overview.md
@@ -0,0 +1,35 @@
---
title: "E-TRAINEE Toolbox overview"
description: "This is an overview for the E-TRAINEE Module 1 toolbox introduction (software guide)."
dateCreated: 2023-11-06
authors: Andreas Mayr
contributors: TBA
estimatedTime: 5.0 hrs (for the entire toolbox intro)
---

# Toolbox overview

Before starting the course, let's "unpack our toolbox" to ensure that we have the necessary digital working environment ready. This means we introduce the different **software components used in E-TRAINEE Module 1** (and partly also in other modules). We cover the installation and some basic methods of

* [Visual Studio Code](./vscode.md) (code editor)
* [Conda](./conda.md) (package management system)
* [Jupyter Notebooks](./jupyter.ipynb) (interactive computing), with some Python fundamentals (programming language)
* GeoPython - A quickstart to geographic data handling in Python: Vector data with [GeoPandas](./geopandas.ipynb), raster data with [rasterio](./rasterio.ipynb), and multidimensional raster data with [xarray](./xarray.ipynb).

## Background and objective
<!--*SHORTEN OR SKIP THIS!*-->

The **objective** of this introduction to the E-TRAINEE course is to provide you with the software knowledge and skills needed to start with the practical parts of E-TRAINEE Module 1. For working on these practical parts, there are often a couple of similar software tools and varieties that may be suitable for a specific task, e.g. (our choice in bold):

* Program code can be run by executing either "normal" scripts (e.g. ```*.py``` files) or code cells within interactive **Jupyter Notebooks** (```*.ipynb```) with text explanations as well as graphics and other output in between.
* Jupyter Notebook documents can be edited and run in various web-based or desktop applications, such as [JupyterLab](https://jupyterlab.readthedocs.io/en/latest/), [JupyterHub](https://jupyterhub.readthedocs.io/en/latest), [JupyterLab Desktop](https://github.com/jupyterlab/jupyterlab-desktop), or **Visual Studio Code**.
* Data can be processed with a variety of graphical user interface software or command line tools or by scripting in a programming language such as R, **Python**, JavaScript, or Julia.
* GeoPython: For handling geographic data in Python, we focus on the packages **GeoPandas**, **rasterio** and **xarray**. There are other packages available but some of them introduce unnecessary complexity, are no longer well-maintained, or are tailored to rather specific tasks.

Out of this variety, we *selected a set of tools* that makes up a tested and proven *digital working environment*. We hope that, by using such a recommended, uniform environment for all course participants, you will (i) encounter fewer software-related problems, (ii) get more helpful, more specific instructions, and (iii) learn an approach that enables you to set up and customize your working environment for follow-up tasks as well (such as your MSc thesis, where you may need to install additional or different packages). A concise, step-by-step guide will show you how to make the components of this environment interact, so you don't need to search the endless resources of the web, and you have a condensed resource to look things up in case you forget something.

In addition to setting up such a working environment, you will learn some of the most useful methods of geodata handling and visualization in Python. This will be helpful when working on the practical parts of E-TRAINEE Module 1, where many of the more advanced workflows build upon these tools and methods.

### Next: VSCode

Get started with the *Visual Studio Code* source-code editor [here](./vscode.md).
145 changes: 145 additions & 0 deletions course/module1/toolbox_intro/conda.md
@@ -0,0 +1,145 @@
---
title: "Conda"
description: "This quick guide to package management with Conda is part of the introduction to the E-TRAINEE course which sets the prerequisites for starting with Module 1."
dateCreated: 2023-11-06
authors: Andreas Mayr
contributors: TBA
estimatedTime: 30 minutes
---

# Conda

To install the general-purpose programming language Python and to manage its versions along with all packages extending it (which make it useful for our specific purposes), we use the package management system [Conda](https://conda.io/). Conda quickly installs, runs, and updates packages and their dependencies. Conda easily creates, saves, loads, and switches between environments on your local computer. It was created for Python programs but it can package and distribute software for any language.

<p align="center">
<img src="toolbox_media/snakes_v01.png" title="Figure created partly with DALL-E 3 / CC BY 4.0" width="300">
</p>

## Conda distributions

*Anaconda* is a popular distribution of Conda, which has lots of software and extensions preinstalled (e.g., Python with numerous packages). While this may sound nice, you will probably not need all this software and, instead, you might want to install other packages or specific versions.

*Miniconda* is a free minimal installer that includes only conda, Python, the packages they both depend on, and a small number of other useful packages. More packages can be installed from thousands of packages available by default in Anaconda’s public repo, or from other channels, like conda-forge or bioconda.

*Miniforge* is comparable to Miniconda but it has [conda-forge](https://conda-forge.org/) as the default channel to install packages from and it has Mamba installed in the base environment (more on this later).

## Should I install Miniforge?

* If you already have Anaconda, Miniconda or Miniforge installed and it works for you: Just keep your installation.
* If you encounter problems with an existing installation, uninstall and install Miniforge as described below.
* If you do not yet have a Conda distribution installed, we recommend Miniforge.

## Miniforge installation

Go to the [Miniforge download website](https://github.com/conda-forge/miniforge#miniforge3) and download the installer fitting your operating system (Windows, Linux, or macOS) and architecture (most probably `x86_64` is the right one).

* On **Windows**, just download and execute the installer manually (double-click the `.exe` file), follow the instructions on the screen and accept the default settings (recommendations). Avoid installing in a directory with special characters or spaces in the name.
* If you are on **Linux or macOS**, follow the procedure described [here](https://github.com/conda-forge/miniforge#unix-like-platforms-mac-os--linux).


## Manage packages and environments

Open the Miniforge prompt (on Windows, type "miniforge prompt" into the search bar, then add this app to the task bar for convenience). You will see a black window with a command line where you can use the standard commands for your system (e.g. [Windows](https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/cmd)) plus the [commands for Conda](https://docs.conda.io/projects/conda/en/stable/commands/index.html) (and Mamba if you installed Miniforge).

To see the Python installations on your search path, type ``where python`` and hit enter (on Linux or macOS, use ``which -a python``). You will probably be shown multiple paths where Python is installed, and this can easily cause confusion about which Python is actually used to run code and which one receives newly installed extensions (packages). Conda is made exactly to avoid this confusion and to easily maintain control over your Python installations and packages.
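
From inside Python, you can also ask the interpreter itself where it lives. The following is a minimal sketch using only the standard library; the function name `describe_interpreter` is made up for this illustration.

```python
import shutil
import sys


def describe_interpreter() -> dict:
    """Collect basic facts about the Python interpreter running this code."""
    return {
        # Full path of the interpreter that executes this code
        "executable": sys.executable,
        # Version as "major.minor.micro"
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        # First "python" found on the search path (may be None or a different one)
        "first_on_path": shutil.which("python"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If `executable` and `first_on_path` differ, the code is run by a different Python than the one your command line would pick up first.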

*What are Python packages?* -
A Python package is a collection of modules, which, in turn, are essentially Python or C scripts that contain published functionality. There are Python packages for data input, data analysis, data visualization, etc. Each package offers a unique toolset and may have its own unique syntax. The [Python Standard Library](https://docs.python.org/3/library/) contains a general-purpose set of packages shipped with every Python installation. Many additional ('third-party' or 'external') packages have been published and can be installed as needed.

Package management is useful because you may want to update a package for one of your projects, but keep it at the same version in other projects to ensure that they continue to run as expected. With Conda, we can manage packages in different *environments*, as an installation happens only in the currently active environment (shown in parentheses at the beginning of the prompt and marked by an asterisk (*) when you list all environments with ``conda env list``). We can switch between environments with ``conda activate <your_env_name>``. Initially, there is only the 'base' environment, but we can create others. When you install packages (or another Python version) for a specific project, it is recommended that you do this in a fresh environment (not in the base environment).

When one of your environments becomes “broken” or obsolete, you can simply delete it with ``conda remove -n <your_env_name> --all``. This will delete the corresponding folder and all packages in it (Yes, an environment is just a folder on your system!).
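
To see that an environment really is just a folder, you can inspect it from Python. This is a small sketch, assuming the code is run with the Python of an activated Conda environment; `CONDA_DEFAULT_ENV` and `CONDA_PREFIX` are environment variables set by `conda activate`, and the function name `active_environment` is hypothetical.

```python
import os
import sys
from pathlib import Path


def active_environment() -> dict:
    """Report the active Conda environment (if any) and the interpreter's folder."""
    return {
        # Name of the activated environment; None if no Conda environment is active
        "name": os.environ.get("CONDA_DEFAULT_ENV"),
        # Folder of the activated environment as recorded by Conda; None outside Conda
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Folder the running interpreter belongs to
        "python_prefix": sys.prefix,
        # Whether that folder exists as a directory on disk
        "is_directory": Path(sys.prefix).is_dir(),
    }


for key, value in active_environment().items():
    print(f"{key}: {value}")
```

If `conda_prefix` and `python_prefix` point to different folders, the interpreter running your code does not belong to the environment that is activated in the prompt.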

### Mamba

Solving the dependencies for complex environments with many different packages can take a long time (and sometimes fail) with Conda. [Mamba](https://mamba.readthedocs.io/en/latest/index.html) was developed to improve this. Mamba is a drop-in replacement and uses the same commands and configuration options as Conda, i.e., you can swap almost all commands between Conda and Mamba (see below).

Miniforge has Mamba pre-installed in the base environment<!--(since release 23.3.1-0, thus being identical to Mambaforge now (except for installation paths))-->.

If you have Conda installed via another distribution (e.g. Anaconda), you can run

conda install mamba -n base -c conda-forge

and then use mamba to install other packages.

### Basic use

Let's try this! Create a new environment called 'geopython' which contains an installation of Python 3.10.

mamba create --name geopython python=3.10

Activate this 'geopython' environment.

mamba activate geopython

Check which packages and versions are installed in this environment.

mamba list

Now install some packages to this active environment. Note that we only specify some packages but many others (such as *pandas*, *numpy* and *scipy*) are also installed automatically because they are required for the specified ones to work properly.

mamba install ipykernel geopandas rasterio xarray rioxarray
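
To confirm from Python what ended up in the environment, you can query the installed versions. The following is a minimal sketch using the standard library module `importlib.metadata`; it assumes that the packages ship standard Python metadata (which is usually, but not necessarily always, the case for Conda packages), and the helper name `installed_version` is made up for this illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Packages we asked for explicitly ...
requested = ["ipykernel", "geopandas", "rasterio", "xarray", "rioxarray"]
# ... and two that are expected to be pulled in as dependencies
pulled_in = ["pandas", "numpy"]

for name in requested + pulled_in:
    print(f"{name:12s} {installed_version(name) or 'not installed'}")
```

Run this with the Python of the 'geopython' environment; in another environment, some packages will probably be reported as not installed.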

Check which environments we have. The active one is marked by an asterisk (*).

mamba env list

Just for testing, we could create another environment called 'geopython_v2' which is identical to the 'geopython' environment.

mamba create -n geopython_v2 --clone geopython

Delete the 'geopython_v2' environment and all packages in it.

mamba remove -n geopython_v2 --all

See also this [Conda cheat sheet](https://docs.conda.io/projects/conda/en/stable/user-guide/cheatsheet.html) for a list of useful commands. Read more about Conda, packages and environments in the [conda documentation](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html#managing-environments).

Unfortunately, not all packages are available from a Conda channel (such as conda-forge). In this case (and only in this case), you should install the package from the Python Package Index ([PyPI](https://pypi.org/)) via [``pip``](https://pip.pypa.io/en/stable/) within a Conda environment (see also the recommendations [here](https://docs.conda.io/projects/conda/en/stable/user-guide/tasks/manage-environments.html#pip-in-env)). The differences between Conda and pip, and why Conda is the recommended way, are explained [here](https://www.anaconda.com/blog/understanding-conda-and-pip).
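
When Conda and pip are mixed in one environment, it can help to check which tool installed a given package. The sketch below reads the `INSTALLER` record from the package metadata; this relies on the assumption that the installer wrote such a record (pip does, and Conda packages typically contain `conda` there, but the record may be missing). The function name `installer_of` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, distribution


def installer_of(name: str) -> str:
    """Return the tool recorded as installer of a distribution, if available."""
    try:
        dist = distribution(name)
    except PackageNotFoundError:
        return "not installed"
    # read_text returns None when the metadata has no INSTALLER file
    recorded = dist.read_text("INSTALLER")
    return recorded.strip() if recorded else "unknown"


for name in ["numpy", "geopandas", "pip"]:
    print(f"{name}: {installer_of(name)}")
```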

### Set up the E-TRAINEE course environment

To set up an environment with Python and the packages required for the course, there are two options:

**Option 1:** Use the YAML requirements file provided for the course. This file defines the environment with all required packages (versions fixed, builds not fixed for cross-platform compatibility). Just download the Module 1 [requirements file](https://3dgeo-heidelberg.github.io/etrainee/assets/python_envs/etrainee_m1.yml) from the E-TRAINEE GitHub, save it in your working directory and run this command:

```
conda env create -f etrainee_m1.yml --name etrainee_m1
```

You can also open the YAML file in a text editor and have a look at its content. [Here](https://3dgeo-heidelberg.github.io/etrainee/assets/python_envs/), you will also find requirements files for the other E-TRAINEE modules and one for the entire course.

**Option 2:** Run the following commands (in the Conda/Miniforge prompt or in a VSCode terminal with Conda recognized):

```
mamba create -n etrainee_m1 python=3.10
mamba activate etrainee_m1
mamba install ipykernel earthengine-api eemont geemap pygis wxee scikit-learn stackstac xarrayutils hvplot datashader xmovie laspy vaex seaborn
```

*Note*: This will install >480 packages, with a total download volume of >780 MB.

If you need to install additional packages, you can usually do this with `mamba install <package_name>`. If a specific package is not available on the Conda channels (e.g., conda-forge), you might have to use `pip install <package_name>` or install from source.
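
Note that the name used for installing a package is not always the name used for importing it. The following sketch checks whether a few of the course packages can be found by Python without actually importing them; the mapping covers only a selection and is meant as an illustration, and the function name `missing_modules` is made up.

```python
from importlib.util import find_spec

# Install name -> import name (they differ for some packages)
IMPORT_NAMES = {
    "earthengine-api": "ee",
    "scikit-learn": "sklearn",
    "geemap": "geemap",
    "stackstac": "stackstac",
    "hvplot": "hvplot",
    "seaborn": "seaborn",
}


def missing_modules(import_names: dict) -> list:
    """Return the install names whose module cannot be found in this environment."""
    return sorted(
        package
        for package, module in import_names.items()
        if find_spec(module) is None
    )


missing = missing_modules(IMPORT_NAMES)
print("All found." if not missing else f"Not found: {', '.join(missing)}")
```

Run this with the Python of the `etrainee_m1` environment. A package reported as not found can then be installed as described above.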

### Conda and Python in VSCode

#### Conda environments in VSCode

You can also use Conda in a terminal (prompt) in Visual Studio Code (instead of the Miniforge prompt) to manage packages and environments.
And, importantly, you will be able to select one of your environments as a 'kernel' which is used to run Python code.

If Conda is not recognized in VSCode, open VSCode by entering ``code`` in the Miniforge prompt. For other solutions see e.g. [this article](https://medium.com/analytics-vidhya/efficient-way-to-activate-conda-in-vscode-ef21c4c231f2).

#### Creating and running a Python script

In the upper left menu of VSCode, go to *File - New File ...* to create a new Python file (`*.py`) and save it in your working directory (e.g., as `my_script.py`). Alternatively, use the shortcut `CTRL + Shift + P` to show and run commands for VSCode at the top of your screen and then type/select "Python: New Python File". Write a very simple script that prints "Hello world!".
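
Such a script could look like this (a minimal sketch; wrapping the text in a function called `greeting` is just one possible choice):

```python
# my_script.py


def greeting() -> str:
    """Return the text that the script prints."""
    return "Hello world!"


print(greeting())
```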

You have at least two options to control the environment/kernel used to run the script:

* In the VSCode terminal, activate the environment you want to use. Then run the script with `python my_script.py` (make sure you are in the same folder as the script, or include the path to it).
* Press `CTRL + Shift + P`, type `Python: Select Interpreter` and choose the environment. Then run the script via the play button in the upper right. This opens an interactive window, where you can also choose the 'kernel' among your Python installations (for the next run).

For this simple script, the environment doesn't matter yet. As soon as we do more specific things, however, we will need an environment with the right packages installed. More on Python and VSCode is explained [here](https://code.visualstudio.com/docs/languages/python).

### Next: Jupyter Notebooks

Continue with *Jupyter Notebooks* for interactive computing and workflow documentation [here](./jupyter.ipynb).