The Scientific Filesystem is used to provide the entry points for the different tasks available (known as "apps" within the Scientific Filesystem). These apps are used to create workflows.
- Terminology used
- Running the apps
- Build The Container
- A Note On Docker Sibling Containers
- Acceptance Testing
## Terminology used

Here are the definitions of some of the terms we use, with links to additional information:

- **apps**: this term refers to the entry points in a [Scientific Filesystem](#def_scif) solution.
- **BETYdb**: BETYdb is a database that can be used to store trait and yield data. It can be used in the processing pipeline as a source of plot geometry for clipping.
- **GeoJSON**: GeoJSON is a JSON format for specifying geographic shape information. This is the default format for specifying plot geometries.
- **Makeflow**: we use Makeflow to run the apps defined with the Scientific Filesystem. This tool enables error recovery, automatic retries, distributed computing, and many other features.
- **Scientific Filesystem**: we use the Scientific Filesystem to organize our applications, provide ease of execution, and assist in reproducibility.
## Running the apps

This section contains information on running the different apps in an existing Docker workflow container. By tying these different applications together, flexible workflows can be created and distributed.

To determine which apps are available, try the following command:

```bash
docker run --rm agdrone/canopycover-workflow:latest apps
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `apps` is the command that lists the available apps
### Prerequisites

- Docker needs to be installed to run the apps. How to get Docker
- Create an `inputs` folder in the current working directory (or another folder of your choice) to hold input files:

```bash
mkdir -p "${PWD}/inputs"
```

- Create an `outputs` folder in the current working directory (or another folder of your choice) to hold the results:

```bash
mkdir -p "${PWD}/outputs"
```

- Create a `checkpoints` folder. The checkpoints folder will contain the generated workflow checkpoint data, allowing easy error recovery and helping to prevent re-running an already completed workflow. Removing the workflow checkpoint files will enable a complete re-run of the workflow:

```bash
mkdir -p "${PWD}/checkpoints"
```
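As noted above, removing the checkpoint files forces a complete re-run. For example, assuming the folder layout created above, the checkpoint data can be cleared like this:

```bash
# Clear checkpoint files so the next workflow run starts from scratch
# (the folder itself is kept so new checkpoint data can be written)
mkdir -p "${PWD}/checkpoints"
rm -rf "${PWD}/checkpoints"/*
```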
Most of the apps described in this document need additional information to run, such as the source image name. This information is provided through a JSON file that is made available to a running container.

Each of the apps described provides the keys it expects to find, along with a description of the associated value.

We recommend naming the configuration JSON files something related to their intent, such as the workflow they are a part of.
Plot geometries are needed when clipping source files to where they intersect the plots. The plot geometries need to be in GeoJSON format. Apps are provided to convert shapefiles and BETYdb URLs to the GeoJSON format.
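For reference, a minimal plot geometry file in the GeoJSON format might look like the following; the plot name and coordinates here are made-up placeholders:

```json
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": { "id": "Plot_1" },
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [[-111.97, 33.07], [-111.96, 33.07], [-111.96, 33.08],
           [-111.97, 33.08], [-111.97, 33.07]]
        ]
      }
    }
  ]
}
```

Each "Feature" entry is one plot; the "properties" sub-key carries information such as the plot name.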
### BETYdb to GeoJSON

This app retrieves the plots from a BETYdb instance and saves them to a file in the GeoJSON format.
**JSON configuration**

There are two JSON key/value pairs needed by this app:

- `BETYDB_URL`: the URL of the BETYdb instance to query for plot geometries
- `PLOT_GEOMETRY_FILE`: the path to write the plot geometry file to, including the file name
For example:

```json
{
    "BETYDB_URL": "https://terraref.ncsa.illinois.edu/bety",
    "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}
```
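One way to create this configuration file on the host is with a heredoc; the file name `my-jx-args.json` matches the one mounted in the sample command line:

```bash
# Write the configuration shown above to my-jx-args.json in the current
# directory so it can be mounted into the container
cat > my-jx-args.json <<'EOF'
{
  "BETYDB_URL": "https://terraref.ncsa.illinois.edu/bety",
  "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}
EOF
```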
**Sample command line:**

```bash
docker run --rm -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run betydb2geojson
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run betydb2geojson` is the command that runs the app
Please notice that the `/output` folder on the command line corresponds with the `PLOT_GEOMETRY_FILE` starting path value in the configuration JSON.
### Shapefile to GeoJSON

This app loads plot geometries from a shapefile and saves them to a file in the GeoJSON format.
**JSON configuration**

There are two JSON key/value pairs needed by this app:

- `PLOT_SHAPEFILE`: the path to the shapefile to load and save as GeoJSON
- `PLOT_GEOMETRY_FILE`: the path to write the plot geometry file to, including the file name
For example:

```json
{
    "PLOT_SHAPEFILE": "/input/plot_shapes.shp",
    "PLOT_GEOMETRY_FILE": "/output/plots.geojson"
}
```
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run shp2geojson
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run shp2geojson` is the command that runs the app
Please notice the following:

- the `/input` folder on the command line corresponds with the `PLOT_SHAPEFILE` starting path value in the configuration JSON; this is where the app expects to find the shapefile to load and convert
- the `/output` folder on the command line corresponds with the `PLOT_GEOMETRY_FILE` starting path value in the configuration JSON
### Soil Mask

This app masks out soil from an image.
**JSON configuration**

There are four JSON key/value pairs needed by this app:

- `SOILMASK_SOURCE_FILE`: the path to the image to mask the soil from
- `SOILMASK_MASK_FILE`: the name of the mask file to write; it will be written to the path defined in SOILMASK_WORKING_FOLDER if a path is not specified
- `SOILMASK_WORKING_FOLDER`: the path where the results of processing should be placed
- `SOILMASK_OPTIONS`: any options to be passed to the script
The following JSON example would have the soilmask app write the mask to a file named `orthomosaic_masked.tif` in the `/output/` folder of the running Docker image:

```json
{
    "SOILMASK_SOURCE_FILE": "/input/orthomosaic.tif",
    "SOILMASK_MASK_FILE": "orthomosaic_masked.tif",
    "SOILMASK_WORKING_FOLDER": "/output",
    "SOILMASK_OPTIONS": ""
}
```
The following options are available to be specified in the SOILMASK_OPTIONS JSON entry:

- `--metadata METADATA` indicates a metadata YAML or JSON file to use when processing
- `--help` displays the soilmask help information without any file processing
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run soilmask
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run soilmask` is the command that runs the app
Please notice the following:

- the `/input` folder on the command line corresponds with the `SOILMASK_SOURCE_FILE` path value in the configuration JSON; this is where the app expects to find the source image
- the `/output` folder on the command line corresponds with the `SOILMASK_WORKING_FOLDER` path value in the configuration JSON; this is where the masked image is stored
### Plot Clip

This app clips georeferenced images to plot boundaries.
**JSON configuration**

There are four JSON key/value pairs needed by this app:

- `PLOTCLIP_SOURCE_FILE`: the path to the image to clip
- `PLOTCLIP_PLOTGEOMETRY_FILE`: the path to the GeoJSON file containing the plot boundaries; see also BETYdb to GeoJSON and Shapefile to GeoJSON
- `PLOTCLIP_WORKING_FOLDER`: the path where the results of processing should be placed; each plot clip is placed in a folder corresponding to the plot name
- `PLOTCLIP_OPTIONS`: any options to be passed to the script
The following JSON example would have the plot clips written to the `/output/` folder of the running Docker image:

```json
{
    "PLOTCLIP_SOURCE_FILE": "/input/orthomosaic_mask.tif",
    "PLOTCLIP_PLOTGEOMETRY_FILE": "/input/plots.geojson",
    "PLOTCLIP_WORKING_FOLDER": "/output",
    "PLOTCLIP_OPTIONS": ""
}
```
The following options are available to be specified in the PLOTCLIP_OPTIONS JSON entry:

- `--metadata METADATA` indicates a metadata YAML or JSON file to use when processing
- `--keep_empty_folders` creates a folder with the plot name even if the plot doesn't intersect the image
- `--plot_column PLOT_COLUMN` specifies the column name (the "properties" sub-key with GeoJSON) to use as the plot name
- `--help` displays the plotclip help information without any file processing
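To illustrate what the `--plot_column` option refers to, the sketch below (a simplified illustration, not the app's actual code) pulls the plot name out of each GeoJSON feature's "properties" sub-key:

```python
import json

def plot_names(geojson_text, plot_column):
    """Return the value of `plot_column` from each feature's "properties"
    sub-key; features missing the column yield None."""
    data = json.loads(geojson_text)
    return [feature.get("properties", {}).get(plot_column)
            for feature in data.get("features", [])]
```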
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run plotclip
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run plotclip` is the command that runs the app
Please notice the following:

- the `/input` folder on the command line corresponds with the `PLOTCLIP_SOURCE_FILE` path value in the configuration JSON; this is where the app expects to find the source image
- the `/output` folder on the command line corresponds with the `PLOTCLIP_WORKING_FOLDER` path value in the configuration JSON; this is where the plot image clips are saved
### Find Files and Write JSON

This app locates files with a specific name and writes a JSON file that can then be used to process those files. Makeflow is a deterministic scheduler, meaning that when it runs it needs to "know" everything about a job, such as which files are inputs. Apps like Plot Clip are non-deterministic in that there isn't a way of knowing ahead of time which plots intersect an image (unless complete plot coverage is guaranteed, which doesn't always happen). Even in cases where the output of a step is deterministic, it may still be handy to use this app to build up a JSON file.

The search of the source top-level folder is shallow: only the immediate sub-folders are searched, and the top folder itself is ignored.
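The shallow search can be pictured with this simplified sketch (an illustration of the behavior, not the app's actual implementation):

```python
from pathlib import Path

def find_files_shallow(search_folder, search_name):
    """Return paths to files named `search_name` inside the immediate
    sub-folders of `search_folder`; the top folder itself is skipped."""
    matches = []
    for entry in sorted(Path(search_folder).iterdir()):
        candidate = entry / search_name
        if entry.is_dir() and candidate.is_file():
            matches.append(str(candidate))
    return matches
```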
**JSON configuration**

There are three JSON key/value pairs needed by this app:

- `FILES2JSON_SEARCH_NAME`: the complete name of the file to find
- `FILES2JSON_SEARCH_FOLDER`: the starting path to begin searching in
- `FILES2JSON_JSON_FILE`: the path the found files' JSON is written to
The following JSON example would have the JSON written to the `/output/files.json` file of the running Docker image:

```json
{
    "FILES2JSON_SEARCH_NAME": "orthomosaic_mask.tif",
    "FILES2JSON_SEARCH_FOLDER": "/input",
    "FILES2JSON_JSON_FILE": "/output/files.json"
}
```
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run find_files2json
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run find_files2json` is the command that runs the app
Please notice the following:

- the `/input` folder on the command line corresponds with the `FILES2JSON_SEARCH_FOLDER` path value in the configuration JSON; this is where the app will start its search
- the `/output` folder on the command line is included as part of the `FILES2JSON_JSON_FILE` path value in the configuration JSON; this is the folder where the found files' JSON is saved
### Canopy Cover

This app calculates the canopy cover of soil-masked images and writes the CSV files next to the source image (in the same folder).
**JSON configuration**

There is one JSON key/value pair for this app:

- `CANOPYCOVER_OPTIONS`: any options to be passed to the script
The following JSON example shows how to define runtime options when running this app:

```json
{
    "CANOPYCOVER_OPTIONS": ""
}
```
The following options are available to be specified in the CANOPYCOVER_OPTIONS JSON entry:

- `--metadata METADATA` indicates a metadata YAML or JSON file to use when processing
- `--help` displays the canopy cover help information without any file processing; this is useful for finding options which affect the output
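Conceptually, canopy cover is the fraction of pixels remaining after soil has been masked out. The sketch below illustrates the idea only; the app's actual computation may differ:

```python
def canopy_cover_percent(pixels):
    """Percent of non-zero (non-soil) pixels in a flat sequence of values
    from a soil-masked image, where 0 marks masked-out soil."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    plant = sum(1 for value in pixels if value != 0)
    return 100.0 * plant / len(pixels)
```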
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json -v ${PWD}/canopy_cover_files.json:/scif/apps/src/canopy_cover_files.json agdrone/canopycover-workflow:latest run canopycover
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `-v ${PWD}/canopy_cover_files.json:/scif/apps/src/canopy_cover_files.json` mounts the JSON file containing information on the files to process so that it's available to the app; also see Find Files and Write JSON
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run canopycover` is the command that runs the app
Please notice that the `/input` folder on the command line corresponds to where the files to be processed are expected to be found and where the CSV files are written.
### Greenness Indices

This app calculates several greenness indices of soil-masked images and writes the CSV files next to the source image (in the same folder).
**JSON configuration**

There is one JSON key/value pair for this app:

- `GREENNESS_INDICES_OPTIONS`: any options to be passed to the script
The following JSON example shows how to define runtime options when running this app:

```json
{
    "GREENNESS_INDICES_OPTIONS": ""
}
```
The following options are available to be specified in the GREENNESS_INDICES_OPTIONS JSON entry:

- `--metadata METADATA` indicates a metadata YAML or JSON file to use when processing
- `--help` displays the greenness indices help information without any file processing; this is useful for finding options which affect the output
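As an example of what a greenness index is, the widely used Excess Green index (ExG = 2G − R − B) responds strongly to green vegetation. This is an illustration only and not necessarily one of the indices this app computes:

```python
def excess_green(r, g, b):
    """Excess Green (ExG) index for a single RGB pixel: 2*G - R - B."""
    return 2 * g - r - b

def mean_excess_green(pixels):
    """Mean ExG over an iterable of (R, G, B) tuples."""
    values = [excess_green(r, g, b) for r, g, b in pixels]
    return sum(values) / len(values) if values else 0.0
```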
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json -v ${PWD}/greenness_indices_files.json:/scif/apps/src/greenness-indices_files.json agdrone/canopycover-workflow:latest run greenness-indices
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `-v ${PWD}/greenness_indices_files.json:/scif/apps/src/greenness-indices_files.json` mounts the JSON file containing information on the files to process so that it's available to the app; also see Find Files and Write JSON
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run greenness-indices` is the command that runs the app
Please notice that the `/input` folder on the command line corresponds to where the files to be processed are expected to be found and where the CSV files are written.
### Merge CSV

This app recursively merges same-named CSV files into a destination folder. If the source folder tree contains multiple, differently named CSV files, there will be one merged CSV file for each unique CSV file name. All the source CSV files are left intact.
**JSON configuration**

There are three JSON key/value pairs needed by this app:

- `MERGECSV_SOURCE`: the path to the top-level folder containing CSV files to merge
- `MERGECSV_TARGET`: the path where the merged CSV files are written
- `MERGECSV_OPTIONS`: any options to be passed to the script
For example:

```json
{
    "MERGECSV_SOURCE": "/input",
    "MERGECSV_TARGET": "/output",
    "MERGECSV_OPTIONS": ""
}
```
The following options are available to be specified in the MERGECSV_OPTIONS JSON entry:

- `--no_header` indicates that the source CSV files do not have header lines
- `--header_count <value>` indicates the number of header lines to expect in the CSV files; defaults to 1 header line
- `--filter <file name filter>` one or more comma-separated filters of files to process; files not matching a filter aren't processed
- `--ignore <file name filter>` one or more comma-separated filters of files to skip; files matching a filter are ignored
- `--help` displays the help information without any file processing
By combining filtering options and header options, it's possible to precisely target the CSV files to process.

The filters work by matching the file name found on disk against the names specified with the filter to determine whether a file should be processed. Only the body and extension of a file name are compared; the path to the file is ignored when filtering.
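The merge behavior described above can be sketched as follows; this is a simplified illustration without the filtering options, not the app's actual code:

```python
import os
from collections import defaultdict

def merge_csv(source, target, header_count=1):
    """Recursively group CSV files under `source` by file name and write
    one merged file per unique name into `target`. Header lines are kept
    from the first file of each group; source files are left intact."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(source):
        for name in sorted(files):
            if name.lower().endswith(".csv"):
                groups[name].append(os.path.join(root, name))
    os.makedirs(target, exist_ok=True)
    for name, paths in groups.items():
        with open(os.path.join(target, name), "w") as merged:
            for index, path in enumerate(sorted(paths)):
                with open(path) as src:
                    lines = src.readlines()
                # Skip the header lines on every file after the first
                merged.writelines(lines if index == 0 else lines[header_count:])
```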
**Sample command line:**

```bash
docker run --rm -v ${PWD}/inputs:/input -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run merge_csv
```
The different components of the command line are:

- `docker run --rm` tells Docker to run an image and remove the resulting container automatically after the run (`--rm`)
- `-v ${PWD}/inputs:/input` mounts the previously created inputs folder to the `/input` location on the running image
- `-v ${PWD}/outputs:/output` mounts the previously created outputs folder to the `/output` location on the running image
- `-v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json` mounts the JSON configuration file so that it's available to the app
- `agdrone/canopycover-workflow:latest` is the Docker image to run
- `run merge_csv` is the command that runs the app
Please notice the following:

- the `/input` folder on the command line corresponds with the `MERGECSV_SOURCE` path value in the configuration JSON; this is where the app expects to find the CSV files to merge
- the `/output` folder on the command line corresponds with the `MERGECSV_TARGET` path value in the configuration JSON; this is where the merged CSV files are stored
### Clean

Cleaning up a workflow run will delete workflow-generated files and folders. Be sure to copy the data you want to keep to a safe place before cleaning.

Adding the `--clean` flag to the end of the command line used to execute the workflow will clean up the artifacts of a previous run.

It's recommended, but not necessary, to run the clean app between processing runs, either by running this command or through other means.
Example:

The following Docker command line will clean up the files generated using the Canopy Cover: Orthomosaic and Shapefile example above.

```bash
docker run --rm -v ${PWD}/outputs:/output -v ${PWD}/my-jx-args.json:/scif/apps/src/jx-args.json agdrone/canopycover-workflow:latest run betydb2geojson --clean
```

Notice the additional parameter at the end of the command line (`--clean`).
## Build The Container

This section describes how the Docker container can be built. Please refer to the Docker documentation for more information on building Docker containers.

```bash
cp jx-args.json.example jx-args.json
docker build -t agdrone/canopycover-workflow:latest .
```
### Monitoring the Workflow

To monitor the running workflows, you will need to be using the checkpoints folder as described in the Prerequisites section.

Makeflow has monitoring tools available that can be used to follow the progress of the workflows. The makeflow_monitor tool can be a good starting point.
## A Note On Docker Sibling Containers

The OpenDroneMap workflow uses sibling containers. This is a technique for having one Docker container start another Docker container to perform some work. We plan to find a secure alternative for future releases (see AgPipeline/issues-and-projects#240), primarily because of a potential security risk that makes this approach not suitable for shared cluster computing environments (it is also a concern for containers such as websites and databases that are exposed to the internet, but that is not the case here). You can just as safely run these workflows on your own computer as you can any trusted Docker container. However, with sibling containers the second container requires administrator ("root") privileges; please see the Docker documentation for more details.
## Acceptance Testing

There are automated test suites that are run via GitHub Actions. In this section we provide details on these tests so that they can be run locally as well.

These tests are run when a Pull Request or push occurs on the `develop` or `main` branches. There may be other instances when these tests are automatically run, but these are considered the mandatory events and branches.
### PyLint

These tests are run against any Python scripts that are in the repository.

PyLint is used both to check that Python code conforms to the recommended coding style and to check for syntax errors. The default behavior of PyLint is modified by the `pylint.rc` file in the Organization-info repository. Please also refer to our Coding Standards for information on how we use PyLint.
The following command can be used to fetch the `pylint.rc` file:

```bash
wget https://raw.githubusercontent.com/AgPipeline/Organization-info/main/pylint.rc
```
Assuming the `pylint.rc` file is in the current folder, the following command can be used against the `betydb2geojson.py` file:

```bash
# Assumes Python 3.7+ is the default Python version
python -m pylint --rcfile ./pylint.rc betydb2geojson.py
```
### PyTest

PyTest is used to run unit and integration testing. The following command can be used to run the test suite:

```bash
# Assumes Python 3.7+ is the default Python version
python -m pytest -rpP
```
If pytest-cov is installed, it can be used to generate a code coverage report as part of running PyTest. The code coverage report shows how much of the code has been tested; it doesn't indicate how well that code has been tested. The modified PyTest command line including coverage is:

```bash
# Assumes Python 3.7+ is the default Python version
python -m pytest --cov=. -rpP
```
### Shell Script Checking

These tests are run against shell scripts within the repository. It's expected that shell scripts will pass these tools' checks with no reported issues.

shellcheck is used to enforce modern script coding. The following command runs shellcheck against the `prep-canopy-cover.sh` bash shell script:

```bash
shellcheck prep-canopy-cover.sh
```
shfmt is used to ensure scripts conform to Google's shell script style guide. The following command runs shfmt against the `prep-canopy-cover.sh` bash shell script:

```bash
shfmt -i 2 -ci -w prep-canopy-cover.sh
```
### Docker Testing

The Docker testing workflow replicates the examples in this document to ensure they continue to work.