This repository contains the code to preprocess acoustic data for the CRIMAC project

It provides a Docker image that pre-processes a collection of SIMRAD EK60/EK80 acoustic raw files into an xarray dataset using the pyEcholab package. The dataset is then stored on disk as zarr or NetCDF files.

In addition, pre-processing Marec's LSSS work files into a pandas dataframe, stored as a parquet file, is now supported (see the disk mounting option below).

Features

1. Automatic range re-gridding (by default it uses the main channel’s range from the first raw file, see the MAX_RANGE_SRC option below).
2. Sv processing and re-gridding of the channels are done in parallel (using Dask’s delayed).
3. Automatic resuming from the last ping_time if the output file exists.
4. Batch processing is done by appending directly to the output file, which should keep memory use low.
5. The image of this repository is available at Docker Hub (https://hub.docker.com/r/crimac/preprocessor).
6. Annotations from .work files are processed into a pandas dataframe object (using: https://github.com/CRIMAC-WP4-Machine-learning/CRIMAC-annotationtools).

Options to run

Two directories must be mounted, and a third is optional:
1. /datain should be mounted to the data directory where the .raw files are located.
2. /dataout should be mounted to the directory where the output is written.
3. /workin should be mounted to the directory where the .work files are located (optional).

Choose the frequency of the main channel:
```
--env MAIN_FREQ=38000
```

Choose the range determination type:

```
# Set the maximum range to 500,
--env MAX_RANGE_SRC=500

# or use the main channel's maximum range from all the files (for historical data),
--env MAX_RANGE_SRC=auto

# or use the main channel's maximum range from the first processed file (for historical data)
--env MAX_RANGE_SRC=None
```
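
The three forms of MAX_RANGE_SRC listed above can be checked on the host before a container is started. The following is a minimal sketch, not part of the image: `check_max_range_src` is a hypothetical helper, and it assumes that a numeric range is given as a whole number (as in the `500` example) and that `auto` and `None` are matched literally.

```shell
# Hypothetical pre-flight check; not part of the crimac/preprocessor image.
# Returns 0 when the value matches one of the three documented forms.
check_max_range_src() {
  case "$1" in
    # The two documented keywords, matched case-sensitively.
    auto|None) return 0 ;;
    # Empty, or containing any non-digit character: reject.
    # Assumption: numeric ranges are whole numbers such as 500.
    ''|*[!0-9]*)
      echo "invalid MAX_RANGE_SRC: '$1'" >&2
      return 1
      ;;
    # Only digits remain at this point.
    *) return 0 ;;
  esac
}
```

For example, `check_max_range_src "$MAX_RANGE_SRC" || exit 1` at the top of a launch script would stop early on a misspelled value such as `Auto`.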

Select the output type; zarr and NetCDF4 are supported:

```
--env OUTPUT_TYPE=zarr

--env OUTPUT_TYPE=netcdf4
```

Select the output file name (optional, defaults to out.<zarr/nc>):
```
--env OUTPUT_NAME=S2020842
```
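
Since the output name depends on both OUTPUT_NAME and OUTPUT_TYPE, a launch script may want to know which file to look for in the output directory, for instance to see whether a run will resume. The sketch below is a hypothetical helper, `expected_output_name`, and rests on an assumption drawn from the default `out.<zarr/nc>`: that the extension is `.zarr` for `zarr` and `.nc` for `netcdf4`, and that a custom OUTPUT_NAME only replaces the base name.

```shell
# Hypothetical helper; not part of the crimac/preprocessor image.
# Usage: expected_output_name OUTPUT_TYPE [OUTPUT_NAME]
# Assumption: the extension follows OUTPUT_TYPE and the base name
# falls back to "out" when OUTPUT_NAME is not given.
expected_output_name() {
  output_type="$1"
  output_name="${2:-out}"
  case "$output_type" in
    zarr)    echo "${output_name}.zarr" ;;
    netcdf4) echo "${output_name}.nc" ;;
    *)
      echo "unsupported OUTPUT_TYPE: '$output_type'" >&2
      return 1
      ;;
  esac
}
```

Under these assumptions, `expected_output_name zarr S2020842` prints `S2020842.zarr`.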
Choose whether to write a visual overview of the Sv data (as a PNG image):
```
--env WRITE_PNG=1 # 1 to enable, 0 to disable
```
Optionally, process only one selected file when the raw folder contains many raw files:
```
--env RAW_FILE=2019847-D20190509-T014326.raw
```

Example

```
docker run -it \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/EK60/EK60_RAWDATA:/datain \
-v /data/cruise_data/2020/S2020842_PHELMERHANSSEN_1173/ACOUSTIC/LSSS/WORK:/workin \
-v /localscratch/ibrahim-echo/out:/dataout \
--security-opt label=disable \
--env OUTPUT_TYPE=zarr \
--env MAIN_FREQ=38000 \
--env MAX_RANGE_SRC=500 \
--env OUTPUT_NAME=S2020842 \
--env WRITE_PNG=0 \
crimac/preprocessor
```
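
Because /workin is optional while /datain and /dataout are required, a launch script can assemble the mounts conditionally. The following is a minimal dry-run sketch: `preprocess_cmd` is a hypothetical helper that checks the two required host directories and prints a `docker run` command instead of executing it. It uses only flags that appear in the example above, and the fixed `OUTPUT_TYPE` and `MAIN_FREQ` values are illustrative choices.

```shell
# Hypothetical dry-run helper; it prints the command and never runs docker.
# Usage: preprocess_cmd DATAIN DATAOUT [WORKIN]
preprocess_cmd() {
  datain="$1"
  dataout="$2"
  workin="${3:-}"

  # Both required mounts must be existing directories on the host.
  for dir in "$datain" "$dataout"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: '$dir'" >&2
      return 1
    fi
  done

  # Rebuild the positional parameters as the command line to print.
  set -- docker run -it -v "$datain:/datain" -v "$dataout:/dataout"

  # /workin is optional, so it is mounted only when a path was given.
  if [ -n "$workin" ]; then
    set -- "$@" -v "$workin:/workin"
  fi

  set -- "$@" --env OUTPUT_TYPE=zarr --env MAIN_FREQ=38000 crimac/preprocessor
  echo "$@"
}
```

The printed line is meant for inspection. Paths containing spaces would need quoting before the output could be pasted back into a shell.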
