Merge branch 'main' of github.com:DanielaCabiddu/PHREESQL

DanielaCabiddu · Feb 7, 2024 · 01e2319
2 parents 6d4dfd9 + 3d7dbc9
commit 01e2319

Showing 11 changed files with 343 additions and 218 deletions.
diff --git a/.gitmodules b/.gitmodules
@@ -7,3 +7,6 @@
 [submodule "external/doxygen-awesome"]
 	path = external/doxygen-awesome
 	url = https://github.com/jothepro/doxygen-awesome-css.git
+[submodule "external/tclap"]
+	path = external/tclap
+	url = https://github.com/mirror/tclap.git
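Because this hunk registers `external/tclap` as a new submodule, an existing checkout has to fetch it before the include path added further down can resolve. The sketch below assumes a standard Git client; the helper name `submodule_init_cmd` is hypothetical, and it only prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): prints the standard Git
# command that initializes and fetches a single submodule by path.
submodule_init_cmd() {
    local path="$1"
    printf 'git submodule update --init %s\n' "$path"
}

# Print the command for the submodule path added in .gitmodules above.
submodule_init_cmd external/tclap
```

Running the printed command from the repository root should populate `external/tclap`.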
diff --git a/example/README.md b/example/README.md
@@ -1,64 +1,41 @@
-# Examples
+# Case Study
 
-The application below has been designed to show how PHREESQL can be employed. Through some bash scripts the complete pipeline for pre-processing input files, computing solution speciation, creating various SQL databases and extracting data is demonstrated. The examples are based on a large dataset of publicly accessible geochemical data, covering the generation and post-processing of PHREEQC output data.
+The application below has been designed to show how PHREESQL can be employed. Through some bash scripts, the complete pipeline for pre-processing input files, computing solution speciation, creating various SQL databases, and extracting data is demonstrated. The examples are based on a large dataset of publicly accessible geochemical data, covering the generation and post-processing of PHREEQC output data. Specifically, the [Ireland EPA database](https://gis.epa.ie/GetData/Download) (filename: *EPA Groundwater Monitoring Data to End 2020 Circulation 26.05.21.xlsx*) is used as an input dataset. The dataset contains a total of 295 sampling stations in Ireland’s aquifers, which have been sampled for a total of 14690 analyses over a 30-year period until 2020. A total of 303 variables are present in the original dataset, which provides descriptive information, spatial coordinates, dissolved components or species, and chlorinated organic compounds. The file used as input in our experimental pipeline is a modified version of the original Ireland EPA dataset. This revised version consists of 45 columns and excludes the notably sparse organic compound records. Both the original and the modified versions of the input dataset can be found in the **EPA_project** folder, assumed to be the designated project for this case study.
 
-The outlined steps focus on data extraction, data plotting, and integration with other software, including coordinate conversion for variographic study and mapping. The entire procedure, implemented in scripts, is available in the distribution and can be evaluated at each step.
+As an example, and without loss of generality, we demonstrate the usage of PHREESQL by running PHREESQLexe, which is assumed to be located in the **${ROOT}/build** folder in this repository. The outlined steps focus on data extraction, data plotting, and integration with other software, including coordinate conversion for variographic study and mapping of a geochemical variable produced by PHREEQC computations. The entire procedure, implemented in scripts, is available in the distribution and can be evaluated at each step.
 
-As a matter of example and no loss of generality, we demonstrate the usage of PHREESQL by running PHREESQLexe, which is assumed to be located in the **${ROOT}/build** folder in this repository.
+## PHREESQL database creation
 
-## How to create PHREESQL databases
-
-The script *00_step-by-step.sh* consolidates all the PHREESQL capabilities into one, also assisting the reader in PHREEQC input generation starting from the .csv file. It enables the possibility to run the full pipeline on a user-defined project, provided as an input parameter, by sequentially calling the following bash scripts:
+The script *00_step-by-step.sh* consolidates all the PHREESQL capabilities into one, also assisting the reader in PHREEQC input generation starting from the *.csv* file. It sequentially calls the following scripts:
 
 - [shell_script/00_clean_all.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/00_clean_all.sh)
 - [shell_script/00_prepare_folders.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/00_prepare_folders.sh)
 - [shell_script/000_gigantic_run.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/000_gigantic_run.sh)
 - [shell_script/00_create_all_output.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/00_create_all_output.sh)
 - [shell_script/00_fill_all_db.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/00_fill_all_db.sh)
-- [shell_script/00_compute_and_fill.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/00_compute_and_fill.sh)
-
-Details on each single script are provided in the sections below.
-
 
+Further information on each individual script is elaborated in the sections below.
 
 ### shell_script/00_clean_all.sh
+This script is responsible for the cleanup of files within the **EPA_project/DB**, **EPA_project/scratch** and **EPA_project/run_DB** folders.
 
-/* Script cleaning all files in DB, scratch and run_DB*/
-
-[no argument]
-
-### shell_script/00_prepare_folders.sh 
-
-/* Script preparing run_DB folder structure to accept .pqi, .pqo and .met files*/
-
-[no argument] 
-
-### shell_script/000_gigantic_run.sh 
-
-/* Script to generate from a .csv dataset a SHORT|MEDIUM|FULL database based on short dataset with a minted PHREEQC thermodynamic database (with extension) and naming all input and output files with EPA_ prefix*/
+### shell_script/00_prepare_folders.sh
+This script is designed to prepare the folder structure within **EPA_project/run_DB** to accommodate *.pqi*, *.pqo*, and *.met* files.
 
-EPA_GW_IE_short.csv SHORT|MEDIUM|FULL EPA_ minteq.dat|phreeqc.dat|sit.dat|wateq4f.dat|llnl.dat  
+### shell_script/000_gigantic_run.sh
+This script is responsible for preparing data subsets and organizing PHREEQC thermodynamic databases. Derived from the modified Ireland EPA dataset, this script creates three distinct data subsets (*SHORT*, *MEDIUM* and *FULL*) and organizes the data for speciation simulations using five PHREEQC thermodynamic databases (*llnl.dat*, *wateq4f.dat*, *phreeqc.dat*, *minteq.dat* and *sit.dat*). For each combination of data subset *D* and PHREEQC thermodynamic database *P*, the script generates PHREEQC input files *(.pqi)* and corresponding metadata files *(.met)*. These files are then stored in the following directories: **EPA_project/run_DB/${D}/IN/${P}** and **EPA_project/run_DB/${D}/MET/${P}**.
 
-### shell_script/00_create_all_output.sh 
+### shell_script/00_create_all_output.sh
+This script runs the parallel speciation simulations for each combination of data subset *D* and PHREEQC thermodynamic database *P*. This is achieved by sequentially invoking PHREESQLexe with the following command line:
 
-/* Script to compute parallel speciation simulations */
+`${ROOT}/build/phreesqlexe --run_phreeqc --phreeqc_db ${P} --in_folder EPA_project/run_DB/${D}/IN/${P} --out_folder EPA_project/run_DB/${D}/OUT/${P}`
 
-SHORT|MEDIUM|FULL phreeqc|llnl|minteq|wateq4f|sit|ALL
+### shell_script/00_fill_all_db.sh
+This script creates and fills PHREESQL databases, one for each combination of data subset *D* and PHREEQC thermodynamic database *P*. This is achieved by sequentially invoking PHREESQLexe with the following command line:
 
-### shell_script/00_fill_all_db.sh 
+`${ROOT}/build/phreesqlexe --fill_db -d EPA_project/DB/${D}_${P}.db -i EPA_project/run_DB/${D}/IN/${P} -o EPA_project/run_DB/${D}/OUT/${P} -m EPA_project/run_DB/${D}/MET/${P}`
 
-/* Script to fill database having already .pqi, .met and .pqo files from previous script */
-
-SHORT|MEDIUM|FULL phreeqc|llnl|minteq|wateq4f|sit|ALL
-
-### shell_script/00_compute_and_fill.sh 
-
-/* Script to generate all database, from all datasets */
-
-[no argument]
-
-
-## How to query PHREESQL databases
+## PHREESQL database query
 
 - [shell_script/01_LL.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/01_LL.sh)
 - [shell_script/02_SI.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/02_SI.sh)
@@ -67,15 +44,14 @@ SHORT|MEDIUM|FULL phreeqc|llnl|minteq|wateq4f|sit|ALL
 - [shell_script/05_cs2cs.sh](https://github.com/DanielaCabiddu/PHREESQL/blob/main/example/shell_script/05_cs2cs.sh)
 
 ### shell_script/01_LL.sh
-The LL plot (Langelier and Ludwig, 1942) is a graphical tool used to classify waters based on the chemistry of major cations and anions in percent of meq/l. Categorizing groundwater based on various criteria and applying filters can help users analyze and interpret data more effectively. Criteria can include rock type, county name, aquifer name, or time interval, and these categories can be plotted separately for better visualization and analysis. Filters can involve selecting specific time ranges or applying conditions related to electrical balance or concentration ranges to refine the dataset and focus on specific aspects of groundwater data.
-
-The SQL script *LL.sql*, called by the *LL.sh* bash script, is used to extract groundwater samples from the *FULL_wateq4f.db* SQL database. The filter applied in this script is based on the county’s name, allowing users to retrieve data from specific counties without applying additional compositional or temporal filters. The output consists of three ASCII files, one for the whole dataset and two for a pair of Irish counties (Kilkenny and355 Donegal, arbitrarily chosen).
+This script demonstrates how to query a PHREESQL database generated by our case study to extract groundwater samples. Specifically, the SQL script *LL.sql*, called by the *01_LL.sh* bash script, is used to extract groundwater samples from the *FULL_wateq4f.db* PHREESQL database, located in the **EPA_project/DB** folder. The filter applied in this script is based on the county’s name, allowing users to retrieve data from specific counties without applying additional compositional or temporal filters. The output consists of three ASCII files, one for the whole dataset and two for a pair of Irish counties (Kilkenny and Donegal, arbitrarily chosen). By customizing the query to loop over all 26 counties, a separate LL diagram can be produced for each county.
 
 ### shell_script/02_SI.sh
 
 sql_scripts/SI.sql
 
 ### shell_script/03_ehph.sh
+This script illustrates how to call an external package, specifically PhreePlot, to create a graphical output of PHREESQL database queries. The SQL script *ehph.sql*, called by the *03_ehph.sh* bash script, extracts the *pe - pH* couples from the *FULL_wateq4f.db* PHREESQL database, located in the **EPA_project/DB** folder, for both Kilkenny and Donegal counties, as previously done, with a single constraint on charge balance within a range of +/-10%.
 
 sql_scripts/ehph.sql
 
@@ -86,4 +62,3 @@ sql_scripts/VARIO.sql
 ### shell_script/05_cs2cs.sh
 
 sql_scripts/cs2cs.sql
-
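The two PHREESQLexe command lines quoted in the README hunk above can be expanded over every combination of data subset *D* and thermodynamic database *P*. The sketch below only prints the resulting commands; it is not the repository's own script. The defaults for `ROOT` and `PROJECT`, the function names, and the use of bare database names (chosen to match the *FULL_wateq4f.db* file name) are all assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: prints the PHREESQLexe invocations quoted in the README,
# one pair per (subset, thermodynamic database) combination.
# ROOT and PROJECT are assumed defaults, not values taken from the repository scripts.
ROOT="${ROOT:-.}"
PROJECT="${PROJECT:-EPA_project}"

SUBSETS=(SHORT MEDIUM FULL)
# Assumption: databases are referred to by bare name, matching FULL_wateq4f.db.
DATABASES=(llnl wateq4f phreeqc minteq sit)

# Emit the speciation command for subset D and database P.
run_cmd() {
    local D="$1" P="$2"
    echo "${ROOT}/build/phreesqlexe --run_phreeqc --phreeqc_db ${P}" \
         "--in_folder ${PROJECT}/run_DB/${D}/IN/${P}" \
         "--out_folder ${PROJECT}/run_DB/${D}/OUT/${P}"
}

# Emit the database-filling command for subset D and database P.
fill_cmd() {
    local D="$1" P="$2"
    echo "${ROOT}/build/phreesqlexe --fill_db -d ${PROJECT}/DB/${D}_${P}.db" \
         "-i ${PROJECT}/run_DB/${D}/IN/${P}" \
         "-o ${PROJECT}/run_DB/${D}/OUT/${P}" \
         "-m ${PROJECT}/run_DB/${D}/MET/${P}"
}

# Print every command, speciation first and database filling second,
# for each of the 3 x 5 combinations.
print_pipeline() {
    local D P
    for D in "${SUBSETS[@]}"; do
        for P in "${DATABASES[@]}"; do
            run_cmd "$D" "$P"
            fill_cmd "$D" "$P"
        done
    done
}

print_pipeline
```

Printing rather than executing keeps the sketch safe to run without a build; the output can be reviewed against the scripts in `shell_script/` before anything is executed.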
diff --git a/external/tclap b/external/tclap
diff --git a/phreesqlexe/CMakeLists.txt b/phreesqlexe/CMakeLists.txt
@@ -7,7 +7,7 @@ set(CMAKE_CXX_STANDARD_REQUIRED ON)
 
 set(CMAKE_CXX_FLAGS "-std=c++17 -fpermissive")
 
-option(USE_OPENMP "Use OpenMP" OFF)
+option(USE_OPENMP "Use OpenMP" ON)
 #option(BUILD_DOC "Build documentation" OFF)
 
 if (USE_OPENMP)
@@ -166,11 +166,13 @@ include_directories(BEFORE ${PHREEQC_SRC_FOLDER}/phreeqcpp/PhreeqcKeywords)
 include_directories(BEFORE ${PROJ_INCLUDE_FOLDER})
 include_directories(BEFORE ${PROJ_SRC_FOLDER})
 
+include_directories (BEFORE ${EXTERNAL_FOLDER}/tclap/include)
+
 
 if (MSVC)
     include_directories(BEFORE ${SQLite3_INCLUDE_DIR})
     include_directories(BEFORE ${EXTERNAL_FOLDER}/dirent_win)
-    include_directories(BEFORE ${EXTERNAL_FOLDER}/getopt_win)
+    #include_directories(BEFORE ${EXTERNAL_FOLDER}/getopt_win)
 endif()
 
 if (BUILD_DOC)
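With `USE_OPENMP` now defaulting to `ON` in the first hunk of this file, a serial build has to opt out explicitly through CMake's standard `-D` cache override. The sketch below prints such a configure command; the helper name `configure_cmd` and the `SRC_DIR` and `BUILD_DIR` defaults are assumptions inferred from the paths in this diff, not values confirmed by the repository.

```shell
#!/usr/bin/env bash
# Assumed locations: the CMakeLists.txt touched by this diff and a build folder.
SRC_DIR="${SRC_DIR:-phreesqlexe}"
BUILD_DIR="${BUILD_DIR:-build}"

# Hypothetical helper: prints a configure command that sets USE_OPENMP explicitly.
configure_cmd() {
    local value="$1"
    case "$value" in
        ON|OFF) ;;
        *) echo "USE_OPENMP must be ON or OFF" >&2; return 1 ;;
    esac
    printf 'cmake -S %s -B %s -DUSE_OPENMP=%s\n' "$SRC_DIR" "$BUILD_DIR" "$value"
}

# Print the command for a build without OpenMP.
configure_cmd OFF
```

Note that `option()` does not overwrite a value already stored in `CMakeCache.txt`, so a build directory configured before this commit keeps `USE_OPENMP=OFF` until it is reconfigured with an explicit `-D` or the cache is removed.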