This repository is a companion to the manuscript “Towards a responsible machine learning approach to identify forced labor at sea”, by Rocío Joo et al.
For reproducibility, we created a
script
that contains all the analyses of the paper. Due to confidentiality
agreements, the negative cases used for validation cannot be shared, so
we modified the code to run without them. Because Mac and Linux/Windows
handle random seeds differently when using the ranger package, the script
includes our results from both Mac and Linux so that users can compare
them with their own. The data (with anonymized vessels) needed to run the
script is
here.
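
As a rough illustration of that comparison, the sketch below checks a set of results against several reference sets using a numeric tolerance. It is written in Python purely for illustration and is not part of the repository's script. The function name `matching_references`, the metric name `metric_a`, and all numbers are hypothetical.

```python
import math


def matching_references(own, references, rel_tol=1e-9):
    """Return the names of the reference result sets that agree with `own`."""
    matches = []
    for name, ref in references.items():
        same_keys = ref.keys() == own.keys()
        # Every metric must agree within the relative tolerance.
        if same_keys and all(
            math.isclose(own[k], ref[k], rel_tol=rel_tol) for k in ref
        ):
            matches.append(name)
    return matches


# Made-up numbers for illustration only; they are not results from the paper.
references = {"mac": {"metric_a": 0.25}, "linux": {"metric_a": 0.5}}
own_run = {"metric_a": 0.5}
print(matching_references(own_run, references))
```

If a run matches neither reference set, the platform's random number handling is one possible cause worth checking before suspecting the data or the code.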
- These cannot be run without access to Global Fishing Watch tables in BigQuery.
- First step: Run queries to match tables of vessel information and compute movement patterns.
- Second step: Process the data to be in the right format for the model.
- Third, fourth and fifth steps: Run sensitivity analyses for the number of bags, the hyperparameter values of the random forests, and the number of initial random seeds.
- Sixth step: With those optimal values, run the model, generate predictions, and compute performance and fairness metrics.
- Seventh step: Run an additional analysis of the ports used by the vessels predicted as positives.
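
To give a feel for what a sensitivity analysis on the number of random seeds can look like, here is a minimal toy sketch in Python. It is not the repository's code: `toy_scores` is a hypothetical stand-in for one model run, and the idea of averaging scores over seeds and measuring how much the average moves is an assumption made for illustration, not a description of the authors' exact procedure.

```python
import random
import statistics


def toy_scores(seed, n_vessels=5):
    """Hypothetical stand-in for one model run: fixed signal plus seed-dependent noise."""
    rng = random.Random(seed)
    return [0.1 * i + rng.gauss(0, 0.05) for i in range(n_vessels)]


def averaged_scores(seeds):
    """Average the per-vessel scores over all runs, one run per seed."""
    runs = [toy_scores(s) for s in seeds]
    return [statistics.fmean(column) for column in zip(*runs)]


def max_change(n_small, n_large):
    """Largest per-vessel difference between averages over n_small and n_large seeds."""
    small = averaged_scores(range(n_small))
    large = averaged_scores(range(n_large))
    return max(abs(a - b) for a, b in zip(small, large))


# Compare averages over a few seeds against an average over many seeds.
for n in (1, 5, 25):
    print(n, round(max_change(n, 100), 4))
```

In a check of this kind, the number of seeds would be chosen at the point where adding more seeds no longer changes the averaged scores in a way that matters for the predictions.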