This is a small research project that I carried out during the VII Iberian Modelling Week, held from November 26 to December 1, 2021. It is an annual event for university students from Spain and Portugal with a background in mathematics. Participants work in teams for a few days on an applied mathematics problem, which makes it an opportunity to tackle real problems that arise in an industrial context.
In my case, I worked with another student on the project "Statistical Inference of Fish Populations from Deep-Learning Data". Our task was to devise statistical methods for assessing the reliability of a convolutional neural network (CNN) designed to detect fish in underwater images. The work builds on a larger project called Deep-Ecomar (Underwater Cameras as Biological Sensors: Deep Learning in Marine Ecology), in which hundreds of images of fish were processed to train a Mask R-CNN. The network outputs a number of detected fish (variable TD), but some of these predictions may be incorrect.
Over a weekend, we worked on modelling the CNN errors mathematically in order to estimate credible intervals for the actual number of fish (variable A). We proposed several models and ways to quantify the reliability of the CNN output. This included an analysis of standard machine learning quantities such as the number of true positives (TP), false positives (FP) and false negatives (FN), as well as precision (P) and recall (R). We also studied some nonparametric inference methods, but the most fruitful part was applying Bayesian inference after assigning reasonable probability distributions to the variables. We ran several simulations that shed light on how the prior distribution of A is updated into its posterior distribution depending on the network's performance and its prediction. This illustrated the usefulness of Bayesian analysis and Markov chain Monte Carlo algorithms.
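To give a flavour of the Bayesian step, here is a minimal sketch under assumptions chosen purely for illustration; it is not necessarily one of the models from the project. It assumes that each real fish is detected independently with probability equal to the recall (so TP given A is binomial), that the number of false positives is Poisson with a hypothetical mean `fp_rate`, and that TD = TP + FP. The prior on A is a discrete uniform, also an assumption. Because A lives on a small grid, the posterior is computed by exact enumeration here rather than with MCMC; all function and parameter names are made up for this example.

```python
import math


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(K = k) for K ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    return math.exp(-lam) * lam**k / math.factorial(k)


def likelihood(td, a, recall, fp_rate):
    """P(TD = td | A = a), summing over the unobserved number of true positives.

    Assumed model (illustrative only): TP ~ Binomial(a, recall),
    FP ~ Poisson(fp_rate), TD = TP + FP.
    """
    return sum(
        binom_pmf(tp, a, recall) * poisson_pmf(td - tp, fp_rate)
        for tp in range(min(a, td) + 1)
    )


def posterior(td, prior, recall, fp_rate):
    """Posterior P(A = a | TD = td); `prior` is a list indexed by a."""
    unnorm = [prior[a] * likelihood(td, a, recall, fp_rate) for a in range(len(prior))]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def credible_interval(post, mass=0.95):
    """Equal-tailed credible interval (lo, hi) for a discrete posterior."""
    tail = (1.0 - mass) / 2.0
    cdf, lo, hi = 0.0, None, None
    for a, p in enumerate(post):
        cdf += p
        if lo is None and cdf >= tail:
            lo = a
        if hi is None and cdf >= 1.0 - tail:
            hi = a
            break
    return lo, hi


# Hypothetical numbers: the network reports 20 fish, with an assumed
# recall of 0.9 and an assumed mean of 2 false positives.
max_fish = 60
uniform_prior = [1.0 / (max_fish + 1)] * (max_fish + 1)
post = posterior(td=20, prior=uniform_prior, recall=0.9, fp_rate=2.0)
print("95% credible interval for A:", credible_interval(post))
```

Changing `recall` and `fp_rate` in this sketch shows the qualitative behaviour described above: the better the assumed network performance, the more tightly the posterior of A concentrates around the reported count.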