See from the Sky: Examining Cloud Detection Algorithms on the Arctic MISR Data

This project is a case study of constructing cloud detection algorithms employed on satellite images.

Specifically, we are interested in predicting the cloud pixels among image data like the following.

(Figure: example of the MISR image data used for cloud detection)

We split the data into blocks, apply a number of classification methods, and assess their performance with loss functions. The primary work of model fitting and cross-validation (CV) is done via the CVmaster function. Future users who want to reproduce our analysis or train classification methods on new image data should refer to the following.

Data Splitting

Considering the non-independent nature of the image pixels, users should be careful when splitting the data for training and testing purposes. There are two non-trivial ways of data splitting that we recommend in this project.

A. Horizontal Cut. The first method cuts each image horizontally to ensure that every resulting block contains a reasonable portion of both cloud and clear surface. Each image is cut into five blocks by evenly dividing the Y coordinates; three of them are used as training data, and the remaining two blocks are used for validation and testing, respectively.

(Figure: horizontal split of an image into five blocks)
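A minimal R sketch of the horizontal cut is given below; the data frame image_df and the column name y_coord are placeholders for illustration and should be adapted to the actual pixel data.

```r
# Minimal sketch of the horizontal cut: divide the Y range into five
# equal-width blocks. image_df and y_coord are placeholder names.
split_horizontal <- function(image_df, n_blocks = 5) {
  image_df$block <- cut(image_df$y_coord, breaks = n_blocks, labels = FALSE)
  image_df
}

# Example assignment: three blocks for training, one each for validation and testing.
# blocked <- split_horizontal(image1)
# train <- subset(blocked, block %in% 1:3)
# val   <- subset(blocked, block == 4)
# test  <- subset(blocked, block == 5)
```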

B. K-Means Clusters. The second method of blocked data splitting uses the K-means algorithm. With a cluster size of five, the data points of each image are divided into five distinct groups according to their X-Y coordinates. Again, three of these are used for training, one for validation, and the last one for testing.

(Figure: K-means split of an image into five spatial clusters)
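A corresponding sketch of the K-means split is shown below; again, image_df, x_coord, and y_coord are placeholder names.

```r
# Minimal sketch of the K-means split on pixel coordinates.
set.seed(2022)                                   # make cluster assignments reproducible
km <- kmeans(image_df[, c("x_coord", "y_coord")], centers = 5, nstart = 20)
image_df$block <- km$cluster                     # block label in 1..5
# As above: three clusters for training, one for validation, one for testing.
```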

There are, of course, other ways of splitting blocked data. We recommend that future users fit the classification methods under different data-splitting schemes and verify that the CV results are roughly similar.

Usage of CVmaster.R:

The input to the algorithm should be image pixels. Each pixel contains the following information (an illustrative layout is sketched after the list):

-- Coordinates of the pixel: Y-coord and X-coord;

-- Expert label of cloud or non-cloud, coded as 1 or -1;

-- Potential covariates include NDAI, logSD, CORR, DF, CF, BF, AF, and AN.
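The sketch below shows one possible layout of such a pixel data frame; the column names are assumptions made for illustration and should match whatever CVmaster.R actually expects.

```r
# Illustrative layout of the pixel data (column names are assumptions).
pixels <- data.frame(
  y_coord = integer(),   # pixel Y coordinate
  x_coord = integer(),   # pixel X coordinate
  label   = integer(),   # expert label, coded 1 / -1 (cloud vs. non-cloud)
  NDAI    = numeric(),   # normalized difference angular index
  logSD   = numeric(),   # log of the local radiance standard deviation
  CORR    = numeric(),   # correlation across MISR viewing angles
  DF = numeric(), CF = numeric(), BF = numeric(),   # radiances from the
  AF = numeric(), AN = numeric()                    # five MISR camera angles
)
```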

Users can supply a generic classification method. Our current analysis covers logistic regression, LDA, QDA, naive Bayes, and boosted trees.

Users can also choose K, the number of cross-validation folds, and a loss function; the only default currently provided is accuracy (1 - misclassification error).

The CVmaster function takes the above input and returns the training accuracy at each fold as well as the overall CV average accuracy. The CV accuracy is thus a useful metric for evaluating the performance of a classification method on the training data.

If the model has multiple tuning parameters, for example the learning rate, the number of weak learners, and the maximum depth of each tree in the boosted-tree algorithm, CVmaster can fit a grid of specified parameter values and return the best combination of tuned parameters.
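A hypothetical call is sketched below. The argument and return-value names are assumptions made only for illustration; check CVmaster.R for the actual interface.

```r
source("CVmaster.R")

# Hypothetical interface; argument names are assumptions, not the actual API.
# cv_out <- CVmaster(
#   classifier = "logistic",                  # generic classification method
#   features   = c("NDAI", "logSD", "CORR"),  # covariates to include
#   data       = train,                       # training pixels with expert labels
#   K          = 5,                           # number of CV folds
#   loss       = "accuracy"                   # default: 1 - misclassification error
# )
# cv_out$fold_accuracy   # accuracy at each fold
# cv_out$cv_accuracy     # overall CV average accuracy
```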

More Model Assessment:

In addition to the CV training accuracy, more metrics can be used to assess the models. In particular, for models that output predicted probabilities, users can plot ROC curves and find the best cut-off values for classification.

The process of plotting the ROC curve and computing the AUC, as well as finding the best cut-off value, is presented in both the code and the write-up files.

The ROC curve comparing the model's predictions on the test data with the true test labels shows how the model performs on the test data, summarized by the area under the curve (AUC). Based on our current data and methods, boosted trees usually yield the best results. We recommend that users carefully examine more than one assessment metric when fitting classification methods to new image data.

In addition, ROC curves are useful for determining cut-off values, particularly for logistic regression and boosted trees. We find the best cut-off threshold based on Youden's J statistic.
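For example, the ROC curve, AUC, and Youden-optimal cut-off can be obtained with the pROC package as sketched below; test$label and pred_prob stand for the test labels and the model's predicted cloud probabilities from an earlier fitting step.

```r
# ROC/AUC and Youden-optimal cut-off; pred_prob and test are assumed to come
# from an earlier model-fitting step.
library(pROC)

roc_obj <- roc(response = test$label, predictor = pred_prob)
auc(roc_obj)                                       # area under the ROC curve
plot(roc_obj)                                      # draw the ROC curve
coords(roc_obj, "best", best.method = "youden")    # cut-off maximizing Youden's J
```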

Other model assessment metrics we use include precision and the F1 score, both of which support the claim that boosted trees perform better at predicting cloud pixels.
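These metrics follow directly from the confusion matrix; a short sketch is given below, assuming cloud pixels are coded as 1 (the positive class) and pred_class holds the predicted labels.

```r
# Precision and F1 from predicted vs. true labels (cloud = 1 assumed positive).
tp <- sum(pred_class ==  1 & test$label ==  1)   # true positives
fp <- sum(pred_class ==  1 & test$label == -1)   # false positives
fn <- sum(pred_class == -1 & test$label ==  1)   # false negatives

precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)
f1        <- 2 * precision * recall / (precision + recall)
```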

Prediction

Following our analysis and model fitting in the code above, users may use a well-trained classification model to predict cloud pixels on new image data. There is, of course, a chance of misclassification in the predictions, as shown below. Generally, we expect the boosted-tree model to perform well on similar images, with around a 5% error rate, outperforming most of the other classification methods.

(Figure: predicted cloud labels on a test image, including some misclassified pixels)
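As a final sketch, the snippet below shows how such a prediction could look, assuming for illustration that the boosted trees are fit with the gbm package (this README does not state which package is used) and that train and new_image follow the pixel layout sketched earlier; the tuning values are illustrative only.

```r
library(gbm)

# gbm's bernoulli loss expects a 0/1 response, so recode the 1 / -1 labels.
train$cloud <- as.numeric(train$label == 1)

gbm_fit <- gbm(
  cloud ~ NDAI + logSD + CORR + DF + CF + BF + AF + AN,
  data = train, distribution = "bernoulli",
  n.trees = 500, interaction.depth = 3, shrinkage = 0.1   # illustrative tuning values
)

pred_prob  <- predict(gbm_fit, newdata = new_image, n.trees = 500, type = "response")
pred_class <- ifelse(pred_prob > 0.5, 1, -1)       # or use the Youden-optimal cut-off
error_rate <- mean(pred_class != new_image$label)  # roughly 5% expected for boosted trees
```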