---
title: "Crop mapping"
output:
  html_document:
    toc: true
    toc_float: true
    toc_collapsed: true
    toc_depth: 3
    number_sections: false
editor_options:
  chunk_output_type: console
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Background
Information on the spatial distribution of crops is an important step towards yield estimation: we need to know where the crops are before we can estimate yield in a given region. Ground-based mapping approaches such as surveying are expensive and time intensive. Remote sensing offers an effective and efficient mapping platform thanks to improved temporal and spatial resolutions. In this case we supplement optical satellite data with UAV images for training-site collection, and use image fusion for crop mapping.
For crop mapping we use two different classification algorithms:
1. Random Forests (RF) by [Breiman 2001](https://link.springer.com/article/10.1023/A:1010933404324).
2. Maximum Likelihood Classification (MLC).
## Data preparation
Load libraries, declare variables and data paths.
```{r d1, message=FALSE}
rm(list=ls(all=TRUE)) #Clears R memory
unlink(".RData")
if (!require("pacman")) install.packages("pacman"); library(pacman)
p_load(raster, terra, randomForest, RStoolbox, tictoc)
options(warn=1)
cat("Set variables and start processing\n")
Root <- 'D:/JKUAT/RESEARCH_Projects/Eswatini/Data/'
Path_out <- paste0(Root,"Output/")
```
Load the Mpolonjeni UAV images for missions 1--5 and the Sentinel 2 scene. We will mosaic the UAV images into one image at 1 m spatial resolution and save it to disk; once this is done, the step is skipped on subsequent runs of the script.
```{r d2}
#Sentinel 2
path <- list.files(paste0(Root,'S2/interim/'),pattern = (".tif$"), recursive = TRUE, full.names = TRUE)
s <- rast(path)
names(s) <- c("b", "g","r", "nir")
s
#UAV
mosaicname <- paste0(Path_out,'Mpolonjeni_W1_Mosaic.tif')
if(!file.exists(mosaicname)){
folders <- list.dirs(paste0(Root,'WingtraOne/Mpolonjeni'),recursive=TRUE)[-1]
folders
for (i in 1:length(folders)) {
path <- list.files(folders[i], pattern = (".tif$"))
# Sort files by band name (the part after "reflectance_") and drop the
# unused 4th file after sorting
path <- path[order(gsub(".*reflectance_","",path))][-4]
# Swap bands 3 and 4 so the band order matches S2: blue, green, red, NIR
paths <- path
paths[3] <- path[4]
paths[4] <- path[3]
temp <- rast(paste0(folders[i],"/",paths))
names(temp) <- c("b", "g", "r", "nir")
assign(paste0("v", i), temp)
}
}else{
v <- rast(mosaicname)
}
```
We now have each mission loaded from its sub-folder, stacked, and assigned to a dynamically created variable, i.e. $\text{v1},\text{v2},\dots,\text{v5}$. Resample all the images to 1 m spatial resolution using the bilinear approach and mosaic them.
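For reference, bilinear resampling estimates each output value as a distance-weighted average of the four nearest input cell centres. With $(x, y)$ the fractional position within the unit square spanned by those centres,
$$v(x, y) = v_{00}(1-x)(1-y) + v_{10}\,x(1-y) + v_{01}(1-x)\,y + v_{11}\,x\,y.$$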
```{r d3}
if(!file.exists(mosaicname)){
for (i in 1:5) {
  vi <- get(paste0("v", i))
  temp <- aggregate(vi[[1]], 9)  # template grid with the mission's extent/CRS
  res(temp) <- c(1, 1)           # force a 1 m resolution
  assign(paste0("v", i), resample(vi, temp, method = 'bilinear'))
}
}
```
Let's now mosaic the scenes into one image, using the median to average out the overlaps. The median is preferred because it is more robust to outliers than the mean.
```{r d4}
if(!file.exists(mosaicname)){
v <- mosaic(v1, v2, v3, v4, v5, fun="median")
}
```
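As a quick sanity check on that claim, here is a minimal sketch with hypothetical overlap values for a single pixel, one of which is an outlier:
```{r med-demo, eval=FALSE}
# Hypothetical reflectances of one pixel seen in three overlapping missions
px <- c(0.21, 0.23, 0.80)  # 0.80 is an outlier (e.g. glint or mis-registration)
mean(px)                   # 0.413 -- pulled strongly towards the outlier
median(px)                 # 0.23  -- unaffected by a single outlier
```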
Save the mosaic to disk.
```{r d5}
mosaicname <- paste0(Path_out,'Mpolonjeni_W1_Mosaic.tif')
if(!file.exists(mosaicname)){
writeRaster(v, mosaicname)
}
```
Crop/clip the Sentinel 2 image to the UAV image extent.
```{r d6}
s <- crop(s, ext(v), snap="near")
```
Display the images side by side.
```{r d7, message = FALSE}
x11()
par(mfrow = c(1, 2)) # 1 row, 2 columns
plotRGB(s, r="nir", g="r", b="g", stretch="lin", axes=T, mar = c(4, 5, 1.4, 0.2), main="S2", cex.axis=0.5)
box()
plotRGB(v, r="nir", g="r", b="g", stretch="lin", axes=T, mar = c(4, 5, 1.4, 0.2), main="UAV", cex.axis=0.5)
box()
```
## Training data sampling
Load the training data.
```{r v1,message=FALSE}
#ref <- vect(paste0(Root,'Vector/Training_Sites_Mpolonjeni.shp'), "polygons")
ref <- shapefile(paste0(Root,'Vector/Training_Sites_Mpolonjeni.shp'))
ref <- spTransform(ref, crs(v))
```
Sample points from the polygons (stratified random sampling).
```{r v2}
set.seed(530)
samp <- spsample(ref, 4000, type='stratified')
# add the land cover class to the points
samp$class <- over(samp, ref)$Name
samp$code <- over(samp, ref)$code
knitr::kable(table(samp$class), align = 'l')
sum(table(samp$class))
```
Declare the number of classes and their names.
```{r v3}
#samp <- spTransform(samp, crs(v))
nClasses <- 8
Classes <- data.frame(classID=c(1:8),class=c('Built_up','Cassava', 'Grass', 'Maize', 'Sorghum', 'Sweet_potato','Trees','Water'))
```
Display S2 image and the training points.
```{r v4}
add_legend <- function(...) {
opar <- par(fig=c(0, 1, 0, 1), oma=c(0, 0, 0, 0),
mar=c(0, 0, 0, 0), new=TRUE)
on.exit(par(opar))
plot(0, 0, type='n', bty='n', xaxt='n', yaxt='n')
legend(...)
}
x11()
#par(mar = c(4, 4, 1.4, 0.1)) #c(bottom, left, top, right)
plotRGB(s, r="nir", g="r", b="g", stretch="lin", axes = TRUE, mar = c(4, 5, 1.4, 0.1))
#points(samp, col="blue", cex=.5)
lines(spTransform(ref, crs(s)), col="blue", lwd=1.5)
add_legend("bottom", legend="Reference data",
pch=c(0,15), col="blue", horiz=T, bty='n', cex=1.1)
```
Extract the UAV and S2 pixel values at the sample points and split them into training and validation sets.
```{r v5, message=FALSE}
#library(spatialEco)
trainData.v <- extract(brick(v), samp, cellnumbers=F, df=T, sp=T)
knitr::kable(head(trainData.v))
trainData.s <- extract(brick(s), samp, cellnumbers=F, df=T, sp=T)
knitr::kable(head(trainData.s))
library(caTools)
#(NB: SplitRatio 0.4 means 40% of points for training and 60% for validation)
labels <- trainData.v$class
split <- sample.split(labels, SplitRatio = 0.4)
#UAV
valid.v <- subset(trainData.v, split == FALSE)
train.v <- subset(trainData.v, split == TRUE)
knitr::kable(table(train.v$class), align = 'l')
knitr::kable(table(valid.v$class), align = 'l')
#Sentinel 2
valid.s <- subset(trainData.s, split == FALSE)
train.s <- subset(trainData.s, split == TRUE)
knitr::kable(table(train.s$class), align = 'l')
knitr::kable(table(valid.s$class), align = 'l')
```
## Maximum Likelihood Classification
Classify the UAV and S2 images using the MLC algorithm.
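MLC models each class $c$ as a multivariate Gaussian estimated from its training pixels and assigns each pixel vector $\mathbf{x}$ to the class maximising the log-likelihood
$$\hat{c} = \arg\max_{c}\left[-\tfrac{1}{2}\ln\lvert\Sigma_c\rvert - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_c)^{\top}\Sigma_c^{-1}(\mathbf{x}-\boldsymbol{\mu}_c)\right],$$
where $\boldsymbol{\mu}_c$ and $\Sigma_c$ are the class mean vector and covariance matrix.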
```{r mlc1, message=FALSE}
#UAV classification
mlc.uav <- superClass(brick(v), train.v[-2], responseCol = "class",
model = "mlc", minDist = 1)
val.uav <- validateMap(mlc.uav$map, valid.v[-2], responseCol="class", mode='classification',
classMapping = mlc.uav$classMapping)
#Sentinel 2 classification
mlc.s <- superClass(brick(s), train.s, responseCol = "class",
model = "mlc", minDist = 1)
val.s <- validateMap(mlc.s$map, valid.s, responseCol="class", mode='classification',
classMapping = mlc.s$classMapping)
```
Make a function to display classified images and use it to display the MLC maps.
```{r disp}
display <- function(map, method, nClasses){
x11()
par(mar = c(7, 2, 1.6, 6)) #c(bottom, left, top, right)
image(map, col=c("red", "orange", "cyan" , "yellow", "brown", "magenta", "green1","blue"), axes=T, ann=F)
classes.Palette <- colorRampPalette(c("red", "orange", "cyan" , "yellow", "brown", "magenta", "green1","blue", "white"))
add_legend("bottom", legend=c('Built_up','Cassava', 'Grass',
'Maize', 'Sorghum', 'Sweet_potato','Trees','Water', "No data"), fill=classes.Palette(nClasses+1), ncol=3, bty='n', cex=1.1, pt.bg = NA)
title(paste0(method," Classification"))
}
display(mlc.uav$map, "UAV MLC", nClasses)
display(mlc.s$map, "S2 MLC", nClasses)
```
Let's design a function for accuracy assessment.
```{r acc1}
accuracy <- function(val.test){
assessment.storage <- val.test$performance
#print(assessment.storage)
list_of_datasets <- list("ConfusionMatrix" = as.matrix(assessment.storage$table),
"OverallAcc" = as.matrix(assessment.storage$overall),
"byClass" = as.matrix(assessment.storage$byClass))
return(list_of_datasets)
}
```
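The `byClass` measures reported below correspond to the standard remote-sensing accuracy terms: user's accuracy (UA) is precision, producer's accuracy (PA) is sensitivity (recall), and the F1-score is their harmonic mean. In terms of per-class true/false positives and negatives,
$$\text{UA} = \frac{TP}{TP+FP}, \qquad \text{PA} = \frac{TP}{TP+FN}, \qquad F_1 = \frac{2\,\text{UA}\cdot\text{PA}}{\text{UA}+\text{PA}}.$$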
Assess the accuracy of MLC.
```{r acc2}
print('UAV confusion matrix')
List.v <- accuracy(val.uav)
knitr::kable(List.v$ConfusionMatrix,align='l')
print('S2 confusion matrix')
List.s <- accuracy(val.s)
knitr::kable(List.s$ConfusionMatrix,align='l')
print('Other accuracy measures')
knitr::kable(data.frame(Type=List.v$OverallAcc[,0], UAV=round(List.v$OverallAcc[,1],3),S2=round(List.s$OverallAcc[,1],3)),align='l')
knitr::kable(data.frame(UAV_F1score=round(List.v$byClass[,'F1'],3),S2_F1score=round(List.s$byClass[,'F1'],3)),align='l')
knitr::kable(data.frame(UAV_UserAcc=round(List.v$byClass[,'Precision'],3),S2_UserAcc=round(List.s$byClass[,'Precision'],3)),align='l')
knitr::kable(data.frame(UAV_ProducerAcc=round(List.v$byClass[,'Sensitivity'],3),S2_ProducerAcc=round(List.s$byClass[,'Sensitivity'],3)),align='l')
```
Generally, UAV gives better accuracy in most classes, except for cassava and maize where Sentinel 2 performs well, as observed from the F1-scores. However, the big question is: will fusion improve accuracy? Let's consider a mean-fusion approach combining the UAV mosaic with a high-resolution Sentinel 2 image simulated by random forest regression from resampled Sentinel 2 data.
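In summary, for each band $b$ we fit a random forest regression $f_b$ predicting the resampled S2 reflectance from the corresponding UAV band, apply it at 1 m to obtain a simulated high-resolution S2 band $\hat{s}_b$, and then average the two sources:
$$\hat{s}_b = f_b(u_b), \qquad x_b^{\text{fused}} = \tfrac{1}{2}\left(\hat{s}_b + u_b\right),$$
where $u_b$ is the UAV band at 1 m resolution.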
Resample S2 to the UAV 1 m spatial resolution using bilinear interpolation.
```{r f1}
s_r <- resample(s,v,method="bilinear")
```
Now sample stratified random points from the UAV MLC land-cover map.
```{r f2}
set.seed(530)
points <- spatSample(rast(mlc.uav$map), 3000, "stratified", as.points=T, na.rm=T, values=F)
```
Use the stratified random sample of points to train per-band RF regression models and predict the simulated high-resolution S2 image.
```{r f3, message=FALSE}
s2_p <- extract(s_r, points, drop=F)
knitr::kable(head(s2_p), align = 'l')
vr_p <- extract(v, points, drop=F)
knitr::kable(head(vr_p), align = 'l')
data <- data.frame(S2=s2_p[,-1], UAV=vr_p[,-1])
library(randomForest)
#S2 predicted high resolution output name
s2h_name <- paste0(Path_out,'Mpolonjeni_S2_higres_Prediction.tif')
if(!file.exists(s2h_name)){
UAV <- rast(mosaicname)
names(UAV) <- c("UAV.b", "UAV.g", "UAV.r", "UAV.nir")
startTime <- Sys.time()
cat("Start time", format(startTime),"\n")
s2h.b <- predict(UAV, randomForest(S2.b~UAV.b, data=data, ntree = 300), na.rm=T)
s2h.g <- predict(UAV, randomForest(S2.g~UAV.g, data=data, ntree = 300), na.rm=T)
s2h.r <- predict(UAV, randomForest(S2.r~UAV.r, data=data, ntree = 300), na.rm=T)
s2h.nir <- predict(UAV, randomForest(S2.nir~UAV.nir, data=data, ntree = 300), na.rm=T)
timeDiff <- Sys.time() - startTime
cat("\n Processing time", format(timeDiff), "\n")
}
```
Stack all the predicted S2 bands.
```{r f4}
if(!file.exists(s2h_name)){
#Stack them
s2h <- c(s2h.b,s2h.g,s2h.r,s2h.nir)
names(s2h) <- c("b", "g","r", "nir")
s2h
writeRaster(s2h, s2h_name)
}else{
s2h <- rast(s2h_name)
}
```
Fuse the simulated high-resolution S2 image with the UAV image using the per-band mean.
```{r f5}
s2h.fused <- mean(s2h, v)
s2h.fused
```
Classify and validate the fused product using MLC.
```{r f6}
trainData.f <- extract(brick(s2h.fused), samp, cellnumbers=F, df=T, sp=T)
knitr::kable(head(trainData.f), align = 'l')
valid.f <- subset(trainData.f, split == FALSE)
train.f <- subset(trainData.f, split == TRUE)
mlc.f <- superClass(brick(s2h.fused), train.f, responseCol = "class",
model = "mlc", minDist = 1)
val.f <- validateMap(mlc.f$map, valid.f, responseCol="class", mode='classification', classMapping = mlc.f$classMapping)
display(mlc.f$map, "Fused MLC", nClasses)
```
Validate the fused MLC land-cover product.
```{r f7}
List.f <- accuracy(val.f)
print('Accuracy measures comparison')
knitr::kable(data.frame(Type=List.v$OverallAcc[,0], UAV=round(List.v$OverallAcc[,1],3),S2=round(List.s$OverallAcc[,1],3),Fused=round(List.f$OverallAcc[,1],3)),align='l')
knitr::kable(data.frame(UAV_F1score=round(List.v$byClass[,'F1'],3),S2_F1score=round(List.s$byClass[,'F1'],3),Fused_F1score=round(List.f$byClass[,'F1'],3)),align='l')
knitr::kable(data.frame(UAV_UserAcc=round(List.v$byClass[,'Precision'],3),S2_UserAcc=round(List.s$byClass[,'Precision'],3),Fused_UserAcc=round(List.f$byClass[,'Precision'],3)),align='l')
knitr::kable(data.frame(UAV_ProducerAcc=round(List.v$byClass[,'Sensitivity'],3),S2_ProducerAcc=round(List.s$byClass[,'Sensitivity'],3), Fused_ProducerAcc=round(List.f$byClass[,'Sensitivity'],3)), align = 'l')
```
It is clear that fusion improved the mapping accuracy of some crops and was definitely better than the S2 classification. Do we perhaps need a better framework to reap the benefits of fusion?
Let's try another classification method.
## Random Forest (RF)
Use RF to classify the images (UAV, S2 and the fused image) and compare crop mapping accuracy with MLC. First, we evaluate model training using plots of the training error and of variable importance. Note: the most important variable is the one with the highest increase in mean squared error (MSE) when removed from the model training samples. We explore these plots using the fused image as an example. From Figure \@ref(fig:rf1), the training error decreases as the number of trees increases. We set the number of trees to 250 because it has been established that beyond 200 trees RF estimates are stable (Hastie, Tibshirani, and Friedman 2011).
```{r rf1, warning=FALSE, fig.cap="RF training error w.r.t. the number of trees."}
rf.pca <- randomForest(code~., data=na.omit(as.data.frame(train.f[,-1]@data)), ntree = 250, importance=T)
plot(rf.pca, main='RF Training Error')
#knitr::kable(importance(rf.pca))
```
```{r rf2, warning=FALSE, fig.cap="RF variable importance based on mean decrease in MSE."}
varImpPlot(rf.pca,sort=TRUE, n.var=min(dim(s2h.fused)[3], nrow(rf.pca$importance)), type = 1)
```
Mean decrease in Gini (IncNodePurity) is a measure of variable importance based on the Gini impurity index (i.e. node purity; more useful variables achieve higher increases in node purity) used for calculating the splits in trees. The higher the mean decrease in accuracy or the mean decrease in Gini score, the more important the variable is to our model.
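As a minimal sketch (assuming the `rf.pca` regression forest fitted above), both importance measures can be inspected directly:
```{r imp, eval=FALSE}
imp <- importance(rf.pca)  # columns: %IncMSE and IncNodePurity for regression RF
# Rank predictors by node-purity importance
imp[order(imp[, "IncNodePurity"], decreasing = TRUE), ]
```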
Estimate crop probabilities and map using RF. We will later use the probabilities in a conditional/Markov random field (CRF/MRF) classification framework which considers spatial context.
```{r rf3}
tic("RF mapping duration:")
rf.f.map <- superClass(brick(s2h.fused), train.f, responseCol = "class",
model = "rf")
toc()
tic("RF probabilities duration:")
rf.f.prob <- superClass(brick(s2h.fused), train.f, responseCol = "class",
model = "rf", predType="prob")
toc()
#Display RF map.
display(rf.f.map$map, "Fused RF", nClasses)
```
Validate RF fused map using F1-score and compare it with MLC fusion.
```{r}
val.rf.f <- validateMap(rf.f.map$map, valid.f, responseCol="code", mode='classification')
List.rf.f <- accuracy(val.rf.f)
knitr::kable(data.frame(RFfused=round(List.rf.f$byClass[,'F1'],3),
MLCfused=round(List.f$byClass[,'F1'],3)), align='l', caption = 'F1-score')
```
RF gives a better accuracy on the fused product compared to MLC.
## MRF
Markov Random Fields (MRFs) and Conditional Random Fields (CRFs) are among the most commonly used contextual classification techniques. They offer a probabilistic framework that models spatial dependencies between labels in a classified image (MRFs), with CRFs additionally modelling dependencies in the data (Kenduiywo 2016).
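Loosely, assuming a Potts smoothness prior over the pixel neighbourhood (the exact formulation inside `CRF_MRF.R` may differ), the MRF labelling $\mathbf{c}$ minimises an energy that trades the per-pixel class probabilities off against label agreement between neighbouring pixels $(i,j)$:
$$E(\mathbf{c}) = -\sum_{i}\ln P(c_i \mid \mathbf{x}_i) + \beta \sum_{(i,j)\in\mathcal{N}}\left[1-\delta(c_i, c_j)\right],$$
where $\beta$ (here `beta0`) controls the smoothness and $\delta$ is the Kronecker delta.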
```{r crf1}
#Load CRF/MRF function
source(paste0("D:/Code/Sleek/CRF_MRF.R"))
WithRef <- FALSE
beta0 <- 20
Nneighb <- 4
bCRF <- FALSE #If TRUE the model runs CRF instead of MRF
#Take care of NA mask
image <- brick(s2h.fused)
mrf_name <- paste0(Path_out,'Mpolonjeni_fused_MRF_map.tif')
if(!file.exists(mrf_name)){
rf_MRF <- CRF_MRF(image, WithRef, beta0, Nneighb, rf.f.map$map, rf.f.prob$map, bCRF, nClasses)
writeRaster(rf_MRF, mrf_name)
}else{
rf_MRF <- brick(mrf_name)
}
display(rf_MRF, "Fused RF-MRF", nClasses)
val.mrf.f <- validateMap(rf_MRF, valid.f, responseCol="code", mode='classification')
List.mrf.f <- accuracy(val.mrf.f)
knitr::kable(data.frame(MRFfused=round(List.mrf.f$byClass[,'F1'],3),RFfused=round(List.rf.f$byClass[,'F1'],3),
MLCfused=round(List.f$byClass[,'F1'],3)), align='l', caption = 'F1-score')
```
## References
Breiman, L. (2001). Random Forests. *Machine Learning* 45, 5--32. https://doi.org/10.1023/A:1010933404324
Zou, Y., Li, G. and Wang, S. (2018). The Fusion of Satellite and Unmanned Aerial Vehicle (UAV) Imagery for Improving Classification Performance. *IEEE International Conference on Information and Automation (ICIA)*, pp. 836--841. https://doi.org/10.1109/ICInfA.2018.8812312
Kenduiywo, B. (2016). *Spatial-temporal Dynamic Conditional Random Fields crop type mapping using radar images*. Technische Universitaet Darmstadt Prints.
Hastie, T., Tibshirani, R. and Friedman, J. (2009). *The Elements of Statistical Learning: Data Mining, Inference, and Prediction*. 2nd edn. Springer.