\section{Mathematical Methods}
\label{sec:math}
A small number of mathematical methods which may not be familiar to the reader are used within this thesis. They are outlined below.
\subsection{K-means Clustering}
K-means clustering is a method for cluster analysis whereby unsorted data is partitioned into $k$ distinct groups based on proximity to iteratively updated cluster centres. Figure \ref{fig:KM1lr} shows the results of a k-means clustering on unsorted data, with colours indicating output cluster designations. It can be seen that the chosen clusters align well with what a human observer might have chosen. In this case the groups are relatively well separated, and so the task facing the k-means algorithm was relatively undemanding.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/comp/KMeansMarkDemo/1.pdf}
\caption{A reproduction of Figure \ref{fig:KM1} from Page \pageref{fig:KM1}, which shows the results of a k-means clustering of the data from Figure \ref{fig:corrected} from Page \pageref{fig:corrected} (which shows the \emph{actual} groups).}
\label{fig:KM1lr}
\end{figure}
The standard algorithm proceeds as follows (a minimal illustrative sketch is given after the list):
\begin{enumerate}
\item $k$ random starting positions are assigned.
\item Each data-point is assigned a group based on which starting position is closest.
\item The mean location of the data-points in each group is computed.
\item Steps 2 and 3 are repeated, using the computed means in place of the original starting positions, until an iteration of these two steps results in no data-point changing groups. At this point the algorithm is said to have converged, and the resulting groupings are output.
\end{enumerate}
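As an illustration of these steps, a minimal sketch in Python (NumPy) is given below. This is not the implementation used in this thesis; the function name \texttt{kmeans}, the use of $k$ random data-points as starting positions, the fixed iteration cap, and the absence of any handling of empty groups are all simplifying assumptions made for the sketch.
\begin{verbatim}
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Minimal k-means sketch; data is an (n_points, n_dims) array."""
    rng = np.random.default_rng(seed)
    # Step 1: k random data-points serve as the starting positions.
    centres = data[rng.choice(len(data), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each point to the group of its nearest centre.
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :],
                               axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: converged once an iteration changes no assignment.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each centre to the mean of its group.
        centres = np.array([data[labels == i].mean(axis=0)
                            for i in range(k)])
    return labels, centres
\end{verbatim}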
\subsection{Principal Component Analysis}
\textit{A valuable primer on \acrshort{PCA} in relation to colour technology is available from \citet{tzeng_review_2005}.}
\bigskip
\glsreset{PCA}
\Gls{PCA} is a dimensionality reduction method, used to reduce the number of variables within a dataset whilst retaining as much of the variance as possible, and often used to identify the correlated roots of variance.
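As a concrete sketch of how the components may be obtained, the following Python (NumPy) fragment computes them via a singular value decomposition of the mean-centred data. It is illustrative only; the function name \texttt{principal\_components} and the returned quantities are assumptions made for the sketch rather than a description of any particular implementation used here.
\begin{verbatim}
import numpy as np

def principal_components(X):
    """X: (n_samples, n_variables) data matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean                      # centre the data
    # SVD of the centred data: rows of Vt are the principal
    # components, ordered by the variance they explain.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                 # projections onto the components
    explained_variance = s**2 / (len(X) - 1)
    return mean, Vt, scores, explained_variance
\end{verbatim}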
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LitRev/PCA.png}
\caption{Two-dimensional data with the first and second principal components added. Source: \url{https://commons.wikimedia.org/wiki/File:GaussianScatterPCA.svg}}
\label{fig:PCA}
\end{figure}
Along with similar techniques (such as singular value decomposition), it is used extensively within the study of daylight \glspl{SPD} \citep{hernandez-andres_color_2001,ojeda_influence_2012,pant_estimating_2009,bui_group_2004,judd_spectral_1964,maloney_computational_1984,spitschan_variation_2016} and natural \glspl{SRF} \citep{maloney_computational_1984,dzmura_color_1992,maloney_evaluation_1986,maloney_color_1986,cohen_dependency_1964,ferrero_principal_2011,zhang_reconstructing_2008,kwon_surface_2007,agahian_reconstruction_2008,harifi_recovery_2008,parkkinen_characteristic_1989,vrhel_color_1992,fairman_principal_2004,ayala_use_2006,eem_reconstruction_1994-2,connah_multispectral_2006,shi_using_2002,morovic_metamer-set-based_2006}.
%A key goal for many of the researchers listed above... - how many dimensions/sensors do we need
In eras where the transmission of large datasets was troublesome, dimensionality reduction methods such as \gls{PCA} held value as a means of summarising a dataset, with the understanding that a reader could reconstruct a pseudo-dataset with minimal data-loss from the provided principal components. One such example is the work of \citet{judd_spectral_1964}, where only the mean and first four characteristic vectors are provided\footnote{This is not technically \gls{PCA} but the related technique of \citet{morris_objective_1954}.}. The data from this study are replotted in Figure \ref{fig:Judd}; for their dataset this description does appear to offer a sensible summary: the shape of the mean is retained, and further variation occurs principally through $V_{2}$, a broad and relatively monotonic function, indicating that changes are likely to take the form of a skewing of the spectral shape.
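To make the reconstruction step concrete, the fragment below continues the hypothetical \texttt{principal\_components} sketch given earlier. It illustrates the general idea of rebuilding a pseudo-dataset from the mean and the first few components only; it is not a reproduction of the characteristic-vector calculation of \citet{judd_spectral_1964}, and the names \texttt{reconstruct} and \texttt{spectra} are assumptions made for the example.
\begin{verbatim}
def reconstruct(mean, components, scores, m):
    """Approximate the original data from the mean and the
    first m principal components only."""
    # Everything beyond the first m components is discarded,
    # at the cost of the variance those components carried.
    return mean + scores[:, :m] @ components[:m, :]

# e.g. mean, components, scores, _ = principal_components(spectra)
#      pseudo_spectra = reconstruct(mean, components, scores, 4)
\end{verbatim}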
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LitRev/Judd.pdf}
\caption{The mean and first four characteristic vectors of \citet{judd_spectral_1964}.}
\label{fig:Judd}
\end{figure}
\clearpage