\chapter{Large Sphere Experiment} %new name?
\label{chap:LargeSphere}
\textit{The work presented here was given as an oral presentation at AIC 2013 \citep{macdonald_chromatic_2013} \textbf{prior to the author's involvement}, and as an oral presentation at ICVS 2017 \citep{garside_investigations_2017} by the author.}
Code and data are provided at \url{https://github.com/da5nsy/LargeSphere}
\section{Summary}
The goal of this experimental work was to examine the effect of different wavelengths of light upon chromatic adaptation. Our hypothesis was that \gls{ipRGC} stimulation may need to be considered in order to fully model the induced adaptation, with the null hypothesis being that chromatic adaptation can be fully accounted for by cone and rod mechanisms. If evidence of a melanopic input to chromatic adaptation were found, it might help to explain conflicting results in previous experiments which sought a `preferred \gls{CCT}', which might in turn allow control of \gls{CCT} to be used more extensively in museums as a means to limit damage to objects.
This experiment is of a similar type to those discussed in Section \ref{sec:aadi}. Within a Ganzfeld viewing environment, illuminated by one of 16 different wavelengths of near-monochromatic light, observers performed an achromatic setting task, controlling the chromaticity of a display visible in the central field through a 4$^{\circ}$ circular aperture with two handheld sliders. Under these conditions it would be expected that an observer's chosen achromatic point would correspond in hue to the adapting field, and be of a saturation somewhere between a nominal objective white point and the adapting stimulus. If melanopsin were involved in chromatic adaptation we may expect unusual results for the part of the spectrum that melanopsin is most sensitive to (roughly 480nm).
Two different analyses were performed, with neither providing comprehensive support for rejecting the null hypothesis. However, it is noted that several assumptions are implicit in the experimental design, and that the experiment samples only a small region of the potential search space for melanopic input to adaptation (See Section \ref{sec:LSdis}).
This project was designed before the author arrived at \gls{UCL}, and data from two participants had already been collected. Data collection required at least 16 hours' commitment from observers, and so the only observers up to that point had been LM (one of the author's academic supervisors), who initiated the experiment, and TR, who was a student in the Medical Physics department. The original goal for my involvement in this project was that I should be a third observer and assist in the data analysis. However, following the collection and initial analysis of my own data, it appeared that there had been a technical fault during this run of data collection, and my data were deemed corrupted. Thus, my only contribution to this work is an extension to the data analysis started by LM, on which I shall focus in this chapter. The issue does not seem to have affected the data from the other two observers.
\section{Methodology}
\subsection{Hardware}
A hollow fibreglass sphere of approximately 750mm diameter was prepared with three holes: the first for an observer's face, the second (above the observer's head) for a light source to illuminate the Ganzfeld, and the third (opposite the first) through which a small portion of an LCD screen could be seen (a Fujitsu-Siemens SCALEOVIEW D19-1, SN: YE5L006100). A schematic is shown in Figure \ref{fig:sketch}.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/sketch.png}
\caption{The hardware design. Illustration courtesy of Lindsay MacDonald.}
\label{fig:sketch}
\end{figure}
The screen was characterised such that calibrated stimuli could be generated. This was done by taking spectral measurements at 21 levels of pixel value for each channel, and later interpolating to create a full look-up table to transform from pixel values to XYZ values and vice versa.
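This characterisation approach can be sketched as follows. This is an illustrative Python reconstruction (the original pipeline used MATLAB); the measurements here are randomly generated placeholders standing in for the real spectral data, and channel additivity is assumed:

```python
import numpy as np

# Placeholder measurements: 21 pixel-value levels per channel, with
# monotonic fake XYZ readings standing in for the real spectral data.
measured_levels = np.linspace(0, 255, 21)
rng = np.random.default_rng(0)
measurements = np.sort(rng.random((3, 21, 3)), axis=1) * 100  # (channel, level, XYZ)

def build_lut(levels, xyz_samples):
    """Interpolate the 21 sparse measurements up to a full 256-entry LUT."""
    full = np.arange(256)
    return np.stack(
        [np.interp(full, levels, xyz_samples[:, c]) for c in range(3)], axis=1
    )  # shape (256, 3)

luts = [build_lut(measured_levels, measurements[ch]) for ch in range(3)]

def rgb_to_xyz(rgb):
    """Forward transform: sum per-channel XYZ contributions (additivity assumed)."""
    return sum(luts[ch][rgb[ch]] for ch in range(3))

grey_xyz = rgb_to_xyz((128, 128, 128))
```

The inverse transform (XYZ to pixel values) would search or invert the same tables, which is possible here because each channel's measured response is monotonic in pixel value.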
The interior of the sphere was painted with RAL 7040 Dulux vinyl matt grey, of approximately 38\% reflectance.
Illumination was provided by a Kodak slide projector with a tungsten-halogen light source, filtered through one of 16 near-monochromatic filters, at 20nm intervals from 400nm to 700nm inclusive (Figure \ref{fig:LSillum}). During each session only one filter was used, and the surround illumination remained the same throughout. Measurements of the internal illumination were taken with a \gls{PR650} device, and are plotted in Figure \ref{fig:LSillum}. The illumination has been described further in \citet{macdonald_chromatic_2013}: ``The average luminance of the surrounding chromatic adapting field ranged from a maximum of 0.75 cd/m2 at 560 nm to less than 0.05 cd/m2 at the ends of the spectrum, corresponding to a retinal illuminance through a pupil of diameter 8mm ranging from 38 trolands (max) to less than 2.5 trolands, meaning that the viewing environment was in the upper mesopic range''. No effort was made to ensure that light from inside the sphere did not hit and/or reflect from the LCD display.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/LSillum.pdf}
\caption{The illuminations within the sphere, created by filtering light from a slide projector, measured as reflecting from a point just to the right of the aperture through which an observer would view the screen. Measurements made by Lindsay MacDonald.}
\label{fig:LSillum}
\end{figure}
\subsection{Observer task}
The observer sat on one side of the sphere with their face inside the sphere (as shown in Figure \ref{fig:Alejandro}), such that nothing outside of the sphere was visible. On view on the opposite side of the sphere was a circular 4$^{\circ}$ aperture onto an LCD screen, upon which a random colour was visible (see Section \ref{sec:LSstim} for further details of the randomisation routine). Surrounding the aperture, the rest of the Ganzfeld was illuminated by light from the slide projector, filtered through one of 16 near-monochromatic filters (Figure \ref{fig:LSillum}). It was the observer's task to use two handheld sliders, which controlled the chromaticity of the screen, to make the appearance of the screen achromatic (an achromatic setting task).
On average it took observers roughly 20 seconds to make a selection. Once the observer was satisfied that the patch appeared achromatic, a button was pressed to record the setting and a new random colour would be presented. The first displayed colour was at a \gls{CIE} L* (of CIELAB and CIELUV) of 85, with subsequent colours descending in steps of 5 L* to a final value of 10 L*.
This sequence was repeated 10 times per session, meaning that per session observers made 10 selections at each of 16 lightness levels (160 in total). Observers performed 16 sessions (2560 selections in total), one session for each surround adapting wavelength. The overall protocol is visualised in Figure \ref{fig:ExperimentalPro}. Observers found sessions quite fatiguing and generally did not wish to do more than two or three sessions per day. A brief break was generally taken between sessions, though no minimum duration was prescribed.
For one observer, in an additional (17th) session the narrow-band filter was replaced by a neutral density filter, to produce an achromatic adapting field.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/Alejandro.jpg}
\caption{An observer sat at the sphere. Photograph courtesy of Lindsay MacDonald.}
\label{fig:Alejandro}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/ExperimentalPro.png}
\caption{The experimental protocol.}
\label{fig:ExperimentalPro}
\end{figure}
\subsection{Stimulus specification}
\label{sec:LSstim}
The stimulus was controlled via a MATLAB script, which read the input of two sliders and a button via a `Phidget' interface. The two linear sliders provided values of between 0 and 1000, and these values were converted to approximate CIELAB co-ordinates in such a manner that the slider maxima corresponded to the Natural Colour System (NCS) unique hue positions as computed by \citet{derefeldt_transformation_1986}. In this manner the sliders could be considered as moving along opponent axes between red and green (via the CIELAB origin), and blue and yellow (via the CIELAB origin) respectively.
These values were transformed into XYZ values, with white references of [XYZ = 99.04, 100, 151.30] for observer LM and [XYZ = 94.97, 100, 98.15] for observer TR. The white reference for LM related to a screen characterisation performed around the time that the initial measurements were made. It is assumed that the same is true of the white point used for observer TR, although this characterisation data is no longer available. These values were then converted to sRGB values, and output to screen.
The generation of random starting colours was achieved by modulating the nominal zero-point on each slider scale (where the default zero-point is considered as 500), sampling from a uniform distribution between 250 and 750 for each presentation. The slider position was then considered relative to this new zero-point. The effect can be described as follows: if the observer were to leave the sliders in a central position throughout (making no selections), they would be presented with random colours drawn uniformly from within bounds either side of the objective white point. Since, hopefully, the observer was making selections, each new `random' colour was biased towards the previous selection (based on where the sliders had been left), and was therefore not entirely independent of it.
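The slider mapping and randomisation scheme can be sketched as follows. This is an illustrative Python reconstruction (the original used MATLAB via a Phidget interface), and the a*/b* scaling constants are hypothetical, not the NCS-derived values used in the experiment:

```python
import random

# Hypothetical a*/b* extents at the slider endpoints (placeholders for
# the NCS unique-hue-derived values used in the actual experiment).
A_MAX, B_MAX = 60.0, 60.0

def new_zero_points():
    """Each presentation draws a fresh nominal zero-point for each slider."""
    return random.uniform(250, 750), random.uniform(250, 750)

def sliders_to_ab(slider_a, slider_b, zero_a, zero_b):
    """Map slider positions (0-1000), taken relative to the shifted
    zero-points, linearly onto approximate opponent a*/b* axes."""
    a_star = (slider_a - zero_a) / 500.0 * A_MAX
    b_star = (slider_b - zero_b) / 500.0 * B_MAX
    return a_star, b_star

# Sliders left centred (500): the displayed colour is a random point
# within bounds either side of the nominal white point.
za, zb = new_zero_points()
a0, b0 = sliders_to_ab(500, 500, za, zb)
```

With centred sliders and zero-points drawn from [250, 750], the starting chromaticity falls within half the full slider range either side of the origin, illustrating the bounded randomisation described above.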
% \subsection{Data Collection}
% Data was collected for two observers \dots
\subsection{Data Processing}
Data were calibrated in the following manner: the recorded RGB values of the observers' selections were bounded (values above 1 or below 0, which occurred when observers made selections which were outside of the sRGB gamut, were brought within range, with an `absolute' rendering intent), quantized to 8-bits, and converted via look-up-table to \gls{CIE} 1931 XYZ tristimulus values. From these, xy chromaticities and CIELAB values were computed (with the white point of the display, as loaded from the characterisation file, used as the white reference).
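A minimal sketch of this calibration chain, in Python rather than the original MATLAB, using the standard CIELAB formulae; the white point here is a hypothetical stand-in for the value loaded from the characterisation file, and the LUT step is omitted:

```python
import numpy as np

# Hypothetical display white point (the real analysis loaded the white
# point from the characterisation file for each observer).
WHITE = np.array([95.047, 100.0, 108.883])

def clip_and_quantize(rgb):
    """Bound out-of-gamut selections to [0, 1] and quantize to 8 bits."""
    rgb = np.clip(np.asarray(rgb, dtype=float), 0.0, 1.0)
    return np.round(rgb * 255).astype(int)

def xyz_to_xy(xyz):
    """xy chromaticity co-ordinates from tristimulus values."""
    X, Y, Z = xyz
    return X / (X + Y + Z), Y / (X + Y + Z)

def xyz_to_lab(xyz, white=WHITE):
    """Standard CIELAB transform relative to a given white point."""
    def f(t):
        d = 6 / 29
        return np.where(t > d ** 3, np.cbrt(t), t / (3 * d ** 2) + 4 / 29)
    fx, fy, fz = f(np.asarray(xyz) / white)
    return np.array([116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)])
```

In the real pipeline the quantized codes would be passed through the characterisation look-up table to obtain XYZ before the chromaticity and CIELAB conversions.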
A set of data referred to as the `baseline' data was generated, with no observer input, in order to characterise the range of possible responses that an observer could make. The element of code which provided new random starting positions was excluded for these sessions. These data were processed in exactly the same manner as the observer data, and are shown in Figure \ref{fig:overviewBL}, where it can be seen that the chromatic gamut increases as L* increases (due to a factor \texttt{cfac} in the display code, which aimed to scale chromatic space with L*, mirroring the shape of CIELAB). It can also be seen that the gamut boundary is sometimes reached at higher levels of L*, with some of the vectors curving to remain within gamut.
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/baselinedataOverview.pdf}
\caption{The baseline dataset. This data represents the condition where there is no observer input. The points on the curves are the different values of L* presented during a session. Each curve represents a single `session'. The two sliders are left at their maximum (1000), minimum (0), or neutral (500) positions (9 combinations). This was done computationally; no actual slider input was used. In the legend the first number denotes the slider A position, and the second denotes the slider B position.}
\label{fig:overviewBL}
\end{figure}
%The effect of the random offsetting was also queried, and it was found that the maximum random offset pushed the chromaticities halfway between \dots
Two distinct approaches were taken to data analysis. The first attempted to process the data in a chromatic space, with the reasoning that under the null condition chromatic selections should simply correspond to the chromaticity of the surround adapting illuminations, presumably with some sort of gain function applied. If it could be shown that this relationship was not as expected, in a manner which might suggest involvement by other mechanisms (meaning rods or \glspl{ipRGC}), then this could be considered as evidence against the null hypothesis.
The second approach took advantage of the fact that measurements were taken at samples across the wavelength spectrum. Since we know the power at each waveband, and the spectral sensitivities of the receptors, we can see whether the responses (transformed to cone space) relate in some simple way to spectral sensitivities.
%Here the logic was that if the null condition were true, we should be able to fit a model to observer responses which only used cone-based inputs, and we could carefully consider the (presumed) benefit of including rods and \glspl{ipRGC} in the model. If either rod input or \gls{ipRGC} input were found to dramatically improve the model then this could be considered as evidence against the null hypothesis.
\section{Results}
\subsection{Summary}
Visually summarising the results for even a single observer is difficult due to the high number of variables and collected datapoints. In the following plots, data are averaged over the 10 runs within each session (under each adapting wavelength). In Figures \ref{fig:overviewLM}--\ref{fig:overviewDG}, data from LM, TR and the author (DG) are presented.
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/LMdataOverview.pdf}
\caption{The dataset of observer LM. In all plots, the average CIELAB values are taken over the 10 repeats within each session. \emph{Top left:} an overview of the chromaticity of selected points. Connections between points indicate that they are from the same run (under the same adapting wavelength). Colouring of points and lines indicates the adapting wavelength; whilst there is an approximate correspondence between the colour of a line and the appearance of the adapting field, the colouring is intended only to differentiate between the lines and is not an accurate representation of appearance. \emph{Top right:} as for top left, but only for points recorded at specific values of L*. Colours are as for top left, but lines linking points now connect points of like L*. This plot is included to show the relationship across the wavelength of the adapting field. \emph{Lower left and right:} other perspectives upon the CIELAB projection. Data as for top left.}
\label{fig:overviewLM}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/TRdataOverview.pdf}
\caption{As per Figure \ref{fig:overviewLM}, but with the data of observer TR.}
\label{fig:overviewTR}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/DGdataOverview.pdf}
\caption{As per Figure \ref{fig:overviewLM}, but with the data of observer DG.}
\label{fig:overviewDG}
\end{figure}
It can be seen in the data for all observers that there is a clear L*-dependent shift. Interestingly, the precise nature of this seems different for each observer; the data of LM clusters tightly around [0,0] for low values of L* and then generally moves north-west as L* increases, before returning towards the origin at the highest values of L*. For TR the shift is monotonic and roughly south-west.
Within this pattern it can be seen for observers LM and TR that there is a causal relationship between the adapting wavelength and the selected chromaticity. This is most easily seen in the top right plots of Figures \ref{fig:overviewLM} and \ref{fig:overviewTR}. In both cases a rough circle can be seen for both of the values of L* shown.
In Figure \ref{fig:overviewDG} it can be seen for observer DG in the lower-right plot that the data splits into two distinct groups, one of which is considerably lower in b* than is seen for either of the previous observers, or would generally be expected. It is suspected that there was a screen calibration issue which led to offsetting of the screen output in an unpredictable session-by-session fashion. It was found that a basic offsetting applied selectively to those sessions which appeared to be affected could `correct' for this issue, but without a better understanding of the issue this dataset is considered unsuitable for further analysis.
\subsection{Variability over time/repeats}
Not represented in the above plots is the way in which responses varied over time within each session, averaged over L* (the previous plots were averages over time). Figures \ref{fig:timeLM} and \ref{fig:timeTR} show the calibrated CIELAB values for observers LM and TR.
L* should remain steady throughout (it was set by the protocol), with minor differences introduced presumably by differences between the screen and sRGB, 8-bit quantization, and any selections where a gamut boundary was reached. Both a* and b* follow the broad trends which would be expected given the previous figures. Newly visible in these figures is the manner in which responses change over time. For both observers LM and TR the first two or three repeats seem to be distinct from the rest of the set, suggesting that adaptation had not yet reached a steady state during this time. Further quantitative analysis is provided in the following section.
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/LMdataOverTime.pdf}
\caption{CIELAB co-ordinates across time (repeat number) for observer LM. Top plot is L*, middle a* and lower b*.}
\label{fig:timeLM}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=1.2\textwidth, center]{figs/LargeSphere/TRdataOverTime.pdf}
\caption{As per Figure \ref{fig:timeLM}, but with the data of observer TR.}
\label{fig:timeTR}
\end{figure}
\clearpage
\subsection{Chromaticity-based analysis}
The CIELAB co-ordinates for the adapting fields were computed from the measurements shown in Figure \ref{fig:LSillum}, relative to the white point of the display for observer TR, and are presented in Figure \ref{fig:adapter1}.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/adapter1.pdf}
\caption{The CIELAB values for the surround adapting illuminations, calculated from the measurements shown in Figure \ref{fig:LSillum}, taking the white point of the screen (for the TR trials) as the white point.}
\label{fig:adapter1}
\end{figure}
If the surround fully controlled adaptation, and observers were fully adapted, it would be expected that observers would select the chromaticity of the surround as their neutral chromaticity. It is unlikely that either of these statements is correct, but we would still expect selections to be influenced by the chromaticity of the surrounds to some extent. If we re-plot the selected L* values from the top-right sub-figures of Figures \ref{fig:overviewLM} and \ref{fig:overviewTR} atop the data of Figure \ref{fig:adapter1}, we can visualise the correspondence between the observer selections and the surrounds\footnote{Note that the data of Figure \ref{fig:adapter1} are transformed for Figure \ref{fig:LMCompSurr}. The same measurement data were used for the surrounds, but the normalisation factor differed, since the white point of the display was used when calculating the CIELAB co-ordinates.}.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/LMcompareWithSurround.pdf}
\caption{The CIELAB co-ordinates for the surround illuminations, relative to the white point used in the LM trials (black line), and the data from the top right sub-figure of Figure \ref{fig:overviewLM}, showing the CIELAB co-ordinates of the observer selections for 20 L* and 60 L* (dashed and dotted grey lines respectively).}
\label{fig:LMCompSurr}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/TRcompareWithSurround.pdf}
\caption{As per Figure \ref{fig:LMCompSurr} but for the data of TR. Note that the white points used for visualisation are the white points used during data collection, which differ for each observer.}
\label{fig:TRCompSurr}
\end{figure}
For both observers we see that the pattern of responses corresponds very well to the pattern of the chromaticities of the surrounds. For both observers there is a scaling and an offset, but considering the L*-dependent shifts seen previously, and the way in which isoluminant planes through CIELAB change shape with L*, this is to be expected.
Whilst there is some marked variation from a perfect replication of the surround CIELAB co-ordinates (for example, the 500nm and 600nm points for LM, and the 520nm and 540nm points for TR), these variations seem to be in line with the level of noise in the responses, and attribution to a specific cause (such as melanopsin or rod input) cannot easily be achieved, since many variables are confounded.
%Based on this analysis I do not find sufficient evidence to reject the null hypothesis (that adaptation can be fully accounted for by considering cone and rod input alone).
\subsubsection{Colour Constancy Indices}
\Glspl{CCI} can be calculated by comparing the distance between a pre-adaptation point (generally some sort of objective white point) and the post-adaptation point (the participant-selected achromatic point, or some average thereof), with the distance between the pre-adaptation point and the nominal `ideal' match (which in this case would be the chromaticity of the surround adapting field). The \gls{CCI} is calculated as:
\begin{equation}
CCI = 1-b/a
\label{eq:CCI}
\end{equation}
where $b$ is the distance between the post-adaptation point and the ideal match, and $a$ is the distance between the pre-adaptation point and the ideal match\footnote{For further discussion see \citet[Section 4.1, pg. 681]{foster_color_2011}.}.
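Equation \ref{eq:CCI} can be illustrated with a short sketch (in Python; the function name \texttt{cci} and the point values are for demonstration only):

```python
import numpy as np

def cci(pre, post, ideal):
    """Colour constancy index, 1 - b/a, where b = |post - ideal| and
    a = |pre - ideal|, computed here in the a*b* plane."""
    pre, post, ideal = (np.asarray(p, dtype=float) for p in (pre, post, ideal))
    b = np.linalg.norm(post - ideal)
    a = np.linalg.norm(pre - ideal)
    return 1.0 - b / a

full = cci(pre=[0, 0], post=[10, 0], ideal=[10, 0])  # complete adaptation: 1
none = cci(pre=[0, 0], post=[0, 0], ideal=[10, 0])   # no adaptation: 0
over = cci(pre=[0, 0], post=[25, 0], ideal=[10, 0])  # overshoot: negative CCI
```

The third case, where the selection overshoots the ideal match, yields a negative index, a situation discussed further below.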
There are multiple reasonable options for which value to use as a `pre-adaptation point'. First, the origin of the space within which selections are made (different for each observer) is a possible option; this corresponds to the central point of each slider for each individual. However, though the set-up ascribes some value to this point, it is not definitively linked to the settings that observers made; it can be seen in Figures \ref{fig:LMCompSurr} and \ref{fig:TRCompSurr} that the point [0,0] seems to have no particular relevance. A second option would be to use the measurements made under a neutral density filter. However, again, there is no actual significance to these values: a neutral density filter could be slightly chromatic and still be labelled as such, and even if it were perfectly spectrally neutral, its designation as a gold-standard `neutral' would only pass responsibility on to the chromaticity of the projector lamp, which is under no obligation to be especially `neutral'. The third option is to use the average setting value, which has no specific theoretical grounding, but is vastly more practically relevant than the previous two options. This third option was chosen for the subsequent analyses.
Averaging over time for each observer, and using the average response for each observer as the pre-adaptation point yields \glspl{CCI} as shown in Figures \ref{fig:LMCCI} and \ref{fig:TRCCI}. Only data for L* of 20 and 60 is plotted for clarity, in keeping with previous figures. It can be seen that there are common trends across wavelength at the different values of L*.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/LMCCI.pdf}
\caption{\Glspl{CCI} for observer LM, calculated as per Equation \ref{eq:CCI} for the data of LM shown in Figure \ref{fig:LMCompSurr}. Error bars are standard deviation.}
\label{fig:LMCCI}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/TRCCI.pdf}
\caption{As per Figure \ref{fig:LMCCI} but for observer TR.}
\label{fig:TRCCI}
\end{figure}
It is highly unusual for values of \gls{CCI} to be below 0; this indicates that the selected post-adaptation point is further from the ideal match than the pre-adaptation point is. Normally, assuming that adaptation occurred along the same vector as that connecting the neutral point and the ideal match, this would mean that the observer had \emph{over}-adapted, which is very unusual. Additionally, such results are not generally seen because observers are usually adapted to highly saturated adapters, often on or near the spectral locus, and colours beyond these simply do not exist to be chosen (in a linear space).
We see such results here for a number of reasons. Firstly, we are not in a linear space. In CIELAB the chromatic gamut increases as a function of L*, meaning that a high-L* value can lie outside the gamut of a set of low-L* primaries. Secondly, in concert with this non-linearity, the slider ranges were fixed to represent a broader range of a* and b* values at higher L* values (otherwise a given slider movement at a low value of L* would have produced a much greater apparent chromatic shift than the same movement at a higher value of L*). The effect of this is visualised in Figure \ref{fig:overviewBL}.
Additionally, it appears as though there is substantial offsetting and L*-dependent shifting, which calls into question the appropriateness of such a metric. It should also be noted that in the current analysis averages are taken over time, which obscures any underlying trends over time. It seems as though there is a risk of obscuring more than is revealed through the use of such a metric.
However, it is curious to see a dip in the results for both observers at 500nm, which is roughly where we might expect to see an effect should melanopsin be involved (melanopsin sensitivity theoretically peaks at around 480nm, but its effective peak is predicted to shift to longer wavelengths as a result of pre-receptoral filtering). Based on Figures \ref{fig:LMCompSurr} and \ref{fig:TRCompSurr} this was not anticipated. However, without a clear prediction of what effect melanopsin should have (in terms of direction or magnitude) I suggest caution in interpreting this as evidence of an effect. This dip could also be the result of rod intrusion (the peak of the rod \gls{SSF} is at 507nm), though it is equally unclear what magnitude and direction of effect should be expected from rod intrusion.
Averaging over wavelength and time allows us to visualise the effect of L*. Here, instead of calculating the \gls{CCI}, a simpler measure is used: the distance from the pre-adaptation point (the average of all recorded achromatic points, per observer) to the post-adaptation point. This gives us a more direct impression of the extent of adaptation, without the assumption of adaptation vector angle. In the context of Equation \ref{eq:CCI} this value could be denoted $c$, as it represents the final side of the triangle $abc$.
It is assumed, based on the analysis presented in Figure \ref{fig:overviewBL}, that as L* increases, the length of these vectors shall increase, simply as a result of the experimental set-up. This is shown to be the case in Figures \ref{fig:LMCCI_L} and \ref{fig:TRCCI_L}, with near monotonic increases as L* increases for both observers.
\begin{figure}[htbp]
\includegraphics[max width=0.8\textwidth]{figs/LargeSphere/LMCCI_L.pdf}
\caption{Average distance (in the a*b* plane of CIELAB) between pre-adaptation point and post-adaptation point for different values of L* for observer LM. Error bars are standard deviation.}
\label{fig:LMCCI_L}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=0.8\textwidth]{figs/LargeSphere/TRCCI_L.pdf}
\caption{As per Figure \ref{fig:LMCCI_L} but for observer TR.}
\label{fig:TRCCI_L}
\end{figure}
Averaging over wavelength and L* allows us to visualise adaptation over time. We would expect that as time increases (technically, repeat number is used here as a rough surrogate for time) the vector distance between the pre-adaptation and post-adaptation points should increase. A hint of this trend is possibly visible in the data presented in Figures \ref{fig:LMCCI_T} and \ref{fig:TRCCI_T}, but the level of noise is very high, as can be seen from the standard deviations.
\begin{figure}[htbp]
\includegraphics[max width=0.8\textwidth]{figs/LargeSphere/LMCCI_T.pdf}
\caption{Average distance (in the a*b* plane of CIELAB) between pre-adaptation point and post-adaptation point for different repeat numbers for observer LM. Error bars are standard deviation.}
\label{fig:LMCCI_T}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=0.8\textwidth]{figs/LargeSphere/TRCCI_T.pdf}
\caption{As per Figure \ref{fig:LMCCI_T} but for observer TR.}
\label{fig:TRCCI_T}
\end{figure}
A three-way ANOVA performed upon the data, treating wavelength, time and L* as independent categorical variables, found a significant effect of each ($\alpha = 0.05$), as shown in Figures \ref{fig:anova} and \ref{fig:anova2}.
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/anova.png}
\caption{The multi-way ANOVA output table for the LM data, where X1 is wavelength, X2 is repeat number (time), and X3 is L*.}
\label{fig:anova}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/anova2.png}
\caption{As per Figure \ref{fig:anova} but for observer TR.}
\label{fig:anova2}
\end{figure}
Whilst variables were treated as categorical in the above analysis, there would be an argument for treating each as a continuous variable. However, several factors would need to be considered. Foremost, whilst wavelength is nominally a continuous variable, in this experiment each wavelength category had a different level of radiance, which means that caution should be taken in assuming their equivalence. It is also possible that each filter might have a meaningfully different spectral transmission profile, specifically the band-pass width (see Figure \ref{fig:LSillum}).
Caution is also required regarding the assumption of independence of measurements. Due to the nature of chromatic adaptation, it would not be possible to interleave conditions, and so wavelength is further confounded with various other factors: date, time of day, and all manner of secondary factors relating to these (whether the observer has eaten recently, for example).
It would be of interest to assess whether the contrast between the surround and the selection area influenced settings, but for this specific experiment the contrast is confounded by wavelength and L* and there does not seem to be a clear way to examine the influence of contrast directly.
% But luminance \dots
% \begin{figure}[htbp]
% \includegraphics[max width=\textwidth]{figs/LargeSphere/adapter2.pdf}
% \caption{As per Figure \ref{fig:adapter1}, but different perspectives upon the three-dimensional CIELAB space.}
% \label{fig:adapter2}
% \end{figure}
\subsection{Spectrum-based analysis}
This analysis aimed to leverage the fact that we have access to estimates of the spectral emission of the screen for each RGB value, and thus can calculate the relative cone/rod/\gls{ipRGC} catches for each achromatic setting. This in turn allows us to test to what extent the results seen can be explained simply by scaling cone mechanisms (a \emph{diagonal}, or Von-Kries-type transformation) and to ask whether adding additional inputs to the model (rods and/or \glspl{ipRGC}) improves our ability to explain the measured results.
The first stage of this analysis was to generate simulated data which represented the situation whereby there was only simple Von Kries adaptation. Under this situation the estimated cone catches for the achromatic matches would be equal to the \glspl{SSF} of the cones, linearly scaled by the radiance levels of each surround adapting field. If the radiance levels were equal at each wavelength interval, the simulated data would be equal to the cone \glspl{SSF}. If, hypothetically, one wavelength interval were vastly higher in radiance, we would expect a correspondingly higher adaptive effect, which would result in a higher level of activation required in order for an achromatic visual appearance. In this way, we predict what results may look like if the only type of adaptation occurring was a simple Von Kries / diagonal scaling.
Practically, this is accomplished by element-wise multiplication of each sensor \gls{SSF} by the measured emission from each adapting surround, as per Equation \ref{eq:VK}. The absolute scaling, and the relative inter-sensor scaling, are irrelevant due to the freedom that will be allowed later in the analysis.
\begin{equation}
s_i = p_i \odot e
\label{eq:VK}
\end{equation}
where $s$ is the simulated required sensor catch for achromacy (with the index $i$ denoting the sensor), $p$ is the sensor \gls{SSF}, and $e$ is the measured emission from each adapting surround. The CIE 2006 10$^{\circ}$ observer fundamentals were used, and the results are visualised in Figure \ref{fig:LSsimdata}.
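Equation \ref{eq:VK} amounts to a row-wise element-wise product. A minimal sketch in Python, using crude Gaussian stand-ins for the sensor \glspl{SSF} and invented surround radiances (the real analysis used the CIE 2006 10$^{\circ}$ fundamentals and the measured sphere spectra):

```python
import numpy as np

# Adapting surrounds at 20 nm spacing, 400-700 nm (16 surrounds)
wavelengths = np.arange(400, 701, 20)
rng = np.random.default_rng(1)

# p: sensor SSFs sampled at the surround wavelengths (rows = L, M, S);
# Gaussian stand-ins for illustration only
p = np.stack([np.exp(-0.5 * ((wavelengths - peak) / 45.0) ** 2)
              for peak in (570, 543, 442)])

# e: radiance of each adapting surround (deliberately unequal, as in the sphere)
e = rng.uniform(0.5, 2.0, wavelengths.size)

# s_i = p_i (element-wise *) e, for each sensor i
s = p * e            # broadcasting applies the product row by row

# Sanity check: with equal radiances the simulated data reduce to the SSFs
assert np.allclose(p * np.ones_like(e), p)
```

The absolute scale of `s` is arbitrary, matching the freedom allowed later in the fitting.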
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/LSsimdata.pdf}
\caption{Simulated data for a basic Von Kries observer. In this figure white represents a high response. For example, with 560nm peripheral stimulation we would expect an observer to pick a colour with high L-cone activation as achromatic, assuming that the sensitivity of L-cones had been suppressed and thus higher activation was required to reach a neutral point (see peak roughly in the centre of the top bar).}
\label{fig:LSsimdata}
\end{figure}
A comparison was then made between this data and a set of real data (Obs = TR, averaged over time (entire run, no exclusions), averaged over L* = 35:60). This real data had been transformed from the recorded RGB values of achromatic matches into LMS values. It can be seen that there is a considerable difference between the simulated data and the real data (Figure \ref{fig:simVreal}). S-cone data shows the closest match, with the predicted peak at 460nm\footnote{Note that this is not at the peak sensitivity of S-cones (which would appear at 440nm for this dataset at 20nm intervals) but rather at the peak of the S-cone sensitivity function multiplied by the \gls{SPD}, as plotted in Figure \ref{fig:LSsimdata}.} being mirrored in the real data. This peak appears to bleed into the (real) M-cone data, and the simulated data for the L and M cones shows very little correspondence to the collected data. Correlation coefficients between the simulated data and this specific real dataset are 0.1399, -0.2509 and 0.3164 for L, M and S respectively.
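The RGB-to-LMS step relies on the same spectral machinery: the estimated screen emission for a given setting is weighted by each cone fundamental and summed over wavelength. A sketch with invented spectra, standing in for the measured screen emission and the actual fundamentals:

```python
import numpy as np

wavelengths = np.arange(400, 701, 20)
rng = np.random.default_rng(2)

# Cone fundamentals (rows = L, M, S) - Gaussian stand-ins for illustration
fundamentals = np.stack([np.exp(-0.5 * ((wavelengths - pk) / 45.0) ** 2)
                         for pk in (570, 543, 442)])

# Estimated screen emission for one recorded achromatic RGB setting
emission = rng.uniform(0.0, 1.0, wavelengths.size)

# LMS catch: emission weighted by each fundamental, summed over wavelength
lms = fundamentals @ emission
print(f"L = {lms[0]:.3f}, M = {lms[1]:.3f}, S = {lms[2]:.3f}")
```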
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/simVreal.pdf}
\caption{A comparison of simulated data (top row) and real data (bottom row). Data from the top row is as per the top three bars of Figure \ref{fig:LSsimdata}.}
\label{fig:simVreal}
\end{figure}
In order to understand the way in which adaptation may be crossing between channels, or the way in which we may not have properly isolated our channels (it is unclear exactly how much freedom an observer truly has to move around the response space), a brute-force method was used to find combinations of the above simulated data which would best fit the real data.
10,000 random sets of weighting values (30,000 values in total) between -25 and 25\footnote{An analysis showed that the absolute range of these figures was unimportant, since we were looking for correlation with the real data rather than absolute correspondence. Thus they are listed here only to assist the reader in understanding graphs such as Figure \ref{fig:contributions_3}.} were generated. These weightings were applied to the simulated responses and the results were additively combined\footnote{(X amount of simulated L) + (Y amount of simulated M) + (Z amount of simulated S)}. The correlation between each random combination and each channel of the real data was computed. The top performing randomly generated combinations were selected and are presented in Figure \ref{fig:maxsimVreal}. These particular combinations were created by cross-combining the original simulated data (top row of Figure \ref{fig:simVreal}) in the ratios shown in Table \ref{tab:crosscomb}, and correlate with the real data with coefficients of 0.9126, 0.8861 and 0.7726 for L, M and S respectively. These are much improved over the coefficients for the original simulated data.
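The random search described above can be sketched as follows, with stand-in arrays in place of the actual simulated and real data (the weight range and number of sets follow the text; the "real" channel is constructed, for illustration, to be close to a linear combination of the simulated ones):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: simulated Von Kries responses (3 sensors x 16 surrounds) and
# one channel of real data built as a noisy linear combination of them
n = 16
sim = rng.normal(size=(3, n))
real_L = 2.0 * sim[0] - 1.0 * sim[1] + rng.normal(0.0, 0.1, n)

# 10,000 random weight triples between -25 and 25 (30,000 values total)
n_sets = 10_000
weights = rng.uniform(-25, 25, size=(n_sets, 3))

# Additive combination: (X * sim L) + (Y * sim M) + (Z * sim S) per set
combos = weights @ sim                      # shape (n_sets, n)

# Pearson correlation of every combination with the real channel
cz = (combos - combos.mean(1, keepdims=True)) / combos.std(1, keepdims=True)
rz = (real_L - real_L.mean()) / real_L.std()
corrs = (cz * rz).mean(axis=1)

best = corrs.argmax()
print(f"best r = {corrs[best]:.4f}, weights = {np.round(weights[best], 2)}")
```

Because correlation is invariant to overall scale, only the direction of each weight triple matters, which is why the absolute range of the weights is unimportant.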
\begin{table}[hbtp]
\centering
\begin{tabular}{|r|r|r|r|}
\hline
& L & M & S \\ \hline
L & $18.4069$ & $-23.2578$ & $-10.9817$ \\ \hline
M & $-13.0477$ & $9.8327$ & $10.6844$ \\ \hline
S & $-2.7633$ & $-17.6798$ & $8.3036$ \\ \hline
\end{tabular} % Would be nice to format to fit page width
\caption{Optimal weights to fit the specific real dataset used. \\ \emph{Example: Image in top left of Figure \ref{fig:maxsimVreal} (L) was created by combining 18.4069 * the original simulated L (Top left of Figure \ref{fig:simVreal}), -23.2578 * the original simulated M and -10.9817 * the original simulated S.}}
\label{tab:crosscomb}
\end{table}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/maxsimVreal.pdf}
\caption{A comparison of randomly generated combinations (top row) whereby channels were freely mixed from basic Von Kries simulated data (top row of Figure \ref{fig:simVreal}) to best correlate with real data, and real data (bottom row) (repeated from Figure \ref{fig:simVreal}).}
\label{fig:maxsimVreal}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/contributions_3.pdf}
\caption{Top 0.2\% performing randomly generated combinations, presented in terms of the weights of the original simulated data that they use. The subfigure on the left represents the weights needed to reconstruct the real data for L, the middle - M, and the right - S. Colour coded such that dark blue is the highest performing and yellow is the worst performing (of this highly performing subset).}
\label{fig:contributions_3}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/contributions_all.pdf}
\caption{As per Figure \ref{fig:contributions_3} but for all randomly generated combinations. The trends seen in Figure \ref{fig:contributions_3} are visible, and the range of these trends (extending into the poorer performing randomly generated combinations) can be seen. The colours are rescaled such that yellow now represents the worst performing randomly generated combinations of the entire set. Plots are plotted ordered by success and so lines representing successful randomly generated combinations will overlay lines representing poorer performing ones.}
\label{fig:contributions_all}
\end{figure}
The top performing 0.2\% of the randomly generated combinations are presented in terms of their components (analogous to plotting the values in Table \ref{tab:crosscomb}) in Figure \ref{fig:contributions_3}, and the entirety of the results for the randomised sampling is presented in Figure \ref{fig:contributions_all}.
It can be seen that to reconstruct the real data for L, a high amount of simulated L, and a low amount of both simulated M and S are required, though from Figure \ref{fig:contributions_all} it can be seen that the requirement for low S is less stringent. It can also be seen that the amount of L required seems related to the amount of M required (from the way in which the lines cross at a point).
A similar but opposite trend is visible for M.
For S, there is a narrow range of successful values for L (negative but close to 0), a larger range of strongly negative values for M, and a range of positive values for S. The reciprocal relationship between L and M seen in the reconstructions of L and M is no longer visible, but instead there is a new reciprocal relationship visible between M and S, though examination of Figure \ref{fig:contributions_all} suggests that this is not as important as in the case of L and M.
It is reassuring that in each case, successful random combinations used high positive levels of the target signal. For both L and S the target signal was the only positive weighting, with M taking positive weights of both M and S.
\subsubsection{Adding rods and ipRGCs}
In order to investigate whether rods or \glspl{ipRGC} were playing a role in adaptation as measured by this dataset, the analysis was re-run with an additional rod input, an additional \gls{ipRGC} input, and both rods and \glspl{ipRGC} as additional inputs. See Figure \ref{fig:LSsimdata} for a visualisation of these channels. The results of re-running the computations following the addition of these inputs are shown in Table \ref{tab:plusres}. Minor increases in correlation are exhibited. However, it should be noted that one would expect to see at least a minor increase in performance from practically any additional signal, so long as it was independent from the already accessible signals.
\begin{table}[hbtp]
\centering
\begin{tabular}{|r|r|r|r|}
\hline
Just cones: & $0.9126$ & $0.8861$ & $0.7726$ \\ \hline
+ rods: & $0.9156$ & $0.8881$ & $0.7987$ \\ \hline
+ ipRGCs: & $0.9218$ & $0.8867$ & $0.8055$ \\ \hline
+ rods + ipRGCs: & $0.9326$ & $0.8933$ &$0.8100$ \\ \hline
\end{tabular} % Would be nice to format to fit page width
\caption{Correlation coefficients for various conditions incorporating additional signals.}
\label{tab:plusres}
\end{table}
Further, it is not clear to what extent the gains exhibited in Table \ref{tab:plusres} are due to noise within the computations; the randomly generated combinations are produced by a random number generator which is re-seeded each time the script is run, for reproducibility. However, there is nothing to prevent the randomly generated values for the additional-input runs performing better purely by chance. In order to investigate this, the above extensions were re-run 100 times each, and the top performer from each run was skimmed and saved. The results are presented in Figure \ref{fig:relcontributions}. From this figure it can be seen that there is a real and clear benefit from the inclusion of the additional signals, and from inclusion of \emph{both} of the additional signals.
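This stability check can be sketched as below. All arrays here are invented stand-ins: the "real" channel is deliberately built to contain a genuine non-cone component, so across repeated seedings the five-channel search should reliably outperform the three-channel one, mirroring the comparison in the text.

```python
import numpy as np

def best_corr(sim, real, rng, n_sets=10_000):
    """Best Pearson correlation over n_sets random additive weightings of sim."""
    w = rng.uniform(-25, 25, size=(n_sets, sim.shape[0]))
    combos = w @ sim
    cz = (combos - combos.mean(1, keepdims=True)) / combos.std(1, keepdims=True)
    rz = (real - real.mean()) / real.std()
    return (cz * rz).mean(axis=1).max()

rng = np.random.default_rng(0)
n = 16
cones = rng.normal(size=(3, n))     # stand-in simulated L, M, S
extra = rng.normal(size=(2, n))     # stand-in rod and ipRGC channels
# "Real" data constructed to contain a genuine non-cone component
real = cones[0] - 0.5 * cones[1] + 1.0 * extra[1] + rng.normal(0.0, 0.2, n)

# Re-run each search 100 times with fresh seeds and skim the best performer
cones_only = [best_corr(cones, real, np.random.default_rng(i))
              for i in range(100)]
with_extra = [best_corr(np.vstack([cones, extra]), real, np.random.default_rng(i))
              for i in range(100)]

print(f"cones only:      {np.mean(cones_only):.4f} +/- {np.std(cones_only):.4f}")
print(f"+ rods + ipRGCs: {np.mean(with_extra):.4f} +/- {np.std(with_extra):.4f}")
```

Comparing the two distributions of best-per-run correlations distinguishes a genuine benefit of the extra channels from run-to-run sampling noise.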
\begin{figure}[htbp]
\includegraphics[max width=\textwidth]{figs/LargeSphere/relcontributions.pdf}
\caption{Raincloud plot \cite{allen_raincloud_2019} showing the results of re-running the extended analyses 100 times and skimming the best performer from each. Red points and probability density function represent `just cones', green - `+ rods', blue - `+ \glspl{ipRGC}', black - `+ rods + \glspl{ipRGC}'.}
\label{fig:relcontributions}
\end{figure}
\subsubsection{Spectrum-based Analysis Discussion}
There is a clear distinction between the simple simulated response functions and the recorded data. It is interesting that a simple linear recombination can improve the correlation so greatly. It should be remembered, however, that this type of post-hoc fitting is liable to deliver whatever results a researcher might hope to find.
Taken at face value, the results suggest that the principal drivers of adaptation are not at the cone level, but rather at a higher level, once cone inputs have been combined. The results mirror what might be expected of these higher level signals - there appears to be a single signal for L and M, roughly mirrored between the two, with a reciprocal trade-off possible between L and M for both, and the S cone signal takes positive weights for S, and negative weights for both L and M, but with a curious hint of a reciprocal trade-off between S and M.
However, it is unclear to what extent these relationships may arise due to limitations of the experimental set-up; it is possible that a rise in one signal is yoked through hardware limitation to the rise or fall in another. Future investigators should consider whether this effect is modellable.
Though there is a demonstrated ability of additional signals to improve the correlation with the real data, it would be a leap to consider this as evidence for the existence of mechanisms operating in this manner.
It is likely that any additional signal at a different wavelength (or even with a different frequency component) would have been able to deliver a higher correlation, since the data is noisy and through the random recombination we are functionally allowing every option to be tested. This is highly likely to result in over-fitting. One way to test whether this has occurred is to plot the contributions of the top performers from these situations (analogous to Figures \ref{fig:contributions_3} and \ref{fig:contributions_all}) and consider the apparent trends. This is plotted in Figure \ref{fig:contributions_5}. It can be seen that there do appear to be some trends for both rods and \glspl{ipRGC}. Splitting this apart, into Figures \ref{fig:contributions_4} and \ref{fig:contributions_5minusrods}, where we plot the results for simulations run where only the rods were added or only the \glspl{ipRGC} were added, we can see these trends slightly more clearly.
\begin{figure}[htbp]
\includegraphics[max width=0.9\textwidth]{figs/LargeSphere/contributions_5.pdf}
\caption{As per Figure \ref{fig:contributions_3} but for the conditions where both rods and ipRGCs were included as additional input signals.}
\label{fig:contributions_5}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=0.9\textwidth]{figs/LargeSphere/contributions_4.pdf}
\caption{As per Figure \ref{fig:contributions_3} but for the conditions where rods alone were included as additional input signals.}
\label{fig:contributions_4}
\end{figure}
\begin{figure}[htbp]
\includegraphics[max width=0.9\textwidth]{figs/LargeSphere/contributions_5minusrods.pdf}
\caption{As per Figure \ref{fig:contributions_3} but for the conditions where ipRGCs alone were included as additional input signals.}
\label{fig:contributions_5minusrods}
\end{figure}
Figures \ref{fig:contributions_4} and \ref{fig:contributions_5minusrods} appear somewhat similar. Considering the similar spectral characteristics of rods and melanopsin, it should be expected that the model would use the signals in somewhat similar fashions. It appears as though neither has a particularly strong role to play in the L and M adaptation, however both seem to reliably be used as a negative weighting for S. It is unclear whether this is simply a response to a single high datapoint in this dataset (the peak at 460nm) or whether this integration serves a broader purpose. This peak at 460nm also seems to be the cause for the stubbornly low correlation coefficients for the S channel (compared to L and M), which max out at around 0.81 (see Figure \ref{fig:relcontributions}).
\section{Conclusion}
\label{sec:LSdis}
Two methods of analysis for the \citet{macdonald_chromatic_2013} data are presented here.
The chromaticity-based analysis showed that there was a strong correspondence between the patterns of the chromaticities of the adapting fields and the pattern of responses. There were a small number of outliers, which could plausibly be due to additional inputs to the adaptive process, however these outliers were in line with the amount of noise in the data, and confounded by many other variables and sources of noise. This analysis provides no basis for rejecting the null hypothesis that cones and rods are the sole responsible agents in adaptation for the studied retinal locations and conditions.
The spectrum-based analysis showed that the results for one observer could not be well fitted by a simple Von-Kries-type observer model, but that simple linear combinations of the responses of such an observer could be made to fit the recorded data very well. Though the quality of these fits is to be expected from this type of post-hoc fitting, the types of models which are predicted by the fitting align well with our understanding of post-receptoral signals, which suggests that we may be measuring adaptation at these levels. It would be particularly valuable to see whether the temporal nature of responses also aligned with our understanding of the timecourses of these different signals. It should be noted that this result could be due to the nature of the experimental set-up; observers were unable to modulate the cone activations directly. For example, to make the stimulus more green, the observer would inherently have to make it less red. It is plausible that this may account for the relationships seen in this analysis.
The second analysis further showed that integration of the simulated rod and melanopic signals provides additional value in fitting the data. Again, this is to be expected from this type of fitting. The minimal improvement delivered by the inclusion of rod and melanopic signals does not furnish me with strong enough evidence to reject the null hypothesis that chromatic adaptation can be fully accounted for by cone and rod mechanisms.
\subsection{Limitations}
Several limitations have been identified with this experimental design which should be borne in mind when analysing this dataset, or planning similar experiments.
\begin{itemize}
\item The light levels in this experiment were in the mesopic range, and it is unclear whether we might expect to see melanopsin activation, and thus any melanopic interaction to adaptation, at these levels.
\item The light levels were different for each condition. Presumably a stronger adapting illumination might have a stronger adaptive effect, but it is unclear how this might manifest (faster adaptation? a more chromatic neutral point?) and what the underlying relationship may be. It would be tempting to match the adapting surrounds for luminance, or for radiant power, but neither solution provides genuine equality across conditions (when matching for luminance, almost no level of 400nm radiation would be able to match the luminance at other wavelengths, and there is no reason why radiant power should directly translate to adaptive effect).
\item This experimental paradigm requires the assumption that adapting one part of the retina (the periphery) has an effect upon the adaptive state of another part (the fovea). This assumption is implicit in all chromatic adaptation models, but to some extent we know this to be incorrect - consider for example spatially locked after-images. See also the previous work of \citet{macadam_chromatic_1956} where two halves of the retina were explicitly adapted differently.
\item This method provides very noisy data, and it is difficult to average over any of the nominal `repeat' conditions since every variable seems to have a non-negligible effect, and none of these effects appears to be easily modellable. The inherent noise in this data collection method places heavy limits on the scale of detectable effect sizes.
\item It is difficult to distinguish effects that indicate a biological basis and those which arise due to experimental limitations. For example, in the spectrum-based analysis it was unclear whether an increase in one sensor activation was required to make a match, or whether it was simply yoked to a decrease in another by the restraints placed upon the response space.
\item It appears that there is a strong correlation between an observer's match and the previous match. It is unclear whether this is due to the fact that the new `random' starting condition is centred upon the previous selection, or whether this is a foveal adaptive effect.
\item The spectral resolution of 20nm is relatively low for analyses such as the spectrum-based analysis. However, increasing the resolution would probably be an unrealistic goal, considering the amount of observer time required.
\end{itemize}
\subsection{Further Work}
This dataset may only comprise data from two observers, but it is broad and may be valuable to those interested in chromatic adaptation and colour constancy. Various further analyses are envisioned:
\begin{itemize}
\item As previously mentioned, there would likely be some value in further modelling of the potential response space of an observer on this task, and of its interaction with the types of analysis performed here.
\item It would be interesting to consider non-linear adaptation responses. This may better account for the data at the extremes of the wavelength range where luminance was very low.
\item The temporal dimension of the data is not considered in either of the analyses presented here, but the code for the second analysis has been written in such a way that it should be relatively easy to implement. Considering the different expected time courses for adaptation for cones/rods/\glspl{ipRGC} this may serve to be a fruitful avenue.
\item It would be valuable to collect data for more observers, ideally under conditions where the white point of the display was matched between observers, in order to understand what effects are robust across observers.
\item If the L* dependent effects could be accounted for, then averaging over the entire range of L* could be implemented, which should reduce the level of noise, and improve the ability of an investigator to draw conclusions from a chromaticity-based analysis.
\item For the spectrum-based analysis, currently only a single colorimetric observer is used (Stockman-Sharpe 10deg). It would be possible, and more correct, to use specific observers relating to actual ages and visual fields. It may also be possible to use sharpened spectral sensitivities (see \citet{finlayson_spectral_1994}), though this would need to be done particularly carefully, considering the already large potential for overfitting.
\end{itemize}
\section{Interim Summary}
The experiment reported in this chapter aimed to extend our understanding of colour constancy and chromatic adaptation, specifically asking whether there was a melanopic influence. No clear effect for a melanopic influence was found, though the absence of an effect could not be authoritatively confirmed.
A large number of limitations were identified, and it was deemed appropriate to develop a second version of this experimental set-up, and perform a further experiment. This further experiment is reported in the following chapter.