main_report.Rmd

---
title: "Empathy, Emotional Intelligence, and Deception Detection -- Main Report"
author: "Timothy J. Luke"
date: "`r Sys.Date()`"
output: 
  html_document:
    toc: true
    toc_float: true
    keep_md: true
knit: (function(input_file, encoding) {
    rmarkdown::render(input_file, encoding = encoding, output_dir = "./reports/")
  })
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

source("./R/wrangle.R")
source("./R/hypothesis_testing.R")
source("./R/signal_detection.R")
```

# Background

The main author of this report (Timothy) reviewed a manuscript for _Applied Cognitive Psychology_, which was ultimately rejected for publication. In that review, I recommended several potential ways of improving the statistical analyses and their interpretation. I signed the review. Some time after the manuscript was rejected, the manuscript's main author contacted me and asked if I would like to reanalyze the data and implement some of the things I recommended. I said yes.

This report contains my reanalysis of the data, as well as expansions of my initial reanalysis, in response to reviewer comments.

The report is constructed by calling a series of scripts containing the code to wrangle the data, fit the models, and draw the visualizations presented below.

# Initial examination of the data

## Distributions

First, we examined the distributions of the variables that would be used as predictors in the models testing the hypotheses.

Although the distributions for the EI measure and trait empathy were certainly not normal, we don't see anything here that would make us very concerned about using these variables in a model.

```{r}
predictor_hist_ei
```

```{r}
predictor_hist_emp
```

## Correlations between predictors

```{r}
h1_predictor_cor
```

## Missing data

Thankfully, there is very little missing data: `r acc_na` accuracy value(s) and `r conf_na` confidence value(s) are missing.

```{r}
acc_na # accuracy
conf_na # confidence
```

# Hypothesis 1

To test the hypothesis that emotional intelligence and trait empathy predict higher deception detection accuracy, we fit a series of mixed effects logistic regression models. We then compared the models using likelihood ratio tests, to select a preferred model.

We wrangled the data into long form, such that each row represented a judgment by a receiver. Here are the first 20 rows, to illustrate the data structure.

Note that for the data used to test Hypotheses 1 and 2, I divided the emotional intelligence subscales and empathy subscales by 10 and mean-centered them. These transformations were done to troubleshoot nearly unidentifiable models. The transformations should have no impact on the substantive interpretation of the models. To convert coefficients related to these variables back to the original scale, simply divide them by 10.

```{r}
head(model_data, 20) %>% 
  knitr::kable()
```
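
The rescaling described above can be illustrated with a small sketch. It uses simulated data and a plain logistic regression rather than the report's mixed models, and `raw_score`, `scaled_score`, and the simulated outcome are hypothetical names introduced only for this example.

```r
# Toy illustration on simulated data (not the report's data or models)
set.seed(1)
n <- 200
raw_score <- rnorm(n, mean = 100, sd = 15)  # subscale on its original scale
y <- rbinom(n, 1, plogis(-0.2 + 0.02 * (raw_score - 100)))

# Divide by 10, then mean-center
scaled_score <- raw_score / 10
scaled_score <- scaled_score - mean(scaled_score)

fit_raw    <- glm(y ~ raw_score,    family = binomial)
fit_scaled <- glm(y ~ scaled_score, family = binomial)

b_raw    <- unname(coef(fit_raw)["raw_score"])
b_scaled <- unname(coef(fit_scaled)["scaled_score"])

# Dividing the rescaled slope by 10 should recover the original-scale slope
all.equal(b_scaled / 10, b_raw, tolerance = 1e-4)
```

Centering only shifts the intercept, so the slope is affected by the division alone.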

In the first model, deception detection accuracy (0 = incorrect, 1 = correct) was regressed on the veracity condition of the message (truth vs. lie), with random intercepts for senders, as well as random slopes (for veracity) and intercepts for receivers. In the second model, the four trait empathy subscales were added as predictors. In the third model, the four emotional intelligence subscales were added as predictors.
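
For readers who want to see the shape of this model sequence, here is a minimal sketch on simulated data. It assumes the `lme4` package, uses one subscale per instrument instead of four, and all object and variable names (`d`, `empathy_pt`, `ei_per`, `m_base`, and so on) are hypothetical. It is not the code in the sourced scripts.

```r
library(lme4)  # assumed mixed-model package for this sketch

set.seed(2)
n_receivers <- 60
n_senders   <- 14

# Receiver-level traits (hypothetical names, one subscale per instrument)
traits <- data.frame(
  receiver   = seq_len(n_receivers),
  empathy_pt = rnorm(n_receivers),
  ei_per     = rnorm(n_receivers)
)

# Fully crossed toy design: every receiver judges every sender once
d <- merge(expand.grid(receiver = seq_len(n_receivers),
                       sender   = seq_len(n_senders)),
           traits, by = "receiver")
d$veracity <- ifelse(d$sender %% 2 == 0, "truth", "lie")

# Simulate accuracy with large sender variance and small receiver variance
sender_int   <- rnorm(n_senders, sd = 1.0)
receiver_int <- rnorm(n_receivers, sd = 0.3)
eta <- 0.1 + 0.3 * d$ei_per + sender_int[d$sender] + receiver_int[d$receiver]
d$accuracy <- rbinom(nrow(d), 1, plogis(eta))
d$receiver <- factor(d$receiver)
d$sender   <- factor(d$sender)

# Nested sequence: veracity only, then empathy, then emotional intelligence
m_base <- glmer(accuracy ~ veracity +
                  (1 + veracity | receiver) + (1 | sender),
                data = d, family = binomial)
m_emp  <- glmer(accuracy ~ veracity + empathy_pt +
                  (1 + veracity | receiver) + (1 | sender),
                data = d, family = binomial)
m_ei   <- glmer(accuracy ~ veracity + empathy_pt + ei_per +
                  (1 + veracity | receiver) + (1 | sender),
                data = d, family = binomial)

# Likelihood ratio tests between successive models
lrt <- anova(m_base, m_emp, m_ei)
print(lrt)

# Random intercept variances for senders and receivers
vc <- as.data.frame(VarCorr(m_ei))
is_int <- vc$var1 == "(Intercept)" & is.na(vc$var2)
sender_var   <- vc$vcov[is_int & vc$grp == "sender"]
receiver_var <- vc$vcov[is_int & vc$grp == "receiver"]
c(sender = sender_var, receiver = receiver_var)
```

With toy data this small, a singular-fit message for the receiver random effects would not be surprising. The ratio `sender_var / receiver_var` is the kind of quantity used to compare sender and receiver variability.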

Likelihood ratio tests indicated at best marginal improvement by the addition of the empathy measures, but there was significant improvement of fit with the addition of the emotional intelligence measures.

```{r}
lrt_accuracy
```

Thus, we retained the third model, which included both trait empathy and emotional intelligence. The main output of this model is provided below:

```{r}
summary(model_ei)
```

Interestingly but perhaps unsurprisingly, the variance in intercepts for senders massively exceeds the variance in intercepts for receivers. The sender variance exceeds the receiver variance by a factor of more than 124. As in Bond and DePaulo (2008), accuracy is primarily determined by the sender, rather than by the receiver.

Examining the coefficients, the Perceiving subscale of emotional intelligence appears to positively predict deception detection accuracy. The Perspective Taking subscale of the trait empathy instrument also appears to predict increased accuracy, but the p-value is fairly high (i.e., it would likely not survive a reasonable correction for multiple comparisons). The Using subscale of EI appears to negatively predict accuracy, but again, the p-value is relatively high, and we are not confident that this would survive corrections for multiple comparisons. Moreover, the significance of this coefficient is not robust to alternative models (see the supplemental analyses below).

To visualize the increase in accuracy apparently conferred by higher scores on the Perceiving EI subscale, we extracted the mean predicted accuracy rates from the preferred model at each level of the Perceiving subscale. The figure below illustrates the relationship between accuracy and the Perceiving subscale, with a horizontal line drawn at chance-level accuracy and a vertical line drawn at the sample mean on Perceiving.

The increase in accuracy might be theoretically interesting, if this effect is trustworthy, but it is not particularly impressive from a practical perspective. People with the highest score on Perceiving are predicted to have a mean accuracy of `r paste(round(max(predict_plot_data$prob_mean) * 100, 1), "%", sep = "")`. Interestingly, in this sample, participants scoring near the mean on Perceiving are predicted to have approximately chance-level accuracy.

```{r}
predict_plot
```
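
The predicted accuracies behind a figure of this kind can be approximated along the following lines. This is only a sketch on simulated data: it uses a plain logistic regression as a stand-in for the mixed model, and `ei_per`, `fit`, and `grid` are hypothetical names, not objects from the sourced scripts.

```r
# Simulated stand-in data (hypothetical; not the report's data)
set.seed(4)
n <- 400
ei_per   <- rnorm(n)                          # centered Perceiving score
accuracy <- rbinom(n, 1, plogis(0.3 * ei_per))

fit <- glm(accuracy ~ ei_per, family = binomial)

# Predicted probability of a correct judgment across the observed range
grid <- data.frame(ei_per = seq(min(ei_per), max(ei_per), length.out = 25))
grid$prob <- predict(fit, newdata = grid, type = "response")

# Predicted accuracy at the top of the range, and at the sample mean
max_prob  <- grid$prob[which.max(grid$ei_per)]
mean_prob <- predict(fit, newdata = data.frame(ei_per = mean(ei_per)),
                     type = "response")
round(c(at_max = max_prob, at_mean = unname(mean_prob)), 3)
```

Comparing `at_mean` with 0.5 corresponds to asking whether an average scorer is predicted to perform at chance.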

Below is a similar plot illustrating how mean predicted accuracy varies as a function of the Perspective Taking subscale on the trait empathy measure. A participant with the highest Perspective Taking score in the sample is predicted to have a mean accuracy of `r paste(round(max(predict_plot_data_pt$prob_mean_pt) * 100, 1), "%", sep = "")`.

```{r}
predict_plot_pt
```

## Exploration of Sender and Receiver Intercepts

The random intercept variance for senders was quite large, and the variance for receivers was quite small. Thus, it may be worthwhile to describe that variance in more detail.

To examine this variation, we fit a logistic regression model predicting accuracy only using random intercepts for senders and receivers. There were 14 sender videos used in this experiment, and their estimated average accuracy rates (derived by converting the random intercepts into the response scale) are as follows:

```{r}
sender_intercepts
```

```{r}
sender_int_range
sender_int_mean
sender_int_sd
```
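
The conversion of random intercepts into the response scale can be sketched as follows. The numbers are made up for illustration, and the sketch assumes the usual approach of adding each sender's deviation to the fixed intercept on the logit scale and then applying the inverse logit. With `lme4`, the ingredients would typically come from `fixef()` and `ranef()`, but that is an assumption about the sourced scripts.

```r
# Hypothetical values on the logit scale (not taken from the report's models)
fixed_intercept   <- 0.10
sender_deviations <- c(s01 = -1.20, s02 = 0.00, s03 = 0.95)

# Sender-specific log-odds of a correct judgment
sender_logit <- fixed_intercept + sender_deviations

# Inverse logit gives each sender's estimated accuracy rate
sender_accuracy <- plogis(sender_logit)

round(sender_accuracy, 3)
range(sender_accuracy)
sd(sender_accuracy)
```

A sender with a deviation of zero sits at the accuracy implied by the fixed intercept alone.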

The intercepts for receivers are as follows:

```{r}
receiver_intercepts
```

```{r}
receiver_int_range
receiver_int_mean 
receiver_int_sd   
```

It is easy to see there is a great deal more variability in senders than in receivers.

## Do the Effects of Perceiving and Perspective Taking Vary Across Senders?

It is reasonable to ask whether the Perceiving subscale and the Perspective Taking subscale had differential effects across senders, given that there is so much variability in senders. Such differences can be modeled as random slopes for Perceiving and Perspective Taking for senders. These random slopes represent variation in the extent to which Perceiving and Perspective Taking render some senders more transparent. These slopes should vary considerably if, for example, senders varied substantively in the extent to which there was "hidden" emotional information in their messages, such that the detection of such emotional information (facilitated by Perceiving or Perspective Taking) would assist a receiver in accurately judging the message.

```{r}
sender_slope_lrt
```

```{r}
summary(model_ei_sender_slopes)
```

We see here there is virtually no variation in these random slopes. This lack of variation does not help clarify the mechanism by which Perceiving and Perspective Taking increase judgment accuracy. In fact, it raises more questions than it answers.

## Signal Detection Theory approach

The above analyses cannot rule out the possibility that the increases in accuracy associated with EI and empathy reflect changes in bias rather than changes in discrimination. For that reason, we calculated signal detection indices for each receiver. Specifically, we examined d-prime (a measure of discrimination) and c (a measure of bias). For each index, we fit a series of linear regressions predicting the index, first adding the empathy subscales as predictors and then adding the EI subscales as predictors. We compared the models using an F-test, to select a preferred model.

```{r}
sdt_data %>% 
  summarise(
    mean_dprime   = mean(dprime),
    sd_dprime     = sd(dprime),
    median_dprime = median(dprime),
    mean_c        = mean(c),
    sd_c          = sd(c),
    median_c      = median(c)
  )
```
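
For reference, the two indices can be computed from a receiver's hit and false alarm counts roughly as follows. This is a sketch under stated assumptions: lies are treated as the "signal" (so a hit is a lie judged to be a lie), a log-linear correction is applied to keep rates away from 0 and 1, and `sdt_indices()` is a hypothetical helper. The sourced scripts may define the signal or handle extreme rates differently.

```r
# Hypothetical helper: classical d-prime and c from response counts
sdt_indices <- function(hits, false_alarms, n_signal, n_noise) {
  # Log-linear correction: add 0.5 to each count and 1 to each trial total
  hit_rate <- (hits + 0.5) / (n_signal + 1)
  fa_rate  <- (false_alarms + 0.5) / (n_noise + 1)

  z_hit <- qnorm(hit_rate)
  z_fa  <- qnorm(fa_rate)

  c(dprime = z_hit - z_fa,          # discrimination
    c      = -0.5 * (z_hit + z_fa)) # bias (criterion location)
}

# Example: 7 lies and 7 truths, 5 lies judged as lies, 3 truths judged as lies
sdt_indices(hits = 5, false_alarms = 3, n_signal = 7, n_noise = 7)
```

Under this coding, a d-prime of zero indicates no discrimination between truths and lies, and a c of zero indicates no bias toward either response.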

For d-prime, the model comparison suggested that the model adding the EI subscales significantly improved the fit.

```{r}
lrt_sdt_dprime
```

This model effectively replicates the results from the logistic regression of the raw accuracy reported above. We see the same pattern of significant coefficients, in the same directions.

These results address some of the concerns raised in Timothy's review -- specifically that the results might be due to a change in bias rather than a change in discrimination.

```{r}
summary(model_ei_sdt)
```

Indeed, if we examine the models predicting bias (measured by c), we see that adding the EI measures does not improve the fit of the model, and the model that includes the empathy measures suggests no significant differences in bias. Moreover, the adjusted R-squared is negative, suggesting very poor fit with the data.

```{r}
lrt_sdt_bias
```

```{r}
summary(model_sdt_c_emp)
```

In short, the signal detection approach supports the results from the raw accuracy models.

### GLMM approach to SDT

It is also possible to fit generalized linear (mixed) models to perform signal detection analysis (see DeCarlo, 1998, https://doi.org/10.1037/1082-989X.3.2.186). There are many ways to implement this approach. Here, we fit a series of probit models predicting participants' responses using the message's actual veracity. The intercept represents -1*c (bias), and the coefficient for actual veracity represents d-prime. Coefficients for the predictors represent the influence of those predictors on response bias (c). Interaction terms between veracity and the predictors of interest represent the effect of those predictors on d-prime. Following an approach analogous to the main analysis above, we fit a series of models and compared them with likelihood ratio tests.
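
As a toy illustration of this parameterization, the sketch below fits a probit model to made-up aggregated counts and compares the coefficients with the classical indices. It assumes veracity is coded -0.5 (truth) and +0.5 (lie) and that the response being modeled is a "lie" judgment. Under that coding the intercept equals -c exactly, whereas with 0/1 dummy coding the intercept would instead equal the z-transformed false alarm rate. The coding used in the sourced scripts is not shown here, and random effects are omitted.

```r
# Hypothetical aggregated judgments (100 truths, 100 lies)
counts <- data.frame(
  veracity  = c(-0.5, 0.5),  # -0.5 = truth, +0.5 = lie (assumed coding)
  said_lie  = c(30, 55),     # number of "lie" responses
  said_true = c(70, 45)      # number of "truth" responses
)

fit <- glm(cbind(said_lie, said_true) ~ veracity,
           family = binomial(link = "probit"), data = counts)

# Classical indices from the same counts
hit_rate <- 55 / 100  # lies judged as lies
fa_rate  <- 30 / 100  # truths judged as lies
dprime_classic <- qnorm(hit_rate) - qnorm(fa_rate)
c_classic      <- -0.5 * (qnorm(hit_rate) + qnorm(fa_rate))

# Model-based counterparts
dprime_glm <- unname(coef(fit)["veracity"])
c_glm      <- -unname(coef(fit)["(Intercept)"])

rbind(classic = c(dprime = dprime_classic, c = c_classic),
      probit  = c(dprime = dprime_glm,     c = c_glm))
```

Adding a predictor as a main effect then shifts the intercept (bias), and adding its interaction with veracity shifts the veracity slope (d-prime).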

```{r}
lrt_sdtglm
```

```{r}
summary(sdtglm_ei)
```

As can be seen, the results of this approach are highly similar to the logistic regression approach predicting accuracy and highly similar to the classical SDT approach. Again, we see that the clearest predictor of discrimination is the Perceiving subscale of the MSCEIT. 

However, unlike in previous analyses, we can see here that the Perceiving subscale of the MSCEIT also predicts response bias, such that those higher on Perceiving are more truth biased. This result does not appear in the classical SDT approach above.

# Hypothesis 2

Next, we turned our attention to the question of whether emotional intelligence and trait empathy give receivers better accuracy-confidence calibration. This calibration can be represented as a coefficient for confidence predicting accuracy (at the judgment level), and an increase in calibration would be represented by an interaction term for confidence and either EI or empathy.
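
A simplified sketch of this idea, ignoring the random effects and using simulated data, is given below. The variable names (`confidence`, `ei_per`) and the simulated effect sizes are hypothetical. The point is only to show where the calibration effect would appear in the model.

```r
# Simulated judgments (hypothetical; not the report's data)
set.seed(3)
n <- 500
confidence <- rnorm(n)  # centered confidence rating
ei_per     <- rnorm(n)  # centered Perceiving score

# Simulate a case in which confidence tracks accuracy more closely at high EI
eta <- 0.1 + 0.1 * confidence + 0.2 * ei_per + 0.4 * confidence * ei_per
accuracy <- rbinom(n, 1, plogis(eta))

# Main effects only, then adding the confidence-by-EI interaction
m_main <- glm(accuracy ~ confidence + ei_per, family = binomial)
m_int  <- glm(accuracy ~ confidence * ei_per, family = binomial)

# Likelihood ratio test for the interaction term
lrt <- anova(m_main, m_int, test = "Chisq")
print(lrt)

# The interaction coefficient is the change in the confidence slope
# per unit increase in ei_per
coef(m_int)["confidence:ei_per"]
```

A positive interaction coefficient would indicate that confidence predicts accuracy more strongly for receivers with higher scores.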

To investigate this question, we fit a series of mixed effects logistic regression models, similar to the ones above. In the initial model, confidence and veracity predicted accuracy. As can be seen below, in this initial model, there is no relationship between accuracy and confidence. The coefficient is very close to 0.

```{r}
summary(model_conf_base)
```

Based on the models testing Hypothesis 1, we regarded the Perceiving EI subscale and the Perspective Taking empathy subscale as the best candidates to interact with confidence. Thus, we fit models in which (1) those scales were added and (2) their interaction terms with confidence were added. We then compared these models to the base model using a likelihood ratio test, to select a model. The model comparison suggested a significant improvement of fit adding EI and empathy, but there was no improvement of fit with the addition of the interaction terms.

```{r}
lrt_confidence
```

Since the interaction terms did not improve the model fit, there does not appear to be any evidence that emotional intelligence and empathy improve confidence-accuracy calibration. In the selected model (output below), the Perceiving EI subscale significantly predicts accuracy as above, but the coefficient for Perspective Taking is no longer significant. These results reduce our confidence that Perspective Taking really predicts accuracy.

```{r}
summary(model_conf_ei)
```

## Signal Detection Approach (GLMM)

It is also possible to assess the confidence-accuracy relationship in an SDT framework, following the same approach as with the above SDT analyses. Here, we fit three models analogous to the confidence-accuracy models above.

```{r}
lrt_sdtglm_conf
```

```{r}
summary(sdtglm_conf_emp)
```

Once again, we see that the SDT approach yields basically the same results as the raw accuracy approach. Confidence does not appear to predict truth-lie discrimination (as measured by d-prime). Indeed, the estimated relationship (i.e., the interaction term between confidence and veracity) is very close to zero.

# Hypothesis 3

The third research question concerned whether emotional intelligence and trait empathy predicted the judgment criteria provided by the receivers. Receivers' listed criteria were coded into four categories: cognitive complexity, emotional features, expressive indices, and paraverbal aspects.

For each of these categories, we fit and compared a series of linear mixed effects models. In the first model, the number of criteria listed was predicted by veracity, with random intercepts and slopes for each receiver and random intercepts for each sender. In the second model, the four empathy subscales were added as predictors. In the third model, the four emotional intelligence subscales were added as predictors. We compared each of these models using likelihood ratio tests to find a preferred model.

## Cognitive Complexity

The model selection procedure suggested that neither empathy nor EI predicted receivers reporting cognitive complexity criteria.

```{r}
cognitive_models[[4]]
```

```{r}
summary(cognitive_models[[1]])
```

## Emotional Features

The model selection procedure for emotional features suggested that the addition of the empathy subscales improved fit, but the addition of the emotional intelligence subscales did not.

```{r}
emotion_models[[4]]
```

The IRI Fantasy subscale significantly predicts receivers reporting emotional features as judgment criteria. The effect is relatively small, however. A 1-point increase on the Fantasy subscale corresponds to a predicted increase of `r round(summary(emotion_models[[2]])$coefficients[5, 1], 3)` in the number of emotional feature criteria reported.

```{r}
summary(emotion_models[[2]])
```

## Expressive Indices

The model selection procedure suggested that neither empathy nor EI predicted receivers reporting expressive indices.

```{r}
expressive_models[[4]]
```

```{r}
summary(expressive_models[[1]])
```

## Paraverbal Aspects

The model selection procedure suggested that neither empathy nor EI predicted receivers reporting paraverbal aspects.

```{r}
paraverbal_models[[4]]
```

```{r}
summary(paraverbal_models[[1]])
```

# Additional materials

## Discarded models for Hypothesis 1

The initial model, with only veracity as a predictor.

```{r}
summary(model_base)
```

The model adding the trait empathy measures.

```{r}
summary(model_emp)
```

## Discarded models for Hypothesis 2

Adding the interaction terms did not improve the fit of the model.

```{r}
summary(model_conf_ei_int)
```

## Robustness checks for Hypothesis 1

The effect for EI Perceiving persists when removing the empathy variables.

```{r}
summary(model_ei_red)
```

The coefficient for EI Perceiving remains significant when removing other predictors.

```{r}
summary(model_ei_per)
```

The negative coefficient for EI Using becomes nonsignificant when other predictors are removed.

```{r}
summary(model_ei_use)
```

Additionally, EI total scores and EI bifactor scores do not predict accuracy.

```{r}
summary(model_ei_sub)
```

```{r}
summary(model_ei_tot)
```

# Exploratory analyses

A reviewer requested an analysis of gender differences. 

Descriptively, it does seem as though women tend to score slightly higher on the
EI and empathy measures than men, but these tendencies are quite small. For the
empathy measures, three out of the four differences between men and women were 
statistically significant, for what it is worth.

```{r, echo = FALSE}
emotion %>% 
    group_by(sex) %>% 
    summarise(
        mean_pt = mean(iri_pt, na.rm = TRUE),
        sd_pt   = sd(iri_pt, na.rm = TRUE),
        mean_ec = mean(iri_ec, na.rm = TRUE),
        sd_ec   = sd(iri_ec, na.rm = TRUE),
        mean_fs = mean(iri_fs, na.rm = TRUE),
        sd_fs   = sd(iri_fs, na.rm = TRUE),
        mean_pd = mean(iri_pd, na.rm = TRUE),
        sd_pd   = sd(iri_pd, na.rm = TRUE)
    )
```

```{r}
emotion %>% 
    group_by(sex) %>% 
    summarise(
        mean_r1g = mean(r1g_st, na.rm = TRUE),
        sd_r1g   = sd(r1g_st, na.rm = TRUE),
        mean_r2g = mean(r2g_st, na.rm = TRUE),
        sd_r2g   = sd(r2g_st, na.rm = TRUE),
        mean_r3g = mean(r3g_st, na.rm = TRUE),
        sd_r3g   = sd(r3g_st, na.rm = TRUE),
        mean_r4g = mean(r4g_st, na.rm = TRUE),
        sd_r4g   = sd(r4g_st, na.rm = TRUE)
    )
```

There are many ways of approaching the question of whether accuracy varies as a 
function of gender. Here, we take the approach of fitting a simple model using 
only gender and veracity as fixed effects, with the same random effects 
structure as above, followed by a model that uses all the EI and empathy 
predictors as well.

```{r}
summary(gender_mod_1)
```

```{r}
summary(gender_mod_2)
```

As can be seen in these results, neither model supports the notion of gender
differences in accuracy.