generated from iai-group/template-project
Scripts and results for data analysis
Showing 13 changed files with 560 additions and 1 deletion.
@@ -0,0 +1,70 @@
# Results of Quantitative Analysis

The following sections provide the results reported in the paper, along with the scripts used to generate them.

## One-way ANOVA

![One-way ANOVA table](oneway-anova-table.png)

The results are generated using [this script](../../scripts/data_analysis/anova.py) and the following command:

`` python -m scripts.data_analysis.anova --type one-way ``

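The script `anova.py` is not reproduced here, so the following is only a minimal standard-library sketch of what a one-way ANOVA computes, not the authors' implementation. The condition names and ratings are made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Tuple


def one_way_anova(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Returns the F statistic and its degrees of freedom (between, within)."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: group size times the squared deviation
    # of the group mean from the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group sum of squares: squared deviations from each group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical ratings for three made-up experimental conditions.
ratings = {
    "condition_a": [1.0, 2.0, 3.0],
    "condition_b": [2.0, 3.0, 4.0],
    "condition_c": [3.0, 4.0, 5.0],
}
f_statistic, df_between, df_within = one_way_anova(ratings)
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

A larger F statistic means the group means differ more than the spread within each group would suggest. Turning it into a p-value requires the F distribution, which the standard library does not provide.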
## Two-way ANOVA

![Two-way ANOVA table](twoway-anova-table.png)

The results are generated using [this script](../../scripts/data_analysis/anova.py) and the following command:

`` python -m scripts.data_analysis.anova --type two-way ``

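As a rough illustration of what a two-way ANOVA reports, the sketch below computes the F statistics for two factors and their interaction in a balanced design, using only the standard library. It is not the authors' script, and the factor levels and ratings are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def two_way_anova(cells: Dict[Tuple[str, str], List[float]]) -> Dict[str, float]:
    """Returns F statistics for both main effects and their interaction.

    Assumes a balanced design: every (level_a, level_b) cell is present and
    holds the same number of observations (at least two).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # Observations per cell.

    grand = mean(v for values in cells.values() for v in values)
    mean_a = {a: mean(v for b in levels_b for v in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(v for a in levels_a for v in cells[(a, b)]) for b in levels_b}
    mean_cell = {cell: mean(values) for cell, values in cells.items()}

    # Sums of squares for each main effect, the interaction, and the error.
    ss_a = n * len(levels_b) * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = n * len(levels_a) * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (mean_cell[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_error = sum(
        (v - mean_cell[cell]) ** 2
        for cell, values in cells.items()
        for v in values
    )

    df_a = len(levels_a) - 1
    df_b = len(levels_b) - 1
    df_error = len(levels_a) * len(levels_b) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "factor_a": (ss_a / df_a) / ms_error,
        "factor_b": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / (df_a * df_b)) / ms_error,
    }


# Hypothetical ratings: factor A is a quality level, factor B a presentation mode.
cells = {
    ("high", "text"): [1.0, 3.0],
    ("high", "visual"): [3.0, 5.0],
    ("low", "text"): [5.0, 7.0],
    ("low", "visual"): [7.0, 9.0],
}
f_values = two_way_anova(cells)
print(f_values)
```

Unbalanced designs need a regression-based formulation, which this sketch does not cover.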
## Mean scores

Mean scores for response usefulness and for user ratings of the explanations, broken down by explanation quality and presentation mode, can be generated using [this script](../../scripts/data_analysis/mean_scores.py).

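The per-condition means can be pictured as a simple group-and-average step. The sketch below uses only the standard library and is not taken from `mean_scores.py`; the column names `explanation_quality` and `presentation`, and all values, are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Any, Dict, Iterable, List, Mapping


def mean_scores_by(
    rows: Iterable[Mapping[str, Any]], group_column: str, score_column: str
) -> Dict[str, float]:
    """Returns the mean of score_column for each value of group_column."""
    scores_per_group: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        scores_per_group[str(row[group_column])].append(float(row[score_column]))
    return {group: mean(scores) for group, scores in scores_per_group.items()}


# Hypothetical study rows; column names and values are illustrative only.
rows = [
    {"explanation_quality": "accurate", "presentation": "text", "usefulness": 4},
    {"explanation_quality": "accurate", "presentation": "visual", "usefulness": 5},
    {"explanation_quality": "noisy", "presentation": "text", "usefulness": 2},
    {"explanation_quality": "noisy", "presentation": "visual", "usefulness": 3},
]

by_quality = mean_scores_by(rows, "explanation_quality", "usefulness")
by_presentation = mean_scores_by(rows, "presentation", "usefulness")
print(by_quality)
print(by_presentation)
```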
### Mean response usefulness scores for explanations with different levels of accuracy

![](mean_scores/means_usefulness_explanation_quality.png)

### Mean response usefulness scores for explanations with different presentation modes

![](mean_scores/means_usefulness_explanation_presentation.png)

### Mean explanation ratings for explanations with different levels of accuracy

![](mean_scores/means_explanation_ratings_explanation_quality.png)

### Mean explanation ratings for explanations with different presentation modes

![](mean_scores/means_explanation_ratings_explanation_presentation.png)

Mean scores for other response dimensions, broken down by explanation quality and presentation mode, can be generated using ....

## Data distribution

The distribution of user-judged response dimensions per query for both user studies can be generated using [this script](../../scripts/data_analysis/data_distribution.py) and the following command:

`` python -m scripts.data_analysis.data_distribution ``

![](data_distribution.png)

## Demographic information

Demographic information about the crowd workers participating in the user study is presented below:

| Demographic Information | Option | Number of workers |
| --- | --- | --- |
| age | 18-30 | 39 |
| age | 31-45 | 76 |
| age | 46-60 | 41 |
| age | 60+ | 4 |
| age | Prefer not to say | 0 |
| education | High School | 12 |
| education | Bachelor's Degree | 111 |
| education | Master's Degree | 34 |
| education | Ph.D. or higher | 3 |
| education | Prefer not to say | 0 |
| gender | Male | 95 |
| gender | Female | 60 |
| gender | Other | 5 |
| gender | Prefer not to say | 0 |
Binary file added (+18.9 KB): ...ive_analysis/mean_scores/means_explanation_ratings_explanation_presentation.png
Binary file added (+18.7 KB): ...titative_analysis/mean_scores/means_explanation_ratings_explanation_quality.png
Binary file added (+25.2 KB): ...quantitative_analysis/mean_scores/means_usefulness_explanation_presentation.png
Binary file added (+23.1 KB): results/quantitative_analysis/mean_scores/means_usefulness_explanation_quality.png
Empty file.
Empty file.
@@ -0,0 +1,100 @@
import collections
from typing import Dict, List

import matplotlib.pyplot as plt
import pandas as pd


def process_data_for_distribution_plot(
    response_dimensions: List[str], data_df: pd.DataFrame, main_feature: str
) -> Dict:
    """Processes the data for the distribution plot.

    Args:
        response_dimensions: The response dimensions used in a user study.
        data_df: Dataframe containing the results from a user study.
        main_feature: Column of data_df to group the ratings by.

    Returns:
        A dictionary mapping each value of main_feature to a dictionary
        that holds the list of ratings for every response dimension.
    """
    values_per_query = collections.defaultdict(dict)

    for value in data_df[main_feature].unique():
        # Filter the rows once per value rather than once per dimension.
        query_sub_df = data_df[data_df[main_feature] == value]
        for response_dimension in response_dimensions:
            values_per_query[value][response_dimension] = list(
                query_sub_df[response_dimension]
            )

    return values_per_query


if __name__ == "__main__":
    aggregated_data = pd.read_csv(
        "results/user_study_output/output_processed_aggregated.csv"
    )

    # Strip the "option_" prefix and convert the remainder to an integer.
    for feature in [
        "source_usefulness",
        "warning_usefulness",
        "confidence_usefulness",
        "conversational_frequency",
        "voice_frequency",
    ]:
        aggregated_data[feature] = [
            int(d.replace("option_", "")) for d in aggregated_data[feature]
        ]

    response_dimensions = [
        "familiarity",
        "interest",
        "search_prob",
        "relevance",
        "correctness",
        "completeness",
        "comprehensiveness",
        "conciseness",
        "serendipity",
        "coherence",
        "factuality",
        "fairness",
        "readability",
        "satisfaction",
        "usefulness",
    ]

    feature = "questions_ids"
    values_per_query = process_data_for_distribution_plot(
        response_dimensions, aggregated_data, feature
    )

    n_col = 5
    fig, axs = plt.subplots(3, n_col, figsize=(15, 9))

    for idx, response_dimension in enumerate(response_dimensions):
        # One box per query ID (1 to 10).
        boxplot_data = [
            values_per_query[value][response_dimension]
            for value in range(1, 11)
        ]

        # Fill the 3 x 5 grid row by row.
        ax = axs[idx // n_col][idx % n_col]
        ax.boxplot(boxplot_data)
        ax.set_xlabel("Query ID")
        ax.set_ylabel("Worker Self-Reported Rating")
        ax.set_title(
            response_dimension.replace(
                "search_prob", "search probability"
            ).title()
        )

    fig.tight_layout(pad=1.0)
    # Pass dpi=... to savefig to change the output resolution.
    fig.savefig("results/quantitative_analysis/data_distribution.png")
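
The plotting loop above places 15 response dimensions on a grid with 3 rows and 5 columns by deriving a row and a column from a flat index. A minimal sketch of that mapping, using a hypothetical helper `grid_positions` that is not part of the script, could look like this:

```python
from typing import List, Tuple


def grid_positions(n_items: int, n_col: int) -> List[Tuple[int, int]]:
    """Returns the (row, column) of each item when filling a grid row by row."""
    # divmod(idx, n_col) is equivalent to (idx // n_col, idx % n_col).
    return [divmod(idx, n_col) for idx in range(n_items)]


positions = grid_positions(15, 5)
print(positions[0], positions[5], positions[14])
```

Integer division keeps the row index an `int` without a cast, which is why `idx // n_col` is preferable to `int(idx / n_col)`.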