Scripts and results for data analysis

iai-group · Jan 30, 2024 · f90597f
1 parent 44110e8

Showing 13 changed files with 560 additions and 1 deletion.
diff --git a/requirements.txt b/requirements.txt
@@ -4,4 +4,7 @@ pytest
 mypy
 docformatter
 pre-commit
-pydocstyle==6.1.1
+pydocstyle==6.1.1
+statsmodels>=0.12.2
+pandas==1.3.1
+matplotlib
diff --git a/results/quantitative_analysis/README.md b/results/quantitative_analysis/README.md
@@ -0,0 +1,70 @@
+# Results of Quantitative Analysis
+
+The following sections provide the results reported in the paper, along with the scripts used to generate them.
+
+## One-way ANOVA
+
+![One-way ANOVA results table](oneway-anova-table.png)
+
+The results are generated using [this script](../../scripts/data_analysis/anova.py) and the following command:
+
+`` python -m scripts.data_analysis.anova --type one-way ``
+
+## Two-way ANOVA
+
+![Two-way ANOVA results table](twoway-anova-table.png)
+
+The results are generated using [this script](../../scripts/data_analysis/anova.py) and the following command:
+
+`` python -m scripts.data_analysis.anova --type two-way ``
+
+## Mean scores
+
+Mean response usefulness scores and mean user ratings of the explanations, broken down by explanation quality and presentation mode, can be generated using [this script](../../scripts/data_analysis/mean_scores.py).
+
+### Mean response usefulness scores for explanations with different levels of accuracy
+
+![Mean response usefulness scores by explanation accuracy](mean_scores/means_usefulness_explanation_quality.png)
+
+### Mean response usefulness scores for explanations with different presentation modes
+
+![Mean response usefulness scores by explanation presentation mode](mean_scores/means_usefulness_explanation_presentation.png)
+
+### Mean explanation ratings for explanations with different levels of accuracy
+
+![Mean explanation ratings by explanation accuracy](mean_scores/means_explanation_ratings_explanation_quality.png)
+
+### Mean explanation ratings for explanations with different presentation modes
+
+![Mean explanation ratings by explanation presentation mode](mean_scores/means_explanation_ratings_explanation_presentation.png)
+
+Mean scores for the other response dimensions, broken down by explanation quality and presentation mode, can be generated using ....
+
+## Data distribution
+
+The distribution of user-judged response dimensions per query for both user studies can be generated using [this script](../../scripts/data_analysis/data_distribution.py) and the following command:
+
+`` python -m scripts.data_analysis.data_distribution ``
+
+![Distribution of user-judged response dimensions per query](data_distribution.png)
+
+## Demographic information
+
+Demographic information about the crowd workers participating in the user study is presented below:
+
+| Demographic Information | Option | Number of workers |
+| --- | --- | --- |
+| age | 18-30 | 39 |
+| age | 31-45 | 76 |
+| age | 46-60 | 41 |
+| age | 60+ | 4 |
+| age | Prefer not to say | 0 |
+| education | High School | 12 |
+| education | Bachelor's Degree | 111 |
+| education | Master's Degree | 34 |
+| education | Ph.D. or higher | 3 |
+| education | Prefer not to say | 0 |
+| gender | Male | 95 |
+| gender | Female | 60 |
+| gender | Other | 5 |
+| gender | Prefer not to say | 0 |
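Every category in the demographic table sums to 160 workers (for age, 39 + 76 + 41 + 4 + 0), which is consistent with 160 participants each choosing one option per question. The short sketch below converts the counts into percentages and checks the totals. The dictionary layout and the `percentages` helper are illustrative only and are not part of the repository.

```python
from typing import Dict

# Worker counts copied from the demographic table in the README.
DEMOGRAPHICS: Dict[str, Dict[str, int]] = {
    "age": {"18-30": 39, "31-45": 76, "46-60": 41, "60+": 4, "Prefer not to say": 0},
    "education": {
        "High School": 12,
        "Bachelor's Degree": 111,
        "Master's Degree": 34,
        "Ph.D. or higher": 3,
        "Prefer not to say": 0,
    },
    "gender": {"Male": 95, "Female": 60, "Other": 5, "Prefer not to say": 0},
}


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw worker counts into percentages of the category total."""
    total = sum(counts.values())
    return {option: 100 * n / total for option, n in counts.items()}


for category, counts in DEMOGRAPHICS.items():
    # Assumes each worker picked exactly one option per category.
    total = sum(counts.values())
    shares = {option: round(p, 1) for option, p in percentages(counts).items()}
    print(f"{category} (n={total}): {shares}")
```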
diff --git a/results/quantitative_analysis/data_distribution.png b/results/quantitative_analysis/data_distribution.png
diff --git a/...ive_analysis/mean_scores/means_explanation_ratings_explanation_presentation.png b/...ive_analysis/mean_scores/means_explanation_ratings_explanation_presentation.png
diff --git a/...titative_analysis/mean_scores/means_explanation_ratings_explanation_quality.png b/...titative_analysis/mean_scores/means_explanation_ratings_explanation_quality.png
diff --git a/...quantitative_analysis/mean_scores/means_usefulness_explanation_presentation.png b/...quantitative_analysis/mean_scores/means_usefulness_explanation_presentation.png
diff --git a/results/quantitative_analysis/mean_scores/means_usefulness_explanation_quality.png b/results/quantitative_analysis/mean_scores/means_usefulness_explanation_quality.png
diff --git a/results/quantitative_analysis/oneway-anova-table.png b/results/quantitative_analysis/oneway-anova-table.png
diff --git a/results/quantitative_analysis/twoway-anova-table.png b/results/quantitative_analysis/twoway-anova-table.png
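The ANOVA tables in this commit are produced by `scripts/data_analysis/anova.py`, whose source is not shown here. To show what statistic a one-way ANOVA table reports, the sketch below computes the F statistic and its degrees of freedom from ratings grouped by a single factor. It is a minimal illustration, not the script's implementation. The function name `one_way_anova_f` and the example ratings are hypothetical, and no p-value is computed. statsmodels, which this commit adds to `requirements.txt`, can produce a full ANOVA table with p-values via `statsmodels.stats.anova.anova_lm` on a fitted OLS model; whether `anova.py` uses that routine cannot be confirmed from this diff.

```python
from statistics import fmean
from typing import Dict, List, Tuple


def one_way_anova_f(groups: Dict[str, List[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    `groups` maps each level of a single factor (e.g., a hypothetical
    explanation accuracy level) to the ratings observed at that level.
    """
    group_means = {level: fmean(values) for level, values in groups.items()}
    all_values = [v for values in groups.values() for v in values]
    grand_mean = fmean(all_values)

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(
        len(values) * (group_means[level] - grand_mean) ** 2
        for level, values in groups.items()
    )
    # Within-group sum of squares: spread of ratings around their own group mean.
    ss_within = sum(
        (v - group_means[level]) ** 2
        for level, values in groups.items()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the between-group and within-group mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ratings, for illustration only (not data from the user study).
ratings = {"accurate": [4, 5, 4, 5], "inaccurate": [2, 3, 3, 2]}
print(one_way_anova_f(ratings))
```

A large F means the group means differ by more than the within-group noise would suggest; turning F into a p-value requires the F distribution with the returned degrees of freedom.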
diff --git a/scripts/__init__.py b/scripts/__init__.py
diff --git a/scripts/data_analysis/__init__.py b/scripts/data_analysis/__init__.py
diff --git a/scripts/data_analysis/data_distribution.py b/scripts/data_analysis/data_distribution.py
@@ -0,0 +1,100 @@
+import collections
+from typing import List
+
+import matplotlib.pyplot as plt
+import pandas as pd
+
+
+def process_data_for_distribution_plot(
+    response_dimensions: List[str], data_df: pd.DataFrame, main_feature: str
+):
+    """Processes the data for the distribution plot.
+
+    Args:
+        response_dimensions: The response dimensions used in a user study.
+        data_df: Dataframe containing the results from a user study.
+        main_feature: Column whose values the ratings are grouped by (e.g.,
+            query IDs).
+
+    Returns:
+        A dictionary mapping each value of main_feature to a dictionary that
+        maps each response dimension to the list of ratings for that value.
+    """
+    values_per_query = collections.defaultdict(dict)
+
+    # Split the dataframe once per value of the main feature (e.g., per query).
+    for value, query_sub_df in data_df.groupby(main_feature):
+        for response_dimension in response_dimensions:
+            values_per_query[value][response_dimension] = list(
+                query_sub_df[response_dimension]
+            )
+
+    return values_per_query
+
+
+if __name__ == "__main__":
+    aggregated_data = pd.read_csv(
+        "results/user_study_output/output_processed_aggregated.csv"
+    )
+
+    # Answers to these questions are stored as "option_<n>"; convert them
+    # to integer values.
+    for feature in [
+        "source_usefulness",
+        "warning_usefulness",
+        "confidence_usefulness",
+        "conversational_frequency",
+        "voice_frequency",
+    ]:
+        aggregated_data[feature] = [
+            int(d.replace("option_", ""))
+            for d in aggregated_data[feature]
+        ]
+
+    response_dimensions = [
+        "familiarity",
+        "interest",
+        "search_prob",
+        "relevance",
+        "correctness",
+        "completeness",
+        "comprehensiveness",
+        "conciseness",
+        "serendipity",
+        "coherence",
+        "factuality",
+        "fairness",
+        "readability",
+        "satisfaction",
+        "usefulness",
+    ]
+
+    feature = "questions_ids"
+    values_per_query = process_data_for_distribution_plot(
+        response_dimensions, aggregated_data, feature
+    )
+
+    n_col = 5
+    fig, axs = plt.subplots(3, n_col, figsize=(15, 9))
+
+    for idx, response_dimension in enumerate(response_dimensions):
+        # Lay the subplots out row by row in a 3 x n_col grid.
+        ax = axs[idx // n_col][idx % n_col]
+
+        # One box per query; query IDs are assumed to run from 1 to 10.
+        boxplot_data = [
+            values_per_query[value][response_dimension]
+            for value in range(1, 11)
+        ]
+
+        ax.boxplot(boxplot_data)
+        ax.set_xlabel("Query ID")
+        ax.set_ylabel("Worker Self-Reported Rating")
+        ax.set_title(
+            response_dimension.replace(
+                "search_prob", "search probability"
+            ).title()
+        )
+
+    fig.tight_layout(pad=1.0)
+    fig.savefig("results/quantitative_analysis/data_distribution.png")
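`process_data_for_distribution_plot` builds a nested mapping from each query ID to the ratings of every response dimension. The same grouping can be sketched without pandas in a single pass over CSV rows, which may help when checking the logic in isolation. This is a hypothetical helper (`group_ratings_by`), assuming integer query IDs and integer ratings; the sample CSV values are made up, and only the column names come from the script.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_ratings_by(
    rows: Iterable[Mapping[str, str]],
    response_dimensions: List[str],
    main_feature: str,
) -> Dict[int, Dict[str, List[int]]]:
    """Map each value of `main_feature` to the ratings of every dimension.

    Assumes integer-valued IDs and ratings, read as strings from a CSV file.
    """
    grouped: Dict[int, Dict[str, List[int]]] = defaultdict(
        lambda: defaultdict(list)
    )
    # Single pass over the rows, appending each rating to its group.
    for row in rows:
        key = int(row[main_feature])
        for dimension in response_dimensions:
            grouped[key][dimension].append(int(row[dimension]))
    # Convert the nested defaultdicts to plain dicts for a predictable result.
    return {key: dict(dims) for key, dims in grouped.items()}


# Illustrative CSV with made-up values; the script itself reads
# results/user_study_output/output_processed_aggregated.csv.
SAMPLE_CSV = """questions_ids,relevance,usefulness
1,4,5
1,3,4
2,5,5
"""

rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
print(group_ratings_by(rows, ["relevance", "usefulness"], "questions_ids"))
```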