Case Study: Overwatch Mystery Heroes Exploratory Data Analysis in R

Table of Contents

Executive Summary

  • Objectives: An observational study that quantifies uncertainty in order to build a probability model under simple random sampling and to test for a difference between sampling distributions. Under the null hypothesis, the test for a difference between the sample means assumes that the population variances are equal.
  • Key Findings: The unique dynamics of the Mystery Heroes game mode, where players are assigned random heroes, may introduce extra variability and potential non-normality in the distributions of scores and outcomes, so any analysis should account for them. Findings specific to Mystery Heroes may not generalize to other game modes.
  • Conclusions:


  • Background:

    Overwatch 2 (2022) is a fast-paced, competitive, team-based, player-versus-player first-person shooter (FPS) developed by Blizzard Entertainment. Becoming a proficient player demands patience, practice, and emotional control, and ascending to the rank of Grand Master requires quick thinking, teamwork, communication, and coordination. Overwatch groups heroes into three main classes (Damage, Tank, and Support), and teams must rely on their heroes' special strengths and abilities and work together to secure objectives each round. Players are scored on a point-based system, which lets them track and compare in-game performance with others. Players can also view their post-game scores on a scoreboard in the match history.

    Mystery Heroes is one of many unranked game modes in Overwatch and is played over several unique maps, each with different scoring objectives. In Mystery Heroes, however, instead of choosing a hero, players are randomly assigned heroes at the start of each round and again after each death/respawn. In essence, this restricted game mode prevents players from selecting their favorite or "main" characters and classes. This creates an interesting, dynamic setting for a case study of the game's interplay mechanics and of the distributional anomalies that may reveal meaningful statistical observations, particularly about Overwatch itself: the many combinations of team and individual play styles, skill levels, hero mastery, and more.

  • Purpose:

    1. Classify and predict hero class.
    2. Classify and predict event outcome.
    3. Classify and predict player score.
  • Research Questions: List the research questions or hypotheses you are exploring.

    1. What is the relationship between skill level/rank and score?
    2. What is the relationship between result and score?
    3. What is the relationship between game_length and score?
  • Hypothesis Testing: Perform hypothesis testing.

    1. One-sided test: confidence level = 95%, $\alpha$ = 0.05
      $H_{0}$: Grand Master Elimination/Assist/Death Ratio = $\mu_{ead}$
      $H_{A}$: Grand Master Elimination/Assist/Death Ratio > $\mu_{ead}$
    2. One-sided test: confidence level = 95%, $\alpha$ = 0.05
      $H_{0}$: Grand Master Damage/Heal/Mitigation Ratio = $\mu_{dhm}$
      $H_{A}$: Grand Master Damage/Heal/Mitigation Ratio > $\mu_{dhm}$
    3. Two-sided test: confidence level = 95%, $\alpha/2$ = 0.025
      $H_{0}$: Difference in mean score $\mu_{victory}$ - $\mu_{defeat}$ = 0
      $H_{A}$: Difference in mean score $\mu_{victory}$ - $\mu_{defeat}$ $\neq$ 0
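The three tests above could be run with base R's `t.test()`. The following is a minimal sketch on synthetic data, not the study's data: the sample sizes, the reference value `mu_ead`, and the variable names are all assumptions made for illustration. The two-sided test uses `var.equal = TRUE` to reflect the equal-variance assumption stated in the objectives.

```r
set.seed(42)  # reproducible synthetic data; NOT the study's data

# Hypothetical elimination/assist/death ratios for a Grand Master sample
gm_ead <- rnorm(26, mean = 3.2, sd = 0.8)
mu_ead <- 3.0  # hypothetical reference value for the population mean

# Tests 1 and 2 share this form: H0: mu = mu_ead, HA: mu > mu_ead
one_sided <- t.test(gm_ead, mu = mu_ead, alternative = "greater",
                    conf.level = 0.95)

# Hypothetical mean scores split by match result
score_victory <- rnorm(13, mean = 55, sd = 10)
score_defeat  <- rnorm(13, mean = 48, sd = 10)

# Test 3: H0: mu_victory - mu_defeat = 0, two-sided,
# pooled variance because equal population variances are assumed
two_sided <- t.test(score_victory, score_defeat, alternative = "two.sided",
                    var.equal = TRUE, conf.level = 0.95)

# Reject H0 when the p-value falls below the significance level
alpha <- 0.05
reject_one_sided <- one_sided$p.value < alpha
reject_two_sided <- two_sided$p.value < alpha

print(one_sided)
print(two_sided)
```

For the two-sided test, comparing the p-value with $\alpha$ = 0.05 is equivalent to placing 0.025 in each tail.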

Installation and Setup

Detailed instructions on how to set up the project. Include:

  • Requirements (e.g., R version, additional software).
  • Instructions to install R and RStudio, if necessary.
  • Steps to clone the repository and set up the environment.
  • Any required R packages and how to install them.
# Example commands to set up the environment
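As one possible setup step, the sketch below installs any missing packages from within R. The package list is an assumption: the README does not name its dependencies, and `class` is listed only because it provides k-nearest-neighbour functions of the kind used later in this document.

```r
# Hypothetical package list: "class" provides knn() and knn.cv();
# replace it with whatever the analysis scripts actually load.
required_pkgs <- c("class")

# Check which of the required packages are already installed
is_installed <- vapply(required_pkgs, requireNamespace, logical(1),
                       quietly = TRUE)
missing_pkgs <- required_pkgs[!is_installed]

# Install only what is missing
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs)
}

# Print the running R version for comparison with the session info
R.version.string
```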


Detail the structure and explanation of the data:

| Variable | Description | Type | Format/Units |
|---|---|---|---|
| group_id | match group id | integer | 1-inf+ |
| map_name | map name | character | |
| comp | competitive mode | factor | yes/no |
| result | match result | factor | victory/defeat |
| final_score | match point | integer | A-B |
| game_mode | game mode | factor | push/control/hybrid |
| game_length | match duration | datetime | mm:ss |
| team | 5 vs 5 | factor | A or B |
| elimination | # eliminations | integer | 0-inf+ |
| assist | # assists | integer | 0-inf+ |
| death | # deaths | integer | 0-inf+ |
| damage | total damage | integer | 0-inf+ |
| heal | total heal | integer | 0-inf+ |
| mitigation | total mitigation | integer | 0-inf+ |
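To make the variable table above concrete, here is a minimal sketch of a data frame with those columns, followed by one way to turn the `mm:ss` and `A-B` formats into numbers. The row values are made up, and storing `final_score` and `game_length` as character strings before parsing is an assumption; the derived columns `game_seconds`, `score_a`, and `score_b` are hypothetical names.

```r
# Two made-up rows that follow the variable table; values are illustrative
matches <- data.frame(
  group_id    = c(1L, 1L),
  map_name    = c("Hypothetical Map", "Hypothetical Map"),
  comp        = factor(c("no", "no"), levels = c("yes", "no")),
  result      = factor(c("victory", "defeat"),
                       levels = c("victory", "defeat")),
  final_score = c("2-1", "2-1"),      # "A-B" kept as text before parsing
  game_mode   = factor(c("control", "control")),
  game_length = c("12:34", "12:34"),  # "mm:ss" kept as text before parsing
  team        = factor(c("A", "B")),
  elimination = c(21L, 15L),
  assist      = c(9L, 6L),
  death       = c(7L, 11L),
  damage      = c(8200L, 6100L),
  heal        = c(3100L, 2500L),
  mitigation  = c(1400L, 900L),
  stringsAsFactors = FALSE
)

# Convert "mm:ss" into a number of seconds
time_parts <- strsplit(matches$game_length, ":", fixed = TRUE)
matches$game_seconds <- vapply(
  time_parts,
  function(p) as.integer(p[1]) * 60L + as.integer(p[2]),
  integer(1)
)

# Split "A-B" into one integer column per side
score_parts <- strsplit(matches$final_score, "-", fixed = TRUE)
matches$score_a <- vapply(score_parts, function(p) as.integer(p[1]), integer(1))
matches$score_b <- vapply(score_parts, function(p) as.integer(p[2]), integer(1))

str(matches)
```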
  • Session Info:
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)
  • Variables: Description of each variable in the data set.

  • Formats or scales used: Specify units of measurement, categorizations, or scales used.

    • Data Types:
      • game_mode: hybrid, payload, control
        • final_score: Attacker-Defender
      • game_mode: push
        • final_score: Maximum 1 point
  • Data Source: Mention the original source of the data.

  • Packages:

  • Libraries:

Data Collection Method

  • Data Source: Explain where and how you collected the data.
    • Data Storage: Local SSD
    • Data Format: *.png
    • Data Size: 4.18 GB
  • Data Collection Process: Describe the steps or methodology used in data collection.
    • Collection Method 1: Post game match scoreboard
    • Collection Method 2: In-game menu career history
  • Data Processing: Outline any processing or cleaning done on the data.
    • Pre-processing: Filter incomplete/missing/NA player scores and match point results
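The pre-processing step above could look like the following sketch, which drops every row that has a missing value in any column. The data frame `scores` and its values are hypothetical, and the real analysis may filter on a narrower set of columns.

```r
# Made-up player rows; NA marks a score or result that could not be recorded
scores <- data.frame(
  group_id    = c(1L, 1L, 2L, 2L),
  team        = c("A", "B", "A", "B"),
  result      = c("victory", "defeat", NA, "victory"),
  elimination = c(21L, 15L, 18L, NA),
  stringsAsFactors = FALSE
)

# Keep only the rows in which every column is present
clean <- scores[complete.cases(scores), ]

# Report how many rows were removed
n_dropped <- nrow(scores) - nrow(clean)
n_dropped
```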


Instructions on how to run the analysis:

  • Steps to load the data into RStudio.
  • How to run analysis scripts.
  • Any necessary instructions for interpreting the results.
# Example code snippet
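A minimal sketch of the loading step, assuming the scoreboard screenshots have already been transcribed into a CSV file. The README does not name such a file, so a temporary file with made-up rows stands in for it here.

```r
# Hypothetical input: a temporary CSV stands in for the transcribed data
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "group_id,result,team,elimination,assist,death",
  "1,victory,A,21,9,7",
  "1,defeat,B,15,6,11"
), csv_path)

# Read the file, then convert the categorical columns to factors
matches <- read.csv(csv_path, stringsAsFactors = FALSE)
matches$result <- factor(matches$result, levels = c("victory", "defeat"))
matches$team   <- factor(matches$team)

# Inspect column types and basic summaries before running any analysis
str(matches)
summary(matches)
```

In RStudio, the same lines can be run from a script or pasted into the console; replace `csv_path` with the path to the real data file.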

Results and Discussion

  • Findings: The study observes that the sampling distribution is not normally distributed. Skewness matters here because the majority of the data is concentrated on the left side of the graph, which indicates a right-skewed distribution. A log transformation, a common method for handling right skew, may help bring the data closer to normality and reduce the effect of outliers.
  • Visualizations: Include plots or graphs with appropriate captions.

Correlation plot of the sampling distribution of the sample means of elimination, assist, and death, grouped by group_id and team, with respect to result, n = 26

Correlation plot of the sampling distribution of the sample means of damage, heal, and mitigation, grouped by group_id and team, with respect to result, n = 26

70/30 train/test split using KNN (k = 5) to classify and predict match results, n = 14

Leave-one-out cross-validation to estimate the accuracy of the trained model, n = 30
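The two classification steps captioned above (a 70/30 split with KNN at k = 5, then leave-one-out cross-validation) can be sketched as follows. This is an illustration on synthetic data, not the project's code: the README does not say which package was used, so `class::knn()` and `class::knn.cv()` are assumptions, as are the two features and the sample size.

```r
library(class)  # knn() and knn.cv(); assumed, the source names no package

set.seed(1)  # reproducible synthetic data; NOT the study's data
n <- 30
features <- data.frame(
  elimination = c(rnorm(n / 2, 20, 4), rnorm(n / 2, 14, 4)),
  death       = c(rnorm(n / 2, 8, 2),  rnorm(n / 2, 11, 2))
)
result <- factor(rep(c("victory", "defeat"), each = n / 2))

# 70/30 train/test split
train_idx <- sample(seq_len(n), size = round(0.7 * n))

# Standardize with training statistics only, then apply them to the test set
train_x <- scale(features[train_idx, ])
test_x  <- scale(features[-train_idx, ],
                 center = attr(train_x, "scaled:center"),
                 scale  = attr(train_x, "scaled:scale"))

# Classify the held-out matches from their 5 nearest training neighbours
pred <- knn(train = train_x, test = test_x, cl = result[train_idx], k = 5)
holdout_accuracy <- mean(pred == result[-train_idx])

# Leave-one-out cross-validation: each row is predicted from all the others
loo_pred <- knn.cv(train = scale(features), cl = result, k = 5)
loo_accuracy <- mean(loo_pred == result)

# Confusion matrix for the held-out set
table(predicted = pred, actual = result[-train_idx])
```

With samples this small, the hold-out accuracy can change noticeably from one random split to the next, which is one reason to report the leave-one-out estimate alongside it.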

  • Interpretation: Discuss the implications or significance of the findings.

Statistical significance vs. practical significance

Effect size

Cohen's d
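As a sketch of how the effect-size items above relate to the skewness noted in the findings, the code below measures skewness before and after a log transformation and then computes Cohen's d with a pooled standard deviation. The data are synthetic, and the helper functions `skewness()` and `cohens_d()` are written here for illustration; they are not taken from the project.

```r
# Moment-based sample skewness: third central moment over variance^1.5
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Cohen's d: difference in means divided by the pooled standard deviation
cohens_d <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

set.seed(7)  # reproducible synthetic data; NOT the study's data
damage_victory <- rlnorm(13, meanlog = 8.6, sdlog = 0.5)  # right-skewed
damage_defeat  <- rlnorm(13, meanlog = 8.4, sdlog = 0.5)

# Skewness before and after the transformation; log1p(x) = log(1 + x)
# is used so that zero counts stay finite
all_damage <- c(damage_victory, damage_defeat)
skew_raw <- skewness(all_damage)
skew_log <- skewness(log1p(all_damage))

# Effect size on the log scale
d_log <- cohens_d(log1p(damage_victory), log1p(damage_defeat))

c(skew_raw = skew_raw, skew_log = skew_log, d_log = d_log)
```

A p-value indicates whether a difference is detectable at the chosen $\alpha$, while d expresses how large that difference is in standard-deviation units. Reporting both helps separate statistical from practical significance.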


Guidelines for how others can contribute to your project. This might include:

  • Instructions for submitting issues or questions.
  • How to propose enhancements or fixes.


This project is currently unlicensed; all rights are reserved until a license is designated. Please contact the author for permission to use, modify, or distribute the code in this repository.


Provide contact information for further inquiries or collaboration.

