The Highway model, a logistic regression for predicting if game states in hockey are dangerous; for the Big Data Cup 2022.

nguyenank/bdc22-mst

Highway to the Danger Zone 𝅗𝅥 𝅗𝅥 𝅗𝅥 𝅗𝅥 𝅘𝅥 ♫ 𝅘𝅥

In this project, we develop a logistic regression model, dubbed the Highway model, that predicts the probability that a game state is a dangerous situation, i.e. the probability of a high-danger unblocked shot by the power play within the three passes following the configuration. We use these predictions to identify the situations in which defensive play breaks down and the situations in which it successfully prevents shots. We additionally build an actionable tool (https://highway-to-the-danger-zone.netlify.app/ | GitHub: https://github.com/nguyenank/bdc22-mst-website) that coaches and analysts can apply to their own teams and strategies, whether the goal is to minimize or to maximize high-danger shot attempts. Read the full writeup in the paper Highway to the Danger Zone.pdf.
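
To make the prediction step concrete, here is a minimal sketch of how a logistic regression turns the features of a game state into a probability. The feature names, coefficients, and intercept are invented placeholders for illustration, not the values fitted in this project.

```python
import math

# Hypothetical values for illustration only; NOT the fitted Highway model.
INTERCEPT = -1.0
COEFFICIENTS = {
    "distance_to_net": -0.02,   # placeholder feature name and weight
    "mst_total_length": 0.01,   # placeholder feature name and weight
}


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def danger_probability(features, coefficients=COEFFICIENTS, intercept=INTERCEPT):
    """Logistic regression: sigmoid of intercept plus the weighted feature sum."""
    z = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    return sigmoid(z)


# Example game state with made-up feature values.
example_state = {"distance_to_net": 30.0, "mst_total_length": 120.0}
example_probability = danger_probability(example_state)
```

A score of zero maps to a probability of 0.5; positive scores push the prediction toward "dangerous" and negative scores away from it.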

The final merged and cleaned dataset is all_powerplays_4-23-22_cleaned_trimmed.csv. The 'pipeline' for running this project was the following:

  1. Run bdc_merge_example.ipynb to merge the data.
  2. Manually clean the data and calculate the distance to the attacking net according to the processes described in the paper.
  3. Run modelling_and_plotting.py.
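
Step 2 is done by hand following the paper, so the sketch below only illustrates the underlying geometry: the Euclidean distance from a location on the ice to the attacking net. It assumes rink coordinates in feet with x running from 0 to 200 and y from 0 to 85, and places the attacking net at (189, 42.5). These constants and the function name are assumptions for this example, not values taken from the project.

```python
import math

# Assumed net location in feet; adjust to the coordinate system of your data.
ATTACKING_NET = (189.0, 42.5)


def distance_to_attacking_net(x, y, net=ATTACKING_NET):
    """Euclidean distance from the point (x, y) to the attacking net."""
    return math.hypot(net[0] - x, net[1] - y)
```

If the data records some plays attacking the opposite end, the coordinates would need to be mirrored first so that every row attacks the same net.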

Other notes

  • hockey_mst.py defines the functions for handling the minimum spanning tree calculations.
  • feature_values.csv is the human-readable list of coefficients used in the logistic regression model.
  • feature_dict.json is the machine-readable JSON list of coefficients used in the logistic regression model.
  • The tool_test_data folder contains the plots and variable values used to calibrate the web tool.
  • graphic_output contains model evaluation charts.
  • high_danger_states and low_danger_states contain examples of game states from the dataset that had either extremely high or extremely low probabilities of being a dangerous situation.
  • Old Data contains previously used testing data.
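
hockey_mst.py holds the project's own minimum spanning tree code. The sketch below is not that code but a generic illustration of the idea: treat player positions as points and connect them with the set of edges of smallest total length, here using Prim's algorithm. The function names `mst_edges` and `mst_total_length` and the use of plain Euclidean distance between (x, y) positions are assumptions made for this example.

```python
import math


def mst_edges(points):
    """Return MST edges as (i, j, length) tuples using Prim's algorithm, O(n^2)."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each point
    best_from = [-1] * n         # tree vertex that provides that cheapest link
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the point outside the tree that is closest to the tree.
        nxt = min((j for j in range(n) if not in_tree[j]), key=lambda j: best_dist[j])
        edges.append((best_from[nxt], nxt, best_dist[nxt]))
        in_tree[nxt] = True
        # Update the cheapest links using the newly added point.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[nxt], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = nxt
    return edges


def mst_total_length(points):
    """Sum of the edge lengths in the minimum spanning tree."""
    return sum(length for _, _, length in mst_edges(points))
```

For a handful of skaters the quadratic version is more than fast enough; a library routine would be the usual choice for larger graphs.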
