In this project, we develop a logistic regression model, dubbed the Highway model, that predicts the probability that a game state is a dangerous situation (i.e., the probability that the power play takes a high-danger unblocked shot within the three passes following the configuration). We use these predictions to identify analytically the situations in which defensive play breaks down and those in which it successfully prevents shots. We also build an actionable tool (https://highway-to-the-danger-zone.netlify.app/ | GitHub: https://github.com/nguyenank/bdc22-mst-website) that coaches and analysts can apply to their own teams and strategies, both to minimize high-danger shot attempts against and to maximize their own. Read the full writeup in the paper Highway to the Danger Zone.pdf.
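To illustrate how a logistic regression model turns a game state into a probability of danger, here is a minimal sketch of the scoring step. The feature names (`distance_to_net`, `mst_edge_length`), the coefficient values, and the `"intercept"` key are hypothetical placeholders, not values from feature_dict.json; the sketch only assumes that coefficients can be held as a flat mapping from feature name to weight.

```python
import math

# Hypothetical {feature_name: weight} mapping with the intercept under "intercept".
# Names and values are illustrative only; they are NOT taken from feature_dict.json.
COEFFICIENTS = {
    "intercept": -2.0,
    "distance_to_net": -0.05,
    "mst_edge_length": 0.01,
}


def danger_probability(features, coefs):
    """Return the logistic-regression probability that a game state is dangerous."""
    # Linear predictor: intercept plus the weighted sum of the feature values.
    z = coefs["intercept"]
    for name, value in features.items():
        z += coefs[name] * value
    # The logistic (sigmoid) function maps the linear predictor onto (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


state = {"distance_to_net": 20.0, "mst_edge_length": 100.0}
print(round(danger_probability(state, COEFFICIENTS), 4))  # prints 0.1192
```

Here the linear predictor is -2.0 - 1.0 + 1.0 = -2.0, so the probability is 1 / (1 + e^2). Positive weights push a state toward "dangerous" and negative weights push it away, which is what makes the coefficients readable as a description of where defenses break down.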
The final merged and cleaned dataset is all_powerplays_4-23-22_cleaned_trimmed.csv. The pipeline for running this project is the following:
- Run bdc_merge_example.ipynb to merge the data.
- Manually clean the data and calculate the distance to the attacking net according to the processes described in the paper.
- Run modelling_and_plotting.py.

Other files in the repository:
- hockey_mst.py defines the functions that handle the minimum spanning tree calculations.
- feature_values.csv is the human-readable list of the coefficients used in the logistic regression model.
- feature_dict.json is the machine-readable JSON file of the same coefficients.
- The tool_test_data folder contains the plots and variable values used to calibrate the web tool.
- graphic_output contains model evaluation charts.
- high_danger_states and low_danger_states contain examples of game states from the dataset with either extremely high or extremely low probabilities of being a dangerous situation.
- Old Data contains previously used test data.
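The distance and minimum spanning tree (MST) calculations mentioned above can be sketched in plain Python. This is a minimal sketch, not the contents of hockey_mst.py: it assumes a 200 x 85 ft coordinate frame with the attacking net at (189, 42.5), builds the tree over one frame's skater positions with Prim's algorithm, and uses hypothetical function names (`distance_to_net`, `mst_edges`). Check the rink constants against the paper before reusing them.

```python
import math

# ASSUMPTION: 200 x 85 ft rink frame with the attacking net at (189, 42.5).
NET_X, NET_Y = 189.0, 42.5


def distance_to_net(x, y):
    """Euclidean distance (ft) from a point to the assumed attacking-net location."""
    return math.hypot(x - NET_X, y - NET_Y)


def mst_edges(points):
    """Return MST edges as (i, j, length) over 2-D points, using Prim's algorithm."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each node to the tree
    parent = [-1] * n      # tree node at the other end of that cheapest edge
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the closest node that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, best[u]))
        # Update the cheapest connection for every remaining node via u.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


# Hypothetical skater positions from a single frame.
skaters = [(150.0, 20.0), (150.0, 65.0), (175.0, 42.5), (130.0, 42.5)]
edges = mst_edges(skaters)
total = sum(length for _, _, length in edges)
print(len(edges), round(total, 2))              # prints 3 93.84
print(round(distance_to_net(*skaters[2]), 2))  # prints 14.0
```

Prim's algorithm is a simple choice here because a frame has at most a dozen players, so the O(n^2) scan is negligible; a library routine such as SciPy's `minimum_spanning_tree` would give the same tree on larger inputs.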