College Football Game Predictions

Machine learning models that predict the outcome of any Division I college football game. The data cover the 2015-2024 seasons and come from SportsReference and CFBD. My DNN reaches an accuracy of 84% on the validation data. I use multi-class learning, feeding in the prior week's feature data to predict the next week's feature data, and I used multiple models to achieve this. The .pkl files could not be uploaded to GitHub because of their size.
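To make the "prior week predicts next week" idea concrete, here is a minimal sketch of how such training pairs could be built. The function name `make_lagged_pairs`, the dictionary layout, and the two example features are assumptions for illustration, not the repository's actual code.

```python
def make_lagged_pairs(features_by_team):
    """Pair each week's feature vector with the following week's vector.

    features_by_team maps a team name to a chronological list of
    per-week feature vectors (hypothetical layout).
    """
    X, y = [], []
    for weeks in features_by_team.values():
        # zip stops at the shorter input, so the last week has no target
        for prev_week, next_week in zip(weeks, weeks[1:]):
            X.append(prev_week)
            y.append(next_week)
    return X, y


# Made-up numbers: [points scored, total yards] per week
example = {
    "team-a": [[21.0, 350.0], [28.0, 410.0], [17.0, 290.0]],
    "team-b": [[10.0, 200.0]],  # a single week yields no pairs
}
X, y = make_lagged_pairs(example)
print(len(X), len(y))
```

Each row of `X` is one week of features and the matching row of `y` is the same team's features one week later, so a team needs at least two weeks of data to contribute.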

THE BIG PROBLEM: The models are trained on every college football team, so they are biased by the bad teams. Because the feature values of top 25 teams are substantially higher than those of the bad teams, the model assumes that a top 25 team will win every game. Bad teams win with "low" feature scores and top 25 teams win with "high" feature scores, so the model concludes that a good team will still win even in a week when it plays badly.

Usage

python3 collect_augment_data.py # update the 2024 data every week
python3 deep_learning_multiclass.py test # evaluate the model on the "test" data, which is last week's outcomes for the top 25 teams
python3 deep_learning_multiclass.py notest # predict the outcome of a game between two teams

Current prediction accuracies

Outputs

Example output from results.txt

==============================
Win Probabilities from Monte Carlo Simulation with 10000 simulations
louisiana-lafayette : 99.999 %buffalo : 0.001 %
Win Probabilities from rolling median of 2 predictions
louisiana-lafayette : 100.0 %buffalo : 0.0 %
Win Probabilities from rolling median of 3 predictions
louisiana-lafayette : 99.99 %buffalo : 0.01 %
Win Probabilities from exponential weighted average of 2 predictions
louisiana-lafayette : 100.0 %buffalo : 0.0 %
Win Probabilities from 25th and 75th percentile rolling 2
25th: louisiana-lafayette : 100.0 %buffalo : 0.0 %
75th: louisiana-lafayette : 99.999 %buffalo : 0.001 %
==============================
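As a rough illustration of what a Monte Carlo win probability like the first line of this output can mean, here is a minimal sketch. It assumes each team's predicted score can be described by a mean and a standard deviation and draws from a normal distribution. Those inputs, the normal distribution, and the function name `monte_carlo_win_prob` are assumptions for the example, not the method used in this repository.

```python
import random


def monte_carlo_win_prob(mean_a, std_a, mean_b, std_b, n_sims=10_000, seed=0):
    """Estimate win probabilities by simulating many score pairs."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins_a = 0
    for _ in range(n_sims):
        score_a = rng.gauss(mean_a, std_a)
        score_b = rng.gauss(mean_b, std_b)
        if score_a > score_b:
            wins_a += 1
    p_a = wins_a / n_sims
    return p_a, 1.0 - p_a


# Made-up score distributions for two teams
p_a, p_b = monte_carlo_win_prob(35.0, 7.0, 14.0, 7.0)
print(f"team-a : {100 * p_a:.3f} %")
print(f"team-b : {100 * p_b:.3f} %")
```

When the two score distributions barely overlap, almost every simulated game goes to the same team, which is how probabilities close to 100% and 0% arise.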

My Simple Rating System

I created a simple rating system (SRS) that takes a team's median point differential and subtracts from it the median point differentials of all the teams it played against. The one caveat is that I count an opponent's point differentials only from the games that opponent lost, because I want to assess how much each team loses by. The theory is that good teams do not lose often, and when they do, it is most likely by a small margin. The higher the SRS value, the "better" the team is.
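The description above leaves some details open, so the following is only one possible reading, sketched for illustration. It assumes the opponents' loss-only medians are combined by taking their median, and that opponents with no losses are skipped. The function name `simple_rating` and the input layout are hypothetical and may differ from simple_rating_system.py.

```python
from statistics import median


def simple_rating(games):
    """games: list of (team, opponent, point_diff) rows, one row per team per game.

    point_diff is from the team's point of view (negative means a loss).
    """
    diffs, loss_diffs, opponents = {}, {}, {}
    for team, opp, diff in games:
        diffs.setdefault(team, []).append(diff)
        opponents.setdefault(team, []).append(opp)
        if diff < 0:
            # only losing margins are kept for the opponent term
            loss_diffs.setdefault(team, []).append(diff)

    ratings = {}
    for team, team_diffs in diffs.items():
        # median losing margin of each opponent that has lost at least once
        opp_medians = [median(loss_diffs[o]) for o in opponents[team] if o in loss_diffs]
        # assumption: combine the opponents' values with a median, 0 if none
        opp_term = median(opp_medians) if opp_medians else 0.0
        ratings[team] = median(team_diffs) - opp_term
    return ratings


# Made-up results: A beats B by 10 and C by 20, B beats C by 3
games = [
    ("A", "B", 10), ("B", "A", -10),
    ("A", "C", 20), ("C", "A", -20),
    ("B", "C", 3), ("C", "B", -3),
]
for team, value in sorted(simple_rating(games).items(), key=lambda kv: -kv[1]):
    print(team, value)
```

In this toy schedule the undefeated team ends up with the highest value and the winless team with the lowest, which matches the intent that a higher SRS value means a "better" team.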

Creating a CFBD API key

Go to CFBD and enter your email address. They will send you an API key. Create a file called api_key.yaml and store the API key like this:

api_key:
  Authorization: asdifnasdofnasdpvnapionmfaspidfasodnfkajdslmalskdm
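For reference, a file in this layout could be read with PyYAML along these lines. This is a sketch, not the repository's loading code: the function name `load_cfbd_headers` is hypothetical, and the "Bearer " prefix is an assumption about how CFBD expects the key to be sent, so check it against the CFBD documentation.

```python
import yaml  # PyYAML, a third-party package


def load_cfbd_headers(path="api_key.yaml"):
    """Read the key from the YAML file and build an HTTP header dict."""
    with open(path, "r", encoding="utf-8") as fh:
        config = yaml.safe_load(fh)
    key = config["api_key"]["Authorization"]
    # Assumption: the key is sent as a Bearer token
    return {"Authorization": f"Bearer {key}"}
```

Keeping the key in its own file and listing that file in .gitignore avoids committing it by accident.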

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

