The objective of this project is to build a model that predicts the players selected for the All-NBA Team and the All-Rookie Team based on their statistical data.
All-NBA Team | Player 1 | Player 2 | Player 3 | Player 4 | Player 5 |
---|---|---|---|---|---|
First Team | Giannis Antetokounmpo | Luka Doncic | Jayson Tatum | Anthony Davis | Shai Gilgeous-Alexander |
Second Team | Anthony Edwards | Kevin Durant | LeBron James | Nikola Jokic | Paolo Banchero |
Third Team | Jalen Brunson | De'Aaron Fox | DeMar DeRozan | Domantas Sabonis | Devin Booker |
All-Rookie Team | Player 1 | Player 2 | Player 3 | Player 4 | Player 5 |
---|---|---|---|---|---|
First Team | Victor Wembanyama | Chet Holmgren | Brandon Miller | Keyonte George | Scoot Henderson |
Second Team | Jaime Jaquez Jr. | Amen Thompson | Brandin Podziemski | Cason Wallace | Ausar Thompson |
- Loading data from a CSV file
- Data preprocessing including handling missing values, standardization, and feature selection
- Modeling using Random Forest Classifier for player classification
- Evaluation of different models including Random Forest, Support Vector Regressor (SVR), and XGBoost
all_nba.ipynb: Jupyter Notebook containing the data analysis process, including data loading, preprocessing, modeling, and evaluation.
- Number of All-NBA Nominations for Top 20 Players
- Feature Correlation Matrix
- Features Most Correlated with All-NBA Nomination: presents the features (player statistics) most correlated with All-NBA nomination, helping identify key predictors.
- Average Age of All-NBA Nominated Players in Each Season
- Teams with the Most Players in All-NBA: displays teams with the highest number of players nominated for All-NBA, highlighting teams with significant impact.
- Teams with the Most Players in All-NBA in Each Season: shows teams with the highest number of players nominated for All-NBA in each season, indicating changes in dominant teams over time.
- The check_files_exist function checks whether the required files are present in the specified directory and returns a list of any missing files.
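A minimal sketch of such a check (the argument names are illustrative):

```python
import os

def check_files_exist(directory, filenames):
    """Return the required files that are missing from `directory`."""
    return [name for name in filenames
            if not os.path.isfile(os.path.join(directory, name))]
```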
- The load_data function reads the necessary CSV files into pandas DataFrames for further processing.
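A sketch of the loading step; the CSV file names here are placeholders, not necessarily the ones used in the project:

```python
import os
import pandas as pd

def load_data(directory):
    """Read the input CSV files into pandas DataFrames (file names are illustrative)."""
    stats = pd.read_csv(os.path.join(directory, "player_stats.csv"))
    awards = pd.read_csv(os.path.join(directory, "all_nba_awards.csv"))
    return stats, awards
```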
- The preprocess_data function prepares the player statistics data by performing the following steps (see the sketch after this list):
- Dropping Unnecessary Columns: Columns that are not needed for analysis are removed.
- Converting Data Types: The "GP" (games played) column is converted to integers to ensure proper numerical operations.
- Filtering Players: Only players who played more than 40 games in a season are retained.
- Filtering Seasons: The data is filtered to include only the specified seasons.
- Adding All-NBA Nominations: A new column "ALL_NBA_NOMINATION" is added to indicate whether a player received an All-NBA nomination. This column is initially set to 0 for all players.
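A condensed sketch of these steps. The SEASON and PLAYER_ID column names, the dropped columns, and the way nominations are passed in are assumptions for illustration:

```python
import pandas as pd

def preprocess_data(df, seasons, nominated_ids, drop_cols=("PLAYER_NAME",)):
    """Filter and annotate the raw player statistics (several names here are illustrative)."""
    df = df.drop(columns=list(drop_cols), errors="ignore")
    df["GP"] = df["GP"].astype(int)             # games played as integers
    df = df[df["GP"] > 40]                      # keep players with more than 40 games
    df = df[df["SEASON"].isin(seasons)].copy()  # keep only the specified seasons
    df["ALL_NBA_NOMINATION"] = 0                # default: no nomination
    df.loc[df["PLAYER_ID"].isin(nominated_ids), "ALL_NBA_NOMINATION"] = 1
    return df
```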
- The target variable, "ALL_NBA_NOMINATION", is separated from the features.
- Columns that are not needed for the model are removed from the features DataFrame.
- The "DRAFT_YEAR" and "DRAFT_NUMBER" columns are converted to integer type, with undrafted players assigned a value of -1.
- Categorical data in the "TEAM_ABBREVIATION" column is converted to dummy variables.
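These feature-preparation steps might look roughly like the following; the prepare_features name and the default list of dropped columns are illustrative:

```python
import pandas as pd

def prepare_features(df, drop_cols=("PLAYER_NAME", "PLAYER_ID", "SEASON")):
    """Split the target from the features and encode the remaining columns."""
    y = df["ALL_NBA_NOMINATION"] if "ALL_NBA_NOMINATION" in df else None
    X = df.drop(columns=["ALL_NBA_NOMINATION", *drop_cols], errors="ignore")

    # Undrafted players have non-numeric draft entries; mark them with -1.
    for col in ("DRAFT_YEAR", "DRAFT_NUMBER"):
        X[col] = pd.to_numeric(X[col], errors="coerce").fillna(-1).astype(int)

    # One-hot encode the team abbreviation.
    X = pd.get_dummies(X, columns=["TEAM_ABBREVIATION"])
    return X, y
```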
- The standardize_features function standardizes the features using a StandardScaler. This ensures that all features have a mean of 0 and a standard deviation of 1.
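A minimal sketch using scikit-learn's StandardScaler; returning the fitted scaler makes it reusable for new data:

```python
from sklearn.preprocessing import StandardScaler

def standardize_features(X):
    """Scale features to zero mean and unit variance; return the scaler for later reuse."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X)
    return X_scaled, scaler
```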
- The train_random_forest function trains a Random Forest model. The dataset is split into training and testing sets, the model is trained on the training set, and predictions are made on the testing set.
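A sketch of the training step; the split ratio and hyperparameters are assumptions, not necessarily those used in the notebook:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

def train_random_forest(X, y, test_size=0.2, random_state=42):
    """Split the data, fit a Random Forest, and report performance on the held-out set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    model = RandomForestClassifier(n_estimators=300, random_state=random_state)
    model.fit(X_train, y_train)
    print(classification_report(y_test, model.predict(X_test)))
    return model
```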
- The predict_new_season function uses the trained Random Forest model to predict All-NBA nominations for a new season. It preprocesses the new season's data similarly to the training data, standardizes it, and makes predictions.
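A sketch of the prediction step, reusing the hypothetical prepare_features helper from above; the feature_columns argument aligns the new season's dummy columns with those seen during training:

```python
def predict_new_season(model, scaler, new_season_df, feature_columns):
    """Preprocess the new season like the training data and return nomination probabilities."""
    X_new, _ = prepare_features(new_season_df)                    # hypothetical helper from above
    X_new = X_new.reindex(columns=feature_columns, fill_value=0)  # align with training features
    X_new_scaled = scaler.transform(X_new)                        # reuse the fitted scaler
    return model.predict_proba(X_new_scaled)[:, 1]                # probability of a nomination
```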
- The generate_award_predictions function assigns predicted players to different All-NBA teams based on their predicted probabilities.
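One possible assignment rule, assuming the fifteen players with the highest predicted probabilities fill the three five-man teams in order (this ranking rule is an assumption, not necessarily the notebook's exact logic):

```python
def generate_award_predictions(player_names, probabilities):
    """Rank players by predicted probability and split the top 15 into three teams (assumed rule)."""
    ranked = sorted(zip(player_names, probabilities), key=lambda pair: pair[1], reverse=True)
    top_15 = [name for name, _ in ranked[:15]]
    return {
        "first all-nba team": top_15[0:5],
        "second all-nba team": top_15[5:10],
        "third all-nba team": top_15[10:15],
    }
```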
- The save_results function saves the predicted results to a JSON file.
- The save_model function saves the trained model and the scaler to a file using pickle.
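Minimal sketches of both persistence helpers, assuming the output paths are configurable:

```python
import json
import pickle

def save_results(results, path="results.json"):
    """Write the predicted teams to a JSON file."""
    with open(path, "w") as f:
        json.dump(results, f, indent=2)

def save_model(model, scaler, path="model.pkl"):
    """Persist the trained model together with its fitted scaler."""
    with open(path, "wb") as f:
        pickle.dump({"model": model, "scaler": scaler}, f)
```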
- The main function checks for missing files, loads the data, preprocesses it, trains the model, makes predictions, and saves the results.
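Putting the sketches above together, a hypothetical main flow could look like this (file names, seasons, and column names are illustrative):

```python
import os
import pandas as pd

def main(directory="data"):
    missing = check_files_exist(directory, ["player_stats.csv", "all_nba_awards.csv"])
    if missing:
        raise FileNotFoundError(f"Missing input files: {missing}")

    stats, awards = load_data(directory)
    df = preprocess_data(stats, seasons=["2020-21", "2021-22", "2022-23"],
                         nominated_ids=awards["PLAYER_ID"])
    X, y = prepare_features(df)
    X_scaled, scaler = standardize_features(X)
    model = train_random_forest(X_scaled, y)

    new_season = pd.read_csv(os.path.join(directory, "player_stats_new_season.csv"))
    probabilities = predict_new_season(model, scaler, new_season, feature_columns=X.columns)
    results = generate_award_predictions(new_season["PLAYER_NAME"], probabilities)

    save_results(results)
    save_model(model, scaler)

if __name__ == "__main__":
    main()
```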