This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
1 parent 6b54361, commit 73b1eaf
Showing 2 changed files with 78 additions and 63 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,47 +1,41 @@ | ||
# Parameters of Cricket Analysis | ||
# Exploratory Data Analysis (Sports) | ||
|
||
### 🎯 Goal | ||
The main goal of this project is to analyze various parameters of cricket matches to derive meaningful insights and trends from historical data. | ||
This project involves the analysis of cricket match data to uncover insights and patterns. The datasets used in this analysis include detailed information about deliveries and match outcomes. | ||
|
||
### 🧵 Dataset | ||
The dataset used for this analysis can be accessed [(1)here](https://drive.google.com/file/d/1XzA-ID3bsvJc-4Z4ZO7RAfRILesWhCWd/view?usp=sharing) and [(2)here](https://drive.google.com/file/d/1jNROunijgW_mm_igrxXjh5yAwOEVI9t0/view?usp=sharing). It includes comprehensive match data from various cricket tournaments. | ||
## Datasets | ||
|
||
### 🧾 Description | ||
This project involves an in-depth analysis of cricket match parameters such as runs, wickets, player performance, and match outcomes. The analysis helps in understanding the key factors influencing match results and player efficiency. | ||
1. **deliveries.csv**: This dataset contains ball-by-ball information for each match, including details such as: | ||
- `match_id`: Identifier for the match. | ||
- `inning`: Inning number. | ||
- `batting_team`: Team that is batting. | ||
- `bowling_team`: Team that is bowling. | ||
- `over` and `ball`: Over and ball number. | ||
- `batsman`, `non_striker`, `bowler`: Players involved. | ||
- Various run categories and dismissal information. | ||
|
||
### 🧮 What I have done | ||
1. Collected and pre-processed the dataset. | ||
2. Performed exploratory data analysis to uncover patterns and trends. | ||
3. Implemented various statistical models to analyze match parameters. | ||
4. Visualized the data using charts and graphs to better understand the insights. | ||
5. Compared model performances to determine the best-fit model. | ||
2. **matches.csv**: This dataset provides match-level information, including: | ||
- `id`: Match identifier. | ||
- `season`: Year of the match. | ||
- `city` and `date`: Location and date of the match. | ||
- `team1` and `team2`: Teams playing the match. | ||
- `toss_winner` and `toss_decision`: Toss winner and their decision. | ||
   - `result`, `dl_applied`: Match result and whether the Duckworth-Lewis method was applied. | ||
- `winner`, `win_by_runs`, `win_by_wickets`: Winning team and margin of victory. | ||
- `player_of_match`, `venue`: Player of the match and match venue. | ||
- `umpire1`, `umpire2`, `umpire3`: Umpires officiating the match. | ||
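
The two files can be combined so that each delivery carries its match-level context. The sketch below assumes that `match_id` in deliveries.csv refers to `id` in matches.csv, which the column descriptions suggest but do not state outright. It uses only the Python standard library and tiny made-up sample rows (the team names and values are placeholders, not real data); `read_rows` and `attach_match_info` are hypothetical helper names.

```python
import csv
import io

# Made-up sample rows standing in for the real files (illustrative only).
MATCHES_CSV = """id,season,team1,team2,winner
1,2017,Team A,Team B,Team A
2,2017,Team B,Team C,Team C
"""

DELIVERIES_CSV = """match_id,inning,batting_team,bowling_team,over,ball
1,1,Team A,Team B,1,1
1,1,Team A,Team B,1,2
2,1,Team B,Team C,1,1
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_match_info(deliveries, matches):
    """Add season and winner to each delivery.

    Assumption: deliveries.match_id refers to matches.id.
    Deliveries whose match is missing from `matches` are skipped.
    """
    matches_by_id = {m["id"]: m for m in matches}
    joined = []
    for delivery in deliveries:
        match = matches_by_id.get(delivery["match_id"])
        if match is None:
            continue
        joined.append(
            {**delivery, "season": match["season"], "winner": match["winner"]}
        )
    return joined


matches = read_rows(MATCHES_CSV)
deliveries = read_rows(DELIVERIES_CSV)
joined = attach_match_info(deliveries, matches)

for row in joined:
    print(row["match_id"], row["batting_team"], row["season"], row["winner"])
```

With the real files, the same lookup could read from `open("deliveries.csv", newline="")` instead of the inline strings; a pandas merge on the same keys would be an equivalent alternative.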
|
||
### 🚀 Models Implemented | ||
- Linear Regression: To predict runs scored. | ||
- Decision Trees: For classifying match outcomes. | ||
- K-Means Clustering: To group similar player performances. | ||
- Random Forest: For improving prediction accuracy. | ||
## Objectives | ||
|
||
### 📚 Libraries Needed | ||
- Pandas | ||
- NumPy | ||
- Matplotlib | ||
- Seaborn | ||
- Scikit-learn | ||
- Analyze player and team performances. | ||
- Identify key factors contributing to match outcomes. | ||
- Visualize trends and patterns in cricket matches. | ||
|
||
### 📊 Exploratory Data Analysis Results | ||
![EDA Results](https://drive.google.com/file/d/1CfGHu1oFRBjeUTvZww_S-28kspqofve7/view?usp=sharing) | ||
## Usage | ||
|
||
### 📈 Performance of the Models based on the Accuracy Scores | ||
- Linear Regression: 85% accuracy in run prediction. | ||
- Decision Trees: 78% accuracy in match outcome classification. | ||
- K-Means Clustering: Effectively grouped player performances. | ||
- Random Forest: 90% accuracy in various predictions. | ||
1. **Data Preprocessing**: Clean and prepare the datasets for analysis. | ||
2. **Exploratory Data Analysis (EDA)**: Perform statistical analysis and visualization to explore the data. | ||
3. **Insights and Conclusions**: Derive meaningful insights from the data and present conclusions. | ||
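
The three steps above can be sketched on match-level data as follows. This is a minimal illustration, not the project's actual notebook code: it uses only the `toss_winner`, `toss_decision`, and `winner` columns described for matches.csv, made-up sample rows, and hypothetical helper names (`preprocess`, `wins_per_team`, `toss_winner_win_rate`). Treating an empty `winner` as a match without a result is an assumption.

```python
from collections import Counter

# Made-up rows shaped like matches.csv (illustrative only).
matches = [
    {"id": "1", "toss_winner": "Team A", "toss_decision": "field", "winner": "Team A"},
    {"id": "2", "toss_winner": "Team B", "toss_decision": "bat", "winner": "Team C"},
    {"id": "3", "toss_winner": "Team C", "toss_decision": "field", "winner": ""},
    {"id": "4", "toss_winner": "Team A", "toss_decision": "field", "winner": "Team A"},
]


def preprocess(rows):
    """Step 1: drop matches with no recorded winner (assumed to mean no result)."""
    return [row for row in rows if row["winner"].strip()]


def wins_per_team(rows):
    """Step 2: count wins for each team."""
    return Counter(row["winner"] for row in rows)


def toss_winner_win_rate(rows):
    """Step 3: share of matches in which the toss winner also won the match."""
    if not rows:
        return 0.0
    won_both = sum(1 for row in rows if row["toss_winner"] == row["winner"])
    return won_both / len(rows)


clean = preprocess(matches)
wins = wins_per_team(clean)
rate = toss_winner_win_rate(clean)

print("Wins per team:", wins.most_common())
print(f"Toss winner also won the match in {rate:.0%} of matches")
```

The same counts can be plotted (for example as a bar chart of wins per team) to cover the visualization part of the EDA step.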
|
||
### 📢 Conclusion | ||
The analysis revealed significant insights into cricket matches and player performances. Random Forest emerged as the most accurate model for predictions. The findings can help in strategic decision-making for teams and players. | ||
|
||
### ✒️ Your Signature | ||
Somnath Shaw | ||
[GitHub](https://github.com/somnathshaw) | ||
## Conclusion | ||
|
||
This project aims to provide a comprehensive analysis of cricket match data, helping to understand the dynamics of the game and the factors influencing outcomes. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,26 +1,47 @@ | ||
# Exploratory Data Analysis (Sports) | ||
|
||
![](https://img.shields.io/badge/Programming_Language-Python-blue.svg) | ||
![](https://img.shields.io/badge/Main_Tool_Used-Jupyter_Notebook-orange.svg) | ||
![](https://img.shields.io/badge/Status-Complete-green.svg) | ||
|
||
## Problem Statement: | ||
- Perform ‘Exploratory Data Analysis’ on the ‘Indian Premier League’ dataset.<br> | ||
- As a sports analyst, find out the most successful teams and players, and the factors contributing to a team's win or loss.<br> | ||
- Suggest teams or players a company should endorse for its products.<br> | ||
- You can use any tool of your choice (Python/R/Tableau/PowerBI/Excel/SAP/SAS).<br> | ||
|
||
## Feature Description | ||
This project is a comprehensive guide for conducting EDA on sports-related datasets. It aims to equip users with the skills necessary to uncover insights and patterns from raw data, leveraging various statistical and graphical techniques. This project covers a range of essential EDA steps, including data cleaning, data visualization, and summary statistics. Users will learn how to handle missing values, identify outliers, and understand the distribution and relationships within the data. The project provides practical examples and code snippets to facilitate hands-on learning, making it an ideal resource for anyone looking to enhance their data analysis capabilities in the sports domain. By the end of the project, users will be able to perform robust EDA, gaining valuable insights that can inform decision-making and strategy in sports analytics. | ||
|
||
## Use Cases: | ||
1. Performance Analysis: Evaluate individual and team performance metrics to identify strengths and weaknesses. | ||
2. Player Comparison: Compare players based on various statistics to aid in scouting and recruitment. | ||
3. Injury Analysis: Analyze patterns in injuries to improve player health and safety protocols. | ||
4. Match Outcome Prediction: Identify factors that influence match outcomes for better strategic planning. | ||
5. Fan Engagement: Understand fan behavior and preferences to enhance engagement and marketing strategies. | ||
6. Training Optimization: Optimize training programs based on performance and fitness data. | ||
7. Game Strategy: Develop data-driven strategies for game planning and in-game decision-making. | ||
8. Trend Identification: Detect emerging trends in sports performance and fan engagement over time. | ||
# Parameters of Cricket Analysis | ||
|
||
### 🎯 Goal | ||
The main goal of this project is to analyze various parameters of cricket matches to derive meaningful insights and trends from historical data. | ||
|
||
### 🧵 Dataset | ||
The dataset used for this analysis can be accessed [(1)here](https://drive.google.com/file/d/1XzA-ID3bsvJc-4Z4ZO7RAfRILesWhCWd/view?usp=sharing) and [(2)here](https://drive.google.com/file/d/1jNROunijgW_mm_igrxXjh5yAwOEVI9t0/view?usp=sharing). It includes comprehensive match data from various cricket tournaments. | ||
|
||
### 🧾 Description | ||
This project involves an in-depth analysis of cricket match parameters such as runs, wickets, player performance, and match outcomes. The analysis helps in understanding the key factors influencing match results and player efficiency. | ||
|
||
### 🧮 What I have done | ||
1. Collected and pre-processed the dataset. | ||
2. Performed exploratory data analysis to uncover patterns and trends. | ||
3. Implemented various statistical models to analyze match parameters. | ||
4. Visualized the data using charts and graphs to better understand the insights. | ||
5. Compared model performances to determine the best-fit model. | ||
|
||
### 🚀 Models Implemented | ||
- Linear Regression: To predict runs scored. | ||
- Decision Trees: For classifying match outcomes. | ||
- K-Means Clustering: To group similar player performances. | ||
- Random Forest: For improving prediction accuracy. | ||
|
||
### 📚 Libraries Needed | ||
- Pandas | ||
- NumPy | ||
- Matplotlib | ||
- Seaborn | ||
- Scikit-learn | ||
|
||
### 📊 Exploratory Data Analysis Results | ||
![EDA Results](https://github.com/SOMNATH0904/ML-Crate/blob/6b5436149d5f10024898965b0246cf6f71c60232/Parameters%20of%20Cricket%20Analysis/Images/Output2.png) | ||
|
||
### 📈 Performance of the Models based on the Accuracy Scores | ||
- Linear Regression: 85% accuracy in run prediction. | ||
- Decision Trees: 78% accuracy in match outcome classification. | ||
- K-Means Clustering: Effectively grouped player performances. | ||
- Random Forest: 90% accuracy in various predictions. | ||
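
The README does not say how these scores were computed. As a hedged illustration only, the sketch below shows two common ways such numbers are obtained: accuracy (the share of correct predictions) for a classifier, and R² (coefficient of determination) for a regression model, since "accuracy" is not strictly defined for continuous run predictions. The function names and sample values are hypothetical and are not taken from the project.

```python
def accuracy(actual, predicted):
    """Share of predictions that exactly match the actual labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Made-up match outcomes (1 = win, 0 = loss) and classifier predictions.
outcome_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])

# Made-up runs scored and regression predictions.
runs_r2 = r_squared([10, 20, 30], [12, 18, 33])

print(f"Outcome accuracy: {outcome_accuracy:.2f}")
print(f"Runs R^2: {runs_r2:.3f}")
```

Reporting which metric and which train/test split produced each figure would make the percentages above easier to compare across models.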
|
||
### 📢 Conclusion | ||
The analysis revealed significant insights into cricket matches and player performances. Random Forest emerged as the most accurate model for predictions. The findings can help in strategic decision-making for teams and players. | ||
|
||
### ✒️ Your Signature | ||
Somnath Shaw | ||
[GitHub](https://github.com/somnathshaw) | ||
|