Machine-Learning--In-brief

So what is Machine Learning?
In layman's terms, we feed data to a machine and the machine learns from that data. When new data is provided, the machine uses what it has learned to make a decision or a prediction.

In supervised learning the data is labelled (i.e., every input is tagged with its corresponding output). The machine is trained on those input-output pairs so that it can make decisions about new inputs. For instance, at school the teacher first shows us how a specific problem is solved, and we then apply the same method to other problems.
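As a rough illustration of learning from labelled data, here is a minimal nearest-neighbour sketch in plain Python. The features (hours studied, hours slept), the labels and the `predict` helper are all hypothetical and chosen only for this example; a real project would normally use a library model instead.

```python
import math

# Hypothetical labelled data: (hours studied, hours slept) -> exam result.
training = [
    ((1.0, 1.0), "fail"),
    ((2.0, 1.5), "fail"),
    ((6.0, 5.0), "pass"),
    ((7.0, 6.5), "pass"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(training, key=lambda example: math.dist(example[0], features))
    return nearest[1]

prediction_a = predict((6.5, 6.0))
prediction_b = predict((1.5, 1.0))
print(prediction_a, prediction_b)
```

The "teacher" here is the label attached to every training example; the model only generalises from those answers.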

In unsupervised learning the data is not labelled. The machine has to make sense of the data by itself and find hidden patterns in order to make predictions. It is like being a grown-up: we don't need guidance in our daily activities, we figure things out on our own.
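To make "finding hidden patterns" concrete, the sketch below groups unlabelled numbers into two clusters with a very small k-means style loop. The data and the `two_means` helper are hypothetical, and starting the centroids at the minimum and maximum is a simplification made for this example.

```python
def two_means(values, iterations=10):
    """Split one-dimensional values into two clusters (a tiny k-means, k=2)."""
    centroids = [min(values), max(values)]  # simple deterministic start
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, label in zip(values, labels) if label == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels, centroids

# Hypothetical unlabelled measurements.
values = [1.0, 1.2, 0.8, 8.0, 8.3, 7.9]
labels, centroids = two_means(values)
print(labels, centroids)
```

No labels are supplied; the two groups emerge only from how close the values are to each other.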

In reinforcement learning, an agent interacts with its environment by taking actions and learns from the rewards and errors that result.
Suppose you were dropped on an isolated island. You would have to learn how to live there: how to adapt to the changing climate, what to eat and what not to eat. You are learning by trial and error, because the surroundings are new to you and the only way to learn is from your own experience.
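The island example can be sketched as a tiny trial-and-error loop. The foods, their reward values and the `learn_by_trial_and_error` function are hypothetical; the strategy shown (try everything once, then mostly pick the best option seen so far while occasionally exploring) is one simple choice, often called epsilon-greedy.

```python
import random

# Hypothetical rewards: how good each food on the island turns out to be.
REWARDS = {"berries": 1.0, "mushrooms": -1.0, "fish": 2.0}

def learn_by_trial_and_error(steps=100, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    actions = list(REWARDS)
    value = {a: 0.0 for a in actions}  # estimated reward per action
    count = {a: 0 for a in actions}    # how often each action was tried
    for step in range(steps):
        if step < len(actions):
            action = actions[step]                # try every action once
        elif rng.random() < epsilon:
            action = rng.choice(actions)          # explore at random
        else:
            action = max(actions, key=value.get)  # exploit the best so far
        reward = REWARDS[action]                  # feedback from the environment
        count[action] += 1
        # Running average of the rewards observed for this action.
        value[action] += (reward - value[action]) / count[action]
    return value

values = learn_by_trial_and_error()
best = max(values, key=values.get)
print(best)
```

The agent is never told which food is best; it discovers that from the rewards it receives.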

Understanding the Dataset

One of the most important steps before you start working on a problem is to create a data dictionary. A data dictionary describes what each column or feature of your dataset actually means.
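A data dictionary can be as simple as a mapping from column name to description. The sketch below uses made-up column names (they are not taken from any file in this repository) and a hypothetical `undocumented_columns` helper that flags columns still missing a description.

```python
# Hypothetical data dictionary: column name -> meaning.
data_dictionary = {
    "customer_id": "Unique identifier for each customer",
    "tenure_months": "Number of months the customer has stayed",
    "churn": "Target: 1 if the customer left, 0 otherwise",
}

def undocumented_columns(columns, dictionary):
    """Return the columns that have no entry in the data dictionary."""
    return [c for c in columns if c not in dictionary]

# Hypothetical column names as they might appear in a dataset header.
columns = ["customer_id", "tenure_months", "monthly_charge", "churn"]
missing = undocumented_columns(columns, data_dictionary)
print(missing)
```

Running a check like this before modelling makes it obvious which features still need to be understood.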

Exploratory Data Analysis

Once your business problem statement is ready, the next step is data exploration: analysing, summarising, visualising and becoming familiar with the dataset. A data science project is not just about creating models. Any time you build a machine learning model, you have to preprocess the data so that the model can be trained properly. As a rough rule of thumb, around 70% of the total time is spent exploring, cleaning and preparing the data.

Univariate Analysis - Univariate analysis is the analysis of a single variable. It mainly describes the characteristics of that variable.
-- If the variable is numerical, patterns can be found by looking at the mean, mode, median, range, variance, maximum, minimum, quartiles and standard deviation. These can be displayed using histograms and frequency distribution tables; boxplots are the best choice for visualizing outliers.
-- If the variable is categorical, we can use either a bar chart or a pie chart to see the distribution of the classes in the variable.
Bi-variate Analysis - Bivariate analysis checks the relationship between two variables at the same time.
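The summaries described above can be computed with Python's standard library. The numbers below are hypothetical; the outlier rule (1.5 times the interquartile range, the same rule a boxplot's whiskers usually follow) and the hand-written `pearson` helper for the bivariate case are assumptions made for this sketch.

```python
import math
import statistics
from collections import Counter

# Univariate, numerical: hypothetical feature values.
monthly_calls = [2, 4, 4, 5, 7, 9, 40]
summary = {
    "mean": statistics.mean(monthly_calls),
    "median": statistics.median(monthly_calls),
    "mode": statistics.mode(monthly_calls),
    "range": max(monthly_calls) - min(monthly_calls),
    "std_dev": statistics.stdev(monthly_calls),
}
q1, q2, q3 = statistics.quantiles(monthly_calls, n=4)
iqr = q3 - q1
outliers = [v for v in monthly_calls if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]

# Univariate, categorical: class counts (what a bar chart would show).
plans = ["basic", "basic", "premium", "basic"]
class_counts = Counter(plans)

# Bivariate: Pearson correlation between two numerical variables.
def pearson(xs, ys):
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

r = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(summary, outliers, class_counts, r)
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two variables; a value near 0 suggests little linear relationship.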

Feature Engineering

Data Wrangling

Data wrangling, or data cleaning, is the process of identifying and correcting or removing inaccurate records from a dataset. Common steps include:

Handling missing values for continuous features
Handling missing values for categorical features
Handling Categorical Features
Handling outliers
Feature Scaling
Removing duplicates
Checking for class imbalance in categorical variables
Variable transformation
Variable creation
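Several of these steps can be sketched in plain Python on a handful of hypothetical rows. The column names, the choice of median and mode for imputation, min-max scaling and one-hot encoding are assumptions made for illustration, not a fixed recipe.

```python
from collections import Counter
from statistics import median

# Hypothetical rows; None marks a missing value.
rows = [
    {"age": 25, "plan": "basic"},
    {"age": None, "plan": "premium"},
    {"age": 40, "plan": None},
    {"age": 31, "plan": "basic"},
    {"age": 31, "plan": "basic"},  # exact duplicate of the previous row
]

def remove_duplicates(rows):
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute(rows, column, fill_value):
    """Replace missing values in one column with fill_value."""
    return [{**r, column: fill_value if r[column] is None else r[column]} for r in rows]

def min_max_scale(rows, column):
    """Rescale a numerical column to the range 0..1."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for constant columns
    return [{**r, column: (r[column] - low) / span} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(r[column] == category)
        encoded.append(new_row)
    return encoded

clean = remove_duplicates(rows)
age_median = median(r["age"] for r in clean if r["age"] is not None)
plan_mode = Counter(r["plan"] for r in clean if r["plan"] is not None).most_common(1)[0][0]
clean = impute(clean, "age", age_median)   # continuous feature: median
clean = impute(clean, "plan", plan_mode)   # categorical feature: mode

# Class balance check: share of each category after cleaning.
class_share = {k: n / len(clean) for k, n in Counter(r["plan"] for r in clean).items()}

encoded = one_hot(min_max_scale(clean, "age"), "plan")
print(class_share, encoded)
```

Duplicates are removed before imputation here so that repeated rows do not distort the median and mode; other orderings can be reasonable depending on the dataset.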


rashmiranu/Machine-Learning--In-brief
