This repository contains a portfolio of data science projects, built for self-learning and as a hobby.
The projects are presented as IPython notebooks and R files. The notebooks were created in Google Colab.
For a better view of the portfolio, please visit the website: abshkpskr.github.io
Most of the visualizations are made with Plotly and do not render on GitHub, so please visit the website to see them.
-
-
COVID-19 Pandemic Global Analysis: This project is a detailed study of the spread of the coronavirus disease. The Python visualization libraries Plotly and matplotlib are used to show the spread of the virus in different countries. The spread from country to country is shown with a Plotly choropleth (world map) timeline animation. The analysis shows which countries were able to flatten the curve and successfully contain the virus. Metrics such as mortality rate and spread rate are calculated and plotted.
-
COVID-19 Pandemic India Analysis: A separate report on the ongoing spread of the virus in India. It covers the spread of the virus across the states of India and the factors that affect it. The analysis relies on visualizations, including datetime graphs and district-level map plots. It also gives an overview of the decisions taken by the government and the outcomes of those decisions.
Tools: Pandas, matplotlib, plotly, scipy
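
The exact definitions of mortality rate and spread rate are given in the notebooks, not in this README. As a minimal sketch, assuming mortality rate means deaths as a percentage of confirmed cases and spread rate means day-over-day percentage growth of cumulative cases, the metrics could be computed like this (the function names and the numbers are made up for illustration):

```python
from typing import List


def mortality_rate(deaths: int, confirmed: int) -> float:
    """Deaths as a percentage of confirmed cases (0.0 when there are no cases)."""
    if confirmed == 0:
        return 0.0
    return 100.0 * deaths / confirmed


def daily_growth(cumulative: List[int]) -> List[float]:
    """Day-over-day percentage growth of a cumulative case series."""
    rates = []
    for prev, curr in zip(cumulative, cumulative[1:]):
        # Guard against division by zero on days before the first case.
        rates.append(100.0 * (curr - prev) / prev if prev else 0.0)
    return rates


# Made-up numbers, for illustration only.
cases = [100, 150, 180, 198]
print(round(mortality_rate(deaths=6, confirmed=198), 2))  # 3.03
print(daily_growth(cases))  # [50.0, 20.0, 10.0]
```

A falling growth rate, as in the made-up series above, is what a flattening curve would look like in these terms.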
-
-
-
Python
-
Predicting Number of Rental Bikes: A model that predicts the number of bikes rented, using data from the previous two years. The data includes features such as season, weather condition, humidity and wind speed, and each is analysed for its relationship with the outcome. The predictions help the company maintain the inventory required in different situations and maximize profit. This is a supervised machine learning regression problem, because the target, the number of bikes, is a continuous variable.
-
Predicting Employee Absenteeism: This project studies the underlying factors that cause employee absenteeism. Factors such as reason for absence, chronic diseases, money spent on commuting, distance between office and home, and number of children are used to predict an employee's absenteeism behaviour. Although the target feature is a continuous variable, binning is used to convert the task into a supervised machine learning classification problem. The study helps the organization manage its manpower better and improve its hiring process.
Tools: Pandas, matplotlib, plotly, scipy
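
The binning step in the absenteeism project can be sketched as follows. This is only an illustration: the bin edges, the class labels, and the assumption that the target is measured in hours of absence are hypothetical and not taken from the notebook.

```python
from bisect import bisect_right

# Hypothetical bin edges (assumed unit: hours of absence) and class labels.
EDGES = [4, 8]
LABELS = ["low", "medium", "high"]


def to_class(hours: float) -> str:
    """Map a continuous target value to a discrete class label.

    Values below 4 fall in "low", values from 4 up to (but not including) 8
    in "medium", and values of 8 or more in "high".
    """
    return LABELS[bisect_right(EDGES, hours)]


# Made-up target values, for illustration only.
targets = [0, 3, 4, 7.5, 8, 120]
print([to_class(h) for h in targets])
# ['low', 'low', 'medium', 'medium', 'high', 'high']
```

Once the target is a label rather than a number, any classification algorithm can be trained on it in place of a regressor.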
-
-
R language
-
Predicting Number of Rental Bikes: This is the same project as above, implemented in R. Both versions follow a step-wise analysis that includes data preparation, exploratory data analysis and outlier analysis. For prediction, three algorithms are compared: Linear Regression, Decision Tree and Random Forest.
-
Predicting Employee Absenteeism: The employee absenteeism project implemented in R. The data analysis steps include data preparation, visualization using ggplot, missing value analysis, outlier analysis, feature selection and scaling. The monthly loss for the company is also calculated.
Tools: caret, ggplot2, Hmisc, PerformanceAnalytics, caTools, randomForest, e1071
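
The README does not say which outlier rule the projects use. One common choice is the interquartile range (IQR) rule, sketched here in Python rather than R purely as an illustration; the function name, the multiplier `k` and the sample values are assumptions.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up values, for illustration only.
sample = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(sample))  # [95]
```

Flagged values can then be dropped, capped, or treated as missing, depending on what suits the model.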
-
-
-
- Coronavirus Timeline: A timeline report of the worldwide spread of the coronavirus. It covers the series of events that occurred, such as the official news from China, the reactions of different countries, decisions taken by their authorities, public reaction and lockdown dates. Links, press releases, official documents and news videos are embedded in the report to tell the story of how the world tackled the coronavirus pandemic.
If you liked what you saw and want to chat about the portfolio, work opportunities, or collaboration, send an email to 282abhishek@gmail.com