Forecasting-the-various-Populations-of-Countries

Data Science Project
Extracted population data from the United Nations website. The data set contains information about various types of population (male, female, total and density) for the countries of the world from 1950 to 2019, with many null values. Applied various algorithms (Decision Tree, Random Forest, SVM, etc.) to see which one fits the data most accurately. Explored the data and provided insights and forecasts about the population of various countries.

My Team Members:
- Abhay Panwar
- Chetan Kumar
- Gaikwad Chandan Mahadeo
- Devansh Mishra
- Mallempadi Yashaswini

The project has six tasks:
a. Data collection
b. Data Preprocessing and Cleaning
c. Data Visualization
d. Data Statistics (Summary of Statistics)
e. Hypothesis Statement
f. Prediction Task (Using Machine Learning Model)

Data Collection
The data is extracted from https://population.un.org/wpp/Download/Standard/CSV/
Information about the usage of various classifiers is taken from http://www.Datacamp.org

Data Preprocessing and Cleaning

Took only the medium variant of the population data into consideration.
Removed the columns varID, variant and MidPeriod.
Used scikit-learn's built-in train_test_split with a 70/30 train/test split, as it provided good accuracy.
Source_Code_link
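The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal illustration, not the project's code: it assumes the UN CSV column names VarID, Variant, MidPeriod, LocID, Location, Time and PopTotal, replaces the real file with a small made-up DataFrame, and the helper `preprocess` as well as the choice of features and target are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only Medium-variant rows and drop the variant bookkeeping columns."""
    medium = df[df["Variant"] == "Medium"]
    return medium.drop(columns=["VarID", "Variant", "MidPeriod"])


# Made-up stand-in for the UN CSV (assumed column names); real data would come from pd.read_csv.
rows = []
for loc_id, name in [(1, "Country A"), (2, "Country B")]:
    for year in range(1950, 1955):
        for var_id, variant in [(2, "Medium"), (3, "High")]:
            rows.append({
                "LocID": loc_id, "Location": name,
                "VarID": var_id, "Variant": variant,
                "Time": year, "MidPeriod": year + 0.5,
                "PopTotal": 1000.0 + 10.0 * (year - 1950) * loc_id,
            })
raw = pd.DataFrame(rows)

clean = preprocess(raw)

# 70/30 train/test split; these feature and target columns are illustrative only.
X = clean[["LocID", "Time"]]
y = clean["PopTotal"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
print(len(X_train), len(X_test))  # 7 3
```

Fixing `random_state` makes the split reproducible between runs, which matters when comparing models on the same test set.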

Data Visualization

Plotted scatter plots of all 4 attributes for 10 different countries.
Also calculated R^2 for each case using Plotly,
in order to select one of the following four attributes for hypothesis testing:
- Male Population
- Female Population
- Total Population
- Population Density
Source_Code_link
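The R^2 comparison can be reproduced outside Plotly. The sketch below is a hedged alternative, not the project's method: it assumes a straight-line (ordinary least squares) trend of each attribute against year and computes R^2 directly with NumPy. The series are made up for a single hypothetical country, and the names PopMale, PopFemale, PopTotal and PopDensity are assumed UN column names.

```python
import numpy as np


def r_squared(x, y) -> float:
    """Coefficient of determination of a least-squares straight-line fit of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = np.sum(residuals ** 2)          # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return float(1.0 - ss_res / ss_tot)


years = np.arange(1950, 1960)
t = years - 1950
# Made-up series for one hypothetical country.
attributes = {
    "PopMale": 500 + 5.0 * t,
    "PopFemale": 510 + 5.5 * t,
    "PopTotal": 1010 + 10.5 * t,
    "PopDensity": 20 + 0.1 * t ** 2,   # curved on purpose, so a line fits it less well
}
scores = {name: r_squared(years, values) for name, values in attributes.items()}
for name, score in scores.items():
    print(f"{name}: R^2 = {score:.4f}")
```

An attribute whose R^2 stays close to 1 across countries follows a near-linear trend, which is one reasonable criterion for picking the attribute to model.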

Data Statistics

Selected one of the four attributes for building the prediction model, based on observations from the graphs plotted during visualization.
Using statistics, found the names of the countries whose total population was in the range 5000 to 15000 in 2011.
Source_File
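The range query above amounts to a filter on year and total population. A minimal pandas sketch, assuming the UN column names Time, PopTotal and Location, with made-up rows for hypothetical countries; `countries_in_range` is a hypothetical helper. Note that the UN WPP files report population figures in thousands, so if PopTotal was the column used, the range would mean roughly 5 to 15 million people.

```python
import pandas as pd


def countries_in_range(df: pd.DataFrame, year: int, low: float, high: float) -> list:
    """Return sorted country names whose PopTotal in `year` lies in [low, high]."""
    in_year = df[df["Time"] == year]
    mask = in_year["PopTotal"].between(low, high)  # inclusive on both ends by default
    return sorted(set(in_year.loc[mask, "Location"].tolist()))


# Made-up rows; the 2010 row for Country A must not affect the 2011 query.
df = pd.DataFrame({
    "Location": ["Country A", "Country B", "Country C", "Country D", "Country A"],
    "Time": [2011, 2011, 2011, 2011, 2010],
    "PopTotal": [4999.0, 5000.0, 12000.5, 15001.0, 6000.0],
})
print(countries_in_range(df, 2011, 5000, 15000))  # ['Country B', 'Country C']
```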

Hypothesis Statement
Based on the statistics, we formulated a hypothesis and tried to prove it by building a model.

Prediction Task (Using Machine Learning Model)

Compared the accuracy of various classifiers and selected one to build the model.
Experimentally, our model proved our hypothesis, which was based on using “Total Population” as the attribute for the model, to be wrong.
Created models using the following algorithms:
- Decision Tree Classification
- Random Forest Classification
- Support Vector Machines (SVM)
- Logistic Regression
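The accuracy comparison across these four classifiers could be organised as a simple loop. This scikit-learn sketch does not reproduce the project's features or target, which are not stated here; it uses synthetic data from `make_classification`, a 70/30 split, default hyperparameters (plus `max_iter` for logistic regression), and a hypothetical helper `compare_classifiers`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, test_size=0.3, seed=42):
    """Fit each classifier on the same split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    models = {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(random_state=seed),
        "SVM": SVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in data: 300 samples, 4 numeric features, binary labels.
X, y = make_classification(n_samples=300, n_features=4, random_state=0)
scores = compare_classifiers(X, y)
best = max(scores, key=scores.get)
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
print("Selected:", best)
```

Using one shared split for every model keeps the comparison fair; cross-validation would give a less noisy estimate than a single split.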

My Experience and Learnings
Coded in Python using Google Colab.
Learned the usage of various Python modules: NumPy, Pandas, Plotly, csv and sklearn.

Problems faced

Many classifiers require numeric input and output, so using the text attribute “Location” posed a problem.
Solution:
The numeric attribute “LocID” was used instead, along with the other numeric attributes.
Void or null values created problems during preprocessing.
Solution:
Used the pandas dropna() method to remove rows containing null values.
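Both fixes can be sketched in a few lines of pandas. This is an illustration under assumptions: the column names LocID, Location, Time and PopTotal follow the UN CSV layout, and the rows (including the LocID values) are made up. `select_dtypes` is used here as one way to keep only numeric columns, so the text column Location drops out while LocID still identifies each country.

```python
import numpy as np
import pandas as pd

# Made-up rows; one PopTotal value is missing.
df = pd.DataFrame({
    "LocID": [1, 2, 3],
    "Location": ["Country A", "Country B", "Country C"],
    "Time": [2011, 2011, 2011],
    "PopTotal": [6000.0, np.nan, 9000.0],
})

numeric = df.select_dtypes(include="number")  # keeps LocID, Time, PopTotal
clean = numeric.dropna()                      # drops every row that has a NaN
print(clean)
```

By default `dropna()` removes a row if any of its values is missing; with many null values in the data, checking how many rows are lost before and after is worthwhile.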

jatin00000/Forecasting-the-various-Populations-of-Countries

Folders and files

Latest commit

History

Repository files navigation

Forecasting-the-various-Populations-of-Countries

About

Topics

Resources

Stars

Watchers

Forks

Releases

Languages