jatin00000/Forecasting-the-various-Populations-of-Countries

Forecasting-the-various-Populations-of-Countries

Data Science Project
    Extracted population data from the United Nations website. The data set contains information about various types of population for countries of the world from 1950 to 2019, with many null values. We tried several algorithms (Decision Tree, Random Forest, SVM, etc.) to see which one fits the data most accurately, explored the data, and provided insights and forecasts about population for various countries.


My Team Members~
 - Abhay Panwar
 - Chetan Kumar
 - Gaikwad Chandan Mahadeo
 - Devansh Mishra
 - Mallempadi Yashaswini


It has 6 tasks~
  a. Data Collection
  b. Data Preprocessing and Cleaning
  c. Data Visualization
  d. Data Statistics (Summary of Statistics)
  e. Hypothesis Statement
  f. Prediction Task (Using Machine Learning Model)



Data Collection
The data is extracted from https://population.un.org/wpp/Download/Standard/CSV/
Information about the usage of the various classifiers was taken from http://www.Datacamp.org


Data Preprocessing and Cleaning

  • Kept only the medium variant of the population data.
  • Removed the columns varID, variant and MidPeriod.
  • Used scikit-learn's built-in train_test_split with a 70/30 train/test split, as it gave good accuracy.
  • Source_Code_link
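
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the tiny DataFrame stands in for the UN CSV, its values are made up, and the column names (`VarID`, `Variant`, `MidPeriod`, `PopTotal`, and so on) are assumptions about how the file is laid out. The `preprocess` helper is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Small stand-in for the UN CSV. Column names and values are assumptions
# made for illustration; they are not real population figures.
raw = pd.DataFrame({
    "LocID": [1] * 6 + [2] * 6,
    "Location": ["A"] * 6 + ["B"] * 6,
    "VarID": [2, 2, 2, 2, 2, 3] * 2,
    "Variant": ["Medium"] * 5 + ["High"] + ["Medium"] * 5 + ["High"],
    "Time": [1950, 1960, 1970, 1980, 1990, 1990] * 2,
    "MidPeriod": [1950.5, 1960.5, 1970.5, 1980.5, 1990.5, 1990.5] * 2,
    "PopTotal": [10.0, 12.0, None, 15.0, 17.0, 18.0,
                 50.0, 55.0, 61.0, 66.0, 70.0, 75.0],
})


def preprocess(df):
    """Keep the medium variant, drop unused columns, drop rows with nulls."""
    medium = df[df["Variant"] == "Medium"]
    trimmed = medium.drop(columns=["VarID", "Variant", "MidPeriod"])
    return trimmed.dropna()


clean = preprocess(raw)

# 70/30 split; random_state is fixed only to make the sketch repeatable.
train, test = train_test_split(clean, test_size=0.3, random_state=0)
print(len(clean), len(train), len(test))
```

Fixing `random_state` is a choice made here for repeatability; the source does not say whether the original split was seeded.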

Data Visualization

  • Plotted scatter plots of all 4 attributes for 10 different countries.
  • Also calculated R^2 for each case using Plotly.
  • The goal was to select one of the four attributes below for hypothesis testing:
    • Male Population
    • Female Population
    • Total Population
    • Population Density
  • Source_Code_link
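
To make the R^2 figure concrete, here is a small sketch of how it can be computed for a straight-line fit of one attribute against the year. The project obtained its values through Plotly; this NumPy version is a stand-in meant only to show what the number measures. The `r_squared` helper and the sample values are hypothetical.

```python
import numpy as np


def r_squared(x, y):
    """R^2 of a least-squares straight-line fit of y against x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 polynomial fit
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot


# Made-up, exactly linear values, so the fit should be essentially perfect.
years = [1950, 1960, 1970, 1980, 1990]
population = [100.0, 120.0, 140.0, 160.0, 180.0]
score = r_squared(years, population)
print(round(score, 6))
```

An attribute whose scatter plot gives an R^2 close to 1 follows a near-linear trend over time, which is one reasonable basis for comparing the four attributes.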

Data Statistics

  • Selected one of the four attributes to build a prediction model, based on observations from the graphs plotted during visualization.
  • Through statistics, found the names of the countries whose total population is likely to be in the range 5000 to 15000 in 2011.
  • Source_File
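
A range query like the one described above could look roughly like this in pandas. The DataFrame is a made-up stand-in, the column names (`Location`, `Time`, `PopTotal`) are assumptions, and `locations_in_range` is a hypothetical helper rather than the project's own function.

```python
import pandas as pd

# Made-up stand-in data; column names are assumptions.
data = pd.DataFrame({
    "Location": ["A", "B", "C", "D", "A", "B"],
    "Time": [2011, 2011, 2011, 2011, 2010, 2010],
    "PopTotal": [4000.0, 5000.0, 12000.0, 20000.0, 6000.0, 4900.0],
})


def locations_in_range(df, year, low, high):
    """Names of locations whose total population lies in [low, high] in a year."""
    in_year = df[df["Time"] == year]
    mask = in_year["PopTotal"].between(low, high)  # bounds are inclusive
    return sorted(in_year.loc[mask, "Location"].unique().tolist())


# Summary statistics (count, mean, std, min, quartiles, max) for one attribute.
summary = data["PopTotal"].describe()

matches = locations_in_range(data, 2011, 5000, 15000)
print(matches)
```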

Hypothesis Statement
From the statistics, we formulated a hypothesis and then tried to prove it by building a model.


Prediction Task(Using Machine Learning Model)

  • Compared the accuracy of various classifiers and selected one for building the model.
  • Experimentally, our model proved our hypothesis wrong. That hypothesis was based on using “Total Population” as the attribute for the model.
  • Created models using the following algorithms:
    • Decision Tree Classification
    • Random Forest Classification
    • Support Vector Machines(SVM)
    • Logistic Regression
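
The comparison of the four classifiers can be sketched as below. The source does not describe the exact features or target, so a synthetic dataset from `make_classification` stands in for the population data. The hyperparameters and `random_state` values are likewise assumptions chosen for a repeatable illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): 4 numeric features, binary target.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("Best on this split:", best)
```

Accuracy on a single 70/30 split can vary with the seed, so the ranking printed here says nothing about how the classifiers ranked on the real data.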

My Experience and Learnings
Coded in Python using Google Colab.
Learned to use various Python modules: NumPy, pandas, Plotly, csv and sklearn.


Problems faced

  • Many classifiers require numeric inputs and outputs, so using the “Location” attribute poses a problem.
    Solution:
    The “LocID” attribute is used instead, along with the other numeric attributes.
  • Void or NULL values caused problems during preprocessing.
    Solution:
    The dropna() method of pandas was used to drop the rows containing null values.
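
Both workarounds can be illustrated together in a few lines. The data below is made up and the column names are assumptions. Keeping a LocID-to-name lookup is an addition for readability of results and is not something the source describes.

```python
import pandas as pd

# Made-up stand-in data; column names are assumptions.
data = pd.DataFrame({
    "LocID": [1, 1, 2, 2],
    "Location": ["A", "A", "B", "B"],
    "Time": [1950, 1960, 1950, 1960],
    "PopTotal": [10.0, None, 50.0, 55.0],
})

# Keep a lookup so numeric IDs can be turned back into names when reporting.
id_to_name = (
    data.drop_duplicates("LocID").set_index("LocID")["Location"].to_dict()
)

# Drop the text column so only numeric attributes remain, then drop any
# row that still contains a null value.
features = data.drop(columns=["Location"]).dropna()
print(features)
print(id_to_name)
```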
