The objective of this project was to reproduce the findings of the paper "COVID-19 Cases and Deaths in Southeast Asia Clustering using K-Means Algorithm" by Juniar Hutagalung (2021), and then to expand its scope to more regions and a longer time frame. We took the WHO database covering all countries and their respective COVID-19 health data. We implemented the K-means clustering data mining method and grouped the data into clusters. The process was implemented in Python; the data used were country- and area-level statistics on laboratory-confirmed COVID-19 cases and deaths for April 2020 from the World Health Organization (WHO). The resulting clusters were classified as high (1), medium (2), and low (0).
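To make the clustering step concrete, the following is a minimal sketch of K-means (Lloyd's algorithm) in plain Python. It is not the project's actual implementation: the country names and the case and death counts are made-up placeholders rather than WHO figures, and the min-max scaling, the deterministic initialisation, and the helper names `min_max_scale` and `kmeans` are assumptions chosen for illustration. The sketch names the clusters low, medium, and high by ranking their centroids, since the numeric ids K-means assigns carry no order of their own.

```python
import math

# Hypothetical (cases, deaths) per country; placeholder values, NOT WHO data.
records = {
    "A": (120, 2), "B": (150, 4), "C": (90, 1),
    "D": (2500, 60), "E": (2800, 75), "F": (2300, 55),
    "G": (9000, 400), "H": (9500, 450), "I": (8800, 380),
}

def min_max_scale(rows):
    """Rescale every column to [0, 1] so cases do not dominate deaths."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi))
        for row in rows
    ]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm for k >= 2; returns (labels, centroids)."""
    # Assumed deterministic initialisation: k points spread evenly
    # through the data after sorting, so the result is reproducible.
    ordered = sorted(points)
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

points = min_max_scale(list(records.values()))
labels, centroids = kmeans(points, k=3)

# Name the clusters by ranking centroids from smallest to largest.
order = sorted(range(3), key=lambda c: sum(centroids[c]))
names = dict(zip(order, ["low", "medium", "high"]))
levels = {country: names[lab] for country, lab in zip(records, labels)}
print(levels)
```

In practice a library implementation such as scikit-learn's `KMeans` would usually replace the hand-written loop; the plain version is shown only to expose the assignment and update steps.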
We further extended the paper by posing the following questions:
Can the SEARO region be divided into clusters using the K-means algorithm? Is K-means an appropriate algorithm for this task? Are the chosen variables adequate for obtaining meaningful results? Which of the two clustering methods is more efficient? How would our results be affected if we considered the whole world for the years 2020, 2021, and 2022?