The first outbreak of the novel coronavirus disease (COVID-19) was reported in Wuhan, China, in mid-December 2019. On 30 January 2020, the World Health Organization declared the outbreak a global health emergency, and on 11 March 2020 it declared COVID-19 a pandemic.
In this project, I will analyze the spread of the new coronavirus (nCoV). I will use two datasets:
- The Johns Hopkins University dataset, which contains aggregated daily data for confirmed cases, deaths and recovered patients. https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
- The DXY.cn Google Sheet, which contains information on about 1,000 patients. https://docs.google.com/spreadsheets/d/1jS24DjSPVWa4iuxuD4OAXrE3QeI8c9BC1hSlqr-NMiU/edit#gid=1187587451
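The Johns Hopkins time-series files use a wide layout: one row per region and one column per date. Below is a minimal sketch of how such a file could be reshaped into one record per region and day before any analysis. The helper names `wide_to_long` and `daily_totals` are hypothetical, the sample rows contain made-up numbers rather than real counts, and only the standard library is used so the sketch stays self-contained.

```python
import csv
import io
from datetime import datetime

# Made-up sample in the wide layout of the JHU time-series files:
# four identifying columns followed by one column per date (M/D/YY).
SAMPLE = """Province/State,Country/Region,Lat,Long,1/22/20,1/23/20
,Italy,41.9,12.6,0,2
Hubei,China,30.9,112.3,10,15
"""

ID_COLUMNS = ("Province/State", "Country/Region", "Lat", "Long")


def wide_to_long(csv_text):
    """Turn the wide table into one dict per (region, date) pair."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column, value in record.items():
            if column in ID_COLUMNS:
                continue  # not a date column
            rows.append({
                "country": record["Country/Region"],
                "province": record["Province/State"],
                "date": datetime.strptime(column, "%m/%d/%y").date(),
                "count": int(value),
            })
    return rows


def daily_totals(rows):
    """Sum the cumulative counts of all regions for each date."""
    totals = {}
    for row in rows:
        totals[row["date"]] = totals.get(row["date"], 0) + row["count"]
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    for day, total in daily_totals(wide_to_long(SAMPLE)).items():
        print(day, total)
```

To work with the real data, the text of one of the CSV files from the repository above would be passed in place of `SAMPLE`.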
Project goal: to analyze open data on the spread of COVID-19 around the world, to collect information about patients and risk groups, to estimate how fast the coronavirus spreads, and to determine the death rate by patient age.
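As a rough illustration of the last goal, the sketch below groups patient records into ten-year age brackets and computes the share of recorded patients in each bracket who died. The field names `age` and `died` and the function names are assumptions made for this example, not the actual column names of the DXY.cn sheet, and the sample records are invented.

```python
from collections import defaultdict


def age_bracket(age, width=10):
    """Return a label such as '30-39' for the bracket containing `age`."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def death_rate_by_age(patients, width=10):
    """Share of deaths among recorded patients, per age bracket.

    Records without a known age are skipped.
    """
    counts = defaultdict(lambda: [0, 0])  # bracket -> [deaths, cases]
    for patient in patients:
        age = patient.get("age")
        if age is None:
            continue
        bracket = age_bracket(int(age), width)
        counts[bracket][1] += 1
        if patient.get("died"):
            counts[bracket][0] += 1
    return {b: deaths / cases for b, (deaths, cases) in counts.items()}


if __name__ == "__main__":
    # Invented records, only to show the shape of the input.
    sample = [
        {"age": 34, "died": False},
        {"age": 38, "died": False},
        {"age": 71, "died": True},
        {"age": 75, "died": False},
        {"age": None, "died": True},  # unknown age, ignored
    ]
    print(death_rate_by_age(sample))
```

The result is a rate among the patients present in the list, so it depends on who was tested and recorded, and it should not be read as a population-level mortality figure.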
UPDATE:
I decided not to make predictions about future mortality or the spread of COVID-19. To analyze how a process develops from its statistical characteristics, the data must come with a clear collection methodology and a consistent method for registering primary data. What do we have in the case of the COVID-19 epidemic?
The most unreliable characteristic, as I see it, is the number of infected people, because counting systems differ everywhere. Some places deliberately test all patients with signs of acute respiratory viral infection, some test only the most severe cases, some test the deceased, some test risk groups, and some test small groups of random people. Nowhere is the entire population tested. In addition, many countries or regions simply did not test for COVID-19 because of a lack of tests.
Death counts can also be distorted enough to change the picture in local clusters, because in some places people who died with the coronavirus are counted together with those who died from it.
In some countries, anyone who dies while carrying the coronavirus is added to the statistics of epidemic victims, regardless of whether the virus affected their health.
The data are of poor quality, so at the moment they cannot serve as the basis for any reliable modeling, and it is impossible to trace how events will develop further.