
How to Use Python to Analyze the Development Trend of the Novel Coronavirus

2025-02-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Many people without much experience may not know how to use Python to analyze the development trend of the novel coronavirus. This article walks through the problem and its solution step by step, and I hope it helps you solve it.

Everyone knows about this epidemic, and schools and workplaces everywhere have delayed reopening, which gives us a good opportunity to study in depth. Today I will walk you through an analysis of the novel coronavirus outbreak trend as a practical case for a data analysis course. It covers the complete data analysis process: data acquisition, data cleaning, data visualization, and drawing conclusions from the data.

The data used here were collected by Johns Hopkins University on virus outbreaks worldwide.

Import the required packages and clean the data
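A minimal sketch of the import and loading step. It assumes pandas is the analysis library and that the file uses a Kaggle-style column layout (Sno, Date, Province/State, Country, Last Update, Confirmed, Deaths, Recovered), which fits the description below of a serial-number first column and a last-update fifth column. The column names, the placeholder file name, and the toy numbers are all assumptions for illustration, not the article's actual file; an in-memory CSV keeps the snippet runnable without a download.

```python
import io

import pandas as pd

# Hypothetical stand-in for the downloaded CSV. Column names are assumed;
# the numbers are made-up toy values, not real case counts.
CSV_TEXT = """Sno,Date,Province/State,Country,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020 12:00:00,Hubei,China,01/22/2020 12:00:00,400,10,20
2,01/22/2020 12:00:00,,Japan,01/22/2020 12:00:00,2,0,0
3,01/23/2020 12:00:00,Hubei,China,01/23/2020 12:00:00,500,15,25
4,01/23/2020 12:00:00,,Japan,01/23/2020 12:00:00,3,0,0
"""

# With the real data you would pass a file path instead, e.g.
# pd.read_csv("ncov_data.csv"), where "ncov_data.csv" is a placeholder name.
df = pd.read_csv(io.StringIO(CSV_TEXT))

print(df.shape)  # (number of rows, number of columns)
print(df.head())
```

Empty fields in the CSV (such as the missing Japan province) are read as NaN, which matters for the null-value check later on.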

First: delete unwanted data columns

We can see from the data that the first column is just a serial number and the fifth column is the time the record was last updated. Neither column has any practical significance for our analysis, so we delete both first:
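One way to drop the two columns, sketched with pandas on a few toy rows whose column names are assumed (the values are illustrative only). Selecting the columns by position mirrors the "first" and "fifth" column description, so the code does not depend on the exact header text.

```python
import pandas as pd

# Toy rows with an assumed column layout; values are illustrative only.
df = pd.DataFrame({
    "Sno": [1, 2],
    "Date": ["01/22/2020 12:00:00", "01/22/2020 12:00:00"],
    "Province/State": ["Hubei", None],
    "Country": ["China", "Japan"],
    "Last Update": ["01/22/2020 12:00:00", "01/22/2020 12:00:00"],
    "Confirmed": [400, 2],
    "Deaths": [10, 0],
    "Recovered": [20, 0],
})

# Drop the 1st (serial number) and 5th (last update) columns by position.
df = df.drop(columns=df.columns[[0, 4]])
print(df.columns.tolist())
```

If you know the header names, `df.drop(columns=["Sno", "Last Update"])` is the more readable equivalent.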

Second: deal with the null values in the dataset

First, let's get an overview of the data:
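A quick overview in pandas usually combines `info()` (column types and non-null counts) with `isnull().sum()` (missing values per column). The toy frame below uses assumed column names and made-up values purely to show the calls.

```python
import pandas as pd

# Toy rows (illustrative values) after the two unused columns were dropped.
df = pd.DataFrame({
    "Date": ["01/22/2020 12:00:00"] * 4,
    "Province/State": ["Hubei", None, "Beijing", None],
    "Country": ["China", "Japan", "China", "Thailand"],
    "Confirmed": [400, 2, 10, 4],
    "Deaths": [10, 0, 0, 0],
    "Recovered": [20, 0, 1, 0],
})

# Column dtypes and non-null counts in one summary.
df.info()

# Number of missing values per column.
null_counts = df.isnull().sum()
print(null_counts)
```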

We find that only the province field has null values, so let's look at those rows specifically:
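To inspect the rows behind those null values, a boolean mask on the province column is enough. This is a sketch on toy data with an assumed `Province/State` column name; the countries shown are illustrative, not the real list.

```python
import pandas as pd

# Toy rows (illustrative values); column names are assumed.
df = pd.DataFrame({
    "Province/State": ["Hubei", None, "Beijing", None],
    "Country": ["China", "Japan", "China", "Thailand"],
    "Confirmed": [400, 2, 10, 4],
})

# Keep only the rows whose province is missing.
missing = df[df["Province/State"].isnull()]
print(missing)
```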

After filtering, we find that the missing values belong to provinces in some countries outside China. This comes from the data collection process, and we have no way to infer what the values should be, so we choose to leave these null values as they are.

Third: delete duplicate data

Using the duplicated method, we find that this manually collated dataset contains no duplicate rows, so no deduplication is needed.
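A sketch of the duplicate check with pandas `duplicated()`, on toy rows with assumed column names. Because the sample has no duplicates, it also appends a copy of one row to show what detection and `drop_duplicates()` look like when duplicates do exist.

```python
import pandas as pd

# Toy rows (illustrative values); no two rows are identical.
df = pd.DataFrame({
    "Date": ["01/22/2020", "01/22/2020", "01/23/2020"],
    "Country": ["China", "Japan", "China"],
    "Confirmed": [400, 2, 500],
})

# duplicated() flags rows identical to an earlier row; sum() counts them.
dup_count = int(df.duplicated().sum())
print(dup_count)

# Demonstration: appending a copy of the first row creates one duplicate,
# which drop_duplicates() then removes.
with_dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
print(int(with_dup.duplicated().sum()))
deduped = with_dup.drop_duplicates()
```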

Data insight

First, let's see how many countries in the world had been "hit" by the time the data were compiled:
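Counting affected countries comes down to `unique()` or `nunique()` on the country column. The toy values below (an assumed `Country` column) deliberately include both "China" and "Mainland China" to reproduce the noise discussed next; they are not the real country list.

```python
import pandas as pd

# Toy country column (illustrative values only).
df = pd.DataFrame({
    "Country": ["China", "Mainland China", "Japan", "Japan", "Germany"],
})

# Distinct country names, in order of first appearance.
countries = df["Country"].unique()
print(len(countries), countries)

# nunique() gives the same count directly.
print(df["Country"].nunique())
```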

According to the statistics, only 32 countries have confirmed patients. However, careful students may notice that the country list contains both "China" and "Mainland China". "Mainland China" refers to the Chinese mainland, which is also China, so we should change "Mainland China" to "China". In real work this often happens with data from different departments, so dealing with this kind of data noise is one of a data analyst's daily tasks.
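Merging the two labels can be done with `Series.replace`. This is a sketch on toy data with an assumed `Country` column; the values are illustrative.

```python
import pandas as pd

# Toy country column containing both spellings (illustrative values).
df = pd.DataFrame({
    "Country": ["China", "Mainland China", "Japan", "Germany"],
    "Confirmed": [100, 400, 2, 1],
})

# Map the inconsistent label onto the canonical one.
df["Country"] = df["Country"].replace("Mainland China", "China")
print(df["Country"].unique())
```

Re-running the country count after this step removes the double counting of China.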

Next, let's examine the time field; checking it is an indispensable step in any data analysis:

The time here is recorded to the hour. To make counting easier, we convert it to days:
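A sketch of the hour-to-day conversion in pandas. It assumes the timestamps are strings in month/day/year hour:minute:second form; that format and the toy values are assumptions, so adjust the `format` string to the real file.

```python
import pandas as pd

# Toy timestamps recorded at different hours (values are illustrative).
df = pd.DataFrame({
    "Date": ["01/22/2020 12:00:00", "01/22/2020 23:00:00", "01/23/2020 10:00:00"],
    "Confirmed": [400, 2, 500],
})

# Parse the strings; the month/day/year format is an assumption about the file.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %H:%M:%S")
print(df["Date"].dtype)

# normalize() sets every timestamp to midnight, i.e. keeps only the day,
# while leaving the column as a datetime type for later grouping/plotting.
df["Date"] = df["Date"].dt.normalize()
print(df)
```

Keeping a datetime type (rather than converting to plain strings) means later `groupby` and plotting calls sort the days chronologically.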

Next, we use country as the dimension and count the number of confirmed cases in each country:
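One way to rank countries by confirmed cases, sketched in pandas. It assumes each row holds cumulative counts as of its date, so it aggregates only the latest day; summing over every date would count the same patients repeatedly. That assumption, the column names, and the toy values are illustrative, not taken from the article.

```python
import pandas as pd

# Toy cumulative counts for two days (illustrative values only).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22"] * 3 + ["2020-01-23"] * 4),
    "Province/State": ["Hubei", None, None, "Hubei", "Beijing", None, None],
    "Country": ["China", "Japan", "Germany", "China", "China", "Japan", "Germany"],
    "Confirmed": [400, 2, 1, 500, 30, 3, 4],
})

# Assumption: counts are cumulative, so keep only the most recent day.
latest = df[df["Date"] == df["Date"].max()]

# Sum provinces within each country and sort from most to fewest cases.
by_country = (
    latest.groupby("Country")["Confirmed"]
    .sum()
    .sort_values(ascending=False)
)
print(by_country)
```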

China ranks first, as expected, and the countries near the top are mostly Asian countries neighboring China. Among European and American countries, Germany ranks highest. In real work, Germany would be an "outlier" that must be investigated in depth; here we are only working through an example.

Then we take time as the dimension and analyze how the number of infected people changes each day:
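Grouping by date and summing gives one worldwide total per day. This sketch uses toy rows with assumed column names; the numbers are made up.

```python
import pandas as pd

# Toy rows for three days and two countries (illustrative values only).
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23",
                            "2020-01-23", "2020-01-24", "2020-01-24"]),
    "Country": ["China", "Japan"] * 3,
    "Confirmed": [400, 2, 500, 3, 600, 5],
    "Deaths": [10, 0, 15, 0, 20, 0],
    "Recovered": [20, 0, 25, 0, 40, 0],
})

# One row per day: the worldwide total for each measure, sorted by date.
daily = df.groupby("Date")[["Confirmed", "Deaths", "Recovered"]].sum()
print(daily)
```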

From this we can see that the number of infections rose from 555 to 24,503 within 14 days, and it is still growing quickly. Next we need to analyze the number of new confirmed cases each day in detail, which requires the diff() method:
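A sketch of how `diff()` turns cumulative daily totals into new cases per day. The totals below are toy values, not the real 555 to 24,503 series.

```python
import pandas as pd

# Toy cumulative worldwide totals, one row per day (not real figures).
daily = pd.DataFrame(
    {"Confirmed": [402, 503, 605, 820]},
    index=pd.date_range("2020-01-22", periods=4, freq="D"),
)

# diff() subtracts the previous row from each row, turning cumulative totals
# into new cases per day. The first day has no previous value, so it is NaN.
daily["New"] = daily["Confirmed"].diff()
print(daily)
```

Whether to leave that first NaN, drop it, or fill it with the first day's total depends on whether the dataset's first day really starts from zero.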

Data visualization

First, let's look at the number of confirmed cases per day. It basically follows an exponential growth trend, which is consistent with how infectious disease outbreaks typically unfold. What we need to do is use subsequent data to detect when the inflection point arrives.
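A plotting sketch with matplotlib on toy totals (not real data). Besides the ordinary chart, it adds a log-scale panel; that panel is our own addition rather than the article's chart, included because exponential growth appears as a roughly straight line on a log axis, which makes a later flattening (a possible inflection point) easier to spot.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy cumulative totals (illustrative, not real data).
daily = pd.DataFrame(
    {"Confirmed": [402, 503, 605, 820, 1100]},
    index=pd.date_range("2020-01-22", periods=5, freq="D"),
)

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(10, 4))

# Left: ordinary linear axis.
ax_lin.plot(daily.index, daily["Confirmed"], marker="o")
ax_lin.set_title("Confirmed (linear scale)")

# Right: log-scale y axis. Exponential growth shows up as a roughly straight
# line here; the line flattening out suggests growth is slowing.
ax_log.plot(daily.index, daily["Confirmed"], marker="o")
ax_log.set_yscale("log")
ax_log.set_title("Confirmed (log scale)")

for ax in (ax_lin, ax_log):
    ax.set_xlabel("Date")
    ax.set_ylabel("Confirmed cases")
fig.autofmt_xdate()

out = Path("confirmed_trend.png")
fig.savefig(out)
plt.close(fig)
```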

Next, let's look at the daily trends in deaths and recoveries. The number of recovered patients is growing faster than the number of deaths, so, weighing the "best" outcome against the "worst", the overall trend is still positive and there is no need to worry too much.
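A sketch that plots deaths and recoveries together and then compares their daily increments, which is one way to check the claim that recoveries are growing faster. Column names and values are toy assumptions, not the real figures.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # draw to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd

# Toy cumulative totals (illustrative, not real data).
daily = pd.DataFrame(
    {"Deaths": [10, 15, 20, 26], "Recovered": [20, 28, 40, 60]},
    index=pd.date_range("2020-01-22", periods=4, freq="D"),
)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(daily.index, daily["Deaths"], marker="o", label="Deaths")
ax.plot(daily.index, daily["Recovered"], marker="o", label="Recovered")
ax.set_xlabel("Date")
ax.set_ylabel("Cumulative count")
ax.legend()
fig.autofmt_xdate()

out = Path("deaths_vs_recovered.png")
fig.savefig(out)
plt.close(fig)

# Compare daily increments rather than totals to see which grows faster.
new_deaths = daily["Deaths"].diff()
new_recovered = daily["Recovered"].diff()
recovered_outpaces = bool((new_recovered.iloc[1:] > new_deaths.iloc[1:]).all())
print(recovered_outpaces)
```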

After reading the above, have you mastered how to use Python to analyze the development trend of the novel coronavirus? If you want to learn more skills or know more about the topic, you are welcome to follow the industry information channel. Thank you for reading!


© 2024 shulou.com SLNews company. All rights reserved.
