
How to Process Chinese Regional Information with Python

2025-02-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

This article shows how to use Python (pandas) to clean Chinese regional information: reading the CSV files, then checking them for duplicate rows, missing values, and outliers. The steps are short and easy to follow.

1.1 data loading

Code:

import pandas as pd

data = pd.read_csv("example_data.csv", header=1)
print(data)
# The regional files are GBK-encoded, so the encoding must be given explicitly
data1 = pd.read_csv("Beijing area Information.csv", header=1, encoding='gbk')
data2 = pd.read_csv("Tianjin area Information.csv", encoding='gbk')
print(data1)
print(data2)


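First, pandas' read_csv() reads each file into a DataFrame, and printing the DataFrame shows the table's contents. Two parameters matter here: header=1 takes the column names from the second line of the file (line index 1) and discards the line above it, and encoding='gbk' decodes files saved in the GBK Chinese encoding instead of the default UTF-8.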
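A minimal, self-contained sketch of what header=1 and encoding='gbk' do. It assumes a hypothetical file layout (a title line, then the real column names, then data); the CSV content below is made up for illustration and kept in memory with io.BytesIO, so it does not stand in for the actual regional files.

```python
import io

import pandas as pd

# Hypothetical file content: a title line, the real header line, then two data rows.
# Encoded as GBK bytes to imitate a CSV saved on a Chinese-locale system.
raw = "北京地区信息\n地区,编号\n东城区,1\n西城区,2\n".encode("gbk")

# header=1       -> use line index 1 as column names; the title line above it is discarded
# encoding="gbk" -> decode the bytes as GBK rather than the default UTF-8
df = pd.read_csv(io.BytesIO(raw), header=1, encoding="gbk")
print(df)
```

Without encoding="gbk", pandas would try UTF-8 and typically stop with a UnicodeDecodeError on GBK bytes like these.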

1.2 check for duplicate data

# Check for duplicate rows: True marks a row that repeats an earlier one
dupnum = data.duplicated()
print(dupnum)
# Handle duplicate values: drop the repeated rows
caldup = data.drop_duplicates()
print(caldup)


The duplicated() method checks the data for duplicate rows and returns a Boolean Series. By default (keep='first'), a row is False the first time it appears and True for every later repetition, so the True entries mark the duplicates.

Then drop_duplicates() removes the duplicate rows, keeping the first occurrence of each by default.
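A small sketch of the two methods on a made-up table (the column names region and code are hypothetical), showing which rows duplicated() flags and what drop_duplicates() keeps. The last line shows keep=False, which flags every copy of a repeated row rather than only the later ones.

```python
import pandas as pd

# Hypothetical table in which row 2 is an exact copy of row 0
df = pd.DataFrame({
    "region": ["Dongcheng", "Xicheng", "Dongcheng", "Haidian"],
    "code": [1, 2, 1, 3],
})

# keep='first' (default): first occurrence is False, later repeats are True
dup_mask = df.duplicated()
print(dup_mask.tolist())  # [False, False, True, False]

# drop_duplicates() keeps exactly the rows where dup_mask is False
deduped = df.drop_duplicates()
print(deduped)

# keep=False flags every copy of a duplicated row
all_copies = df.duplicated(keep=False)
print(all_copies.tolist())  # [True, False, True, False]
```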

1.3 check for missing values

Code:

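from pandas import Series
from numpy import nan as NAN  # NumPy 2.x removed the NAN alias; nan is the canonical name
import pandas as pd

# Quick test: a Series with one real value and two missing ones
series_obj = Series([1, None, NAN])
print(pd.notnull(series_obj))

# Apply the same check to each table read in step 1.1
print(pd.notnull(data))
print(pd.notnull(data1))
print(pd.notnull(data2))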


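pd.notnull(data1) returns a Boolean DataFrame with the same shape as data1: True wherever a value is present and False where it is missing (NaN or None). To keep only the rows where the id column is non-empty, index the DataFrame with that column's Boolean Series, for example df[pd.notnull(df['id'])]. Indexing with the whole Boolean matrix instead would mask individual cells rather than select rows.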
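A sketch of the difference between the Boolean matrix and a row filter, on a made-up table with hypothetical columns id and name. It also shows how to keep only fully populated rows with .all(axis=1); dropna() gives the same rows in a single call.

```python
import numpy as np
import pandas as pd

# Hypothetical table with one missing id and one missing name
df = pd.DataFrame({
    "id": [1, np.nan, 3, 4],
    "name": ["a", "b", None, "d"],
})

# Boolean DataFrame, same shape as df: True where a value is present
mask = pd.notnull(df)
print(mask)

# Rows where 'id' is present: index with a Boolean Series (one column of the mask)
rows_with_id = df[pd.notnull(df["id"])]
print(rows_with_id)

# Rows where every column is present: collapse the mask row-wise with all(axis=1)
complete_rows = df[mask.all(axis=1)]
print(complete_rows)
```

In practice, df.dropna(subset=["id"]) and df.dropna() return the same rows as rows_with_id and complete_rows respectively.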

1.4 check for outliers

import numpy as np

# Check for outliers with the 3σ rule
def three_sig(ser1):
    # Mean
    mean_value = ser1.mean()
    # Standard deviation
    std_value = ser1.std()
    # Any value outside the range mean ± 3σ is an outlier:
    # values greater than μ + 3σ or less than μ - 3σ
    rule = (mean_value - 3 * std_value > ser1) | (mean_value + 3 * std_value < ser1)
    # The original listing is cut off here; returning the flagged values is an assumed completion
    return ser1[rule]
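The 3σ rule can be tried on its own with a self-contained sketch. The function name three_sigma_outliers and the sample values are hypothetical, chosen only to illustrate the rule: a value x is flagged when it lies below mean - 3·std or above mean + 3·std.

```python
import pandas as pd


def three_sigma_outliers(ser: pd.Series) -> pd.Series:
    """Return the values of ser lying outside [mean - 3*std, mean + 3*std]."""
    mean_value = ser.mean()
    std_value = ser.std()  # pandas default: sample standard deviation (ddof=1)
    lower = mean_value - 3 * std_value
    upper = mean_value + 3 * std_value
    rule = (ser < lower) | (ser > upper)
    return ser[rule]


# Made-up sample: 19 ordinary readings and one extreme value at position 19
values = pd.Series([10, 11, 9, 10, 12, 10, 9, 11, 10, 10,
                    11, 9, 10, 12, 10, 9, 11, 10, 10, 100])
outliers = three_sigma_outliers(values)
print(outliers)  # index 19, value 100
```

One caveat worth knowing: with n values and the sample standard deviation, no point can lie more than (n - 1)/√n standard deviations from the mean. For n ≤ 10 that bound is below 3, so the 3σ rule cannot flag anything in very small columns.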
