2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article shows how Python can be used to process regional information for China. It walks through four steps with pandas: reading the data, checking for duplicate rows, checking for missing values, and checking for outliers.
1.1 reading the data
Code:
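import pandas as pd

# read the example file, taking the column names from the second line
data = pd.read_csv("example_data.csv", header=1)
print(data)

# the regional files contain Chinese text, so they are decoded as GBK
data1 = pd.read_csv("Beijing area Information.csv", header=1, encoding='gbk')
data2 = pd.read_csv("Tianjin area Information.csv", encoding='gbk')
print(data1)
print(data2)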
The result of running the code:
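The code first reads each file with the pandas read_csv() function, and printing the result shows the corresponding table. Passing header=1 takes the column names from the second line of the file, and encoding='gbk' is needed for the files that contain Chinese text.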
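To make the effect of the header and encoding arguments concrete, here is a minimal sketch. The file name, column names, and rows are made up for illustration and are not the article's data; the sketch writes a small GBK-encoded CSV to a temporary directory and reads it back.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample file: a title line followed by the real header line.
csv_text = "北京地区信息\nid,district,value\n1,海淀,10\n2,朝阳,20\n"

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "sample_area_info.csv")
with open(path, "w", encoding="gbk", newline="") as f:
    f.write(csv_text)

# header=1 takes the column names from the second line (index 1) and
# ignores the line before it; encoding="gbk" decodes the Chinese text.
sample = pd.read_csv(path, header=1, encoding="gbk")
print(sample)
```

If the encoding argument is left out, pandas assumes UTF-8, which typically fails with a UnicodeDecodeError on a GBK file like this one.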
1.2 check for duplicate data
# flag duplicate rows
dupnum = data.duplicated()
print(dupnum)

# handle duplicate values
caldup = data.drop_duplicates()
print(caldup)
The result of running the code:
The key method here is duplicated(), which compares each row with the rows before it and returns a Boolean Series: True marks a row that repeats an earlier row, and False marks a row that appears for the first time.
We can then use drop_duplicates() to delete the duplicate rows. It returns a new DataFrame and leaves the original unchanged.
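The behaviour of the two methods can be seen on a tiny table. This is only a sketch with made-up rows; the column names are hypothetical and do not come from the article's CSV files.

```python
import pandas as pd

# Made-up rows; the second and fourth rows repeat the rows just before them.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "district": ["A", "A", "B", "B", "C"],
})

# True marks a row that is an exact copy of an earlier row.
mask = df.duplicated()
print(mask.tolist())

# Keeps the first occurrence of each row and returns a new DataFrame.
deduped = df.drop_duplicates()
print(deduped)
```

The surviving rows keep their original index labels (0, 2, 4 here); calling reset_index(drop=True) on the result renumbers them if that matters downstream.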
1.3 check for missing values
Code:
from pandas import Series
from numpy import nan
import pandas as pd

series_obj = Series([1, None])
pd.notnull(series_obj)
# the lines above are only a quick test of notnull()

pd.notnull(data)
pd.notnull(data1)
pd.notnull(data2)
The result of running the code:
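pd.notnull(data1) returns a Boolean matrix with the same shape as data1, holding True wherever a value is present and False wherever it is missing. Indexing a DataFrame with a Boolean mask built from one column, for example df[pd.notnull(df['id'])], then returns only the rows where id is non-empty.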
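A short sketch of both steps, the Boolean matrix and the row filter, on made-up data. The id and district columns are assumptions for illustration only.

```python
import pandas as pd

# Made-up table with one missing id and one missing district.
df = pd.DataFrame({
    "id": [1, None, 3],
    "district": ["A", "B", None],
})

# Boolean DataFrame of the same shape: True where a value is present.
mask = pd.notnull(df)
print(mask)

# Boolean Series built from one column, used to keep rows with a non-empty id.
rows_with_id = df[pd.notnull(df["id"])]
print(rows_with_id)
```

Note that the filter uses a single column's mask. Indexing with the full Boolean matrix, df[mask], keeps the shape of the table and replaces the flagged cells with NaN rather than dropping rows.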
1.4 check for abnormal values
import numpy as np

# 2.4 check outliers
def three_sig(ser1):
    mean_value = ser1.mean()
    # standard deviation
    std_value = ser1.std()
    # values outside the 3σ range around the mean are treated as outliers,
    # i.e. values less than mean - 3σ or greater than mean + 3σ
    rule = (mean_value - 3 * std_value > ser1) | (mean_value + 3 * std_value < ser1)
    # assumption: returning the flagged values is one natural way to finish the function
    return ser1[rule]
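As a worked illustration of the 3σ rule, the sketch below applies it to a made-up Series. The function name three_sigma_outliers and the sample values are hypothetical, and returning the flagged values is an assumption about how such a check might be used, not the article's confirmed implementation.

```python
import pandas as pd


def three_sigma_outliers(ser):
    """Return the values of ser that lie more than 3 standard deviations from the mean."""
    mean_value = ser.mean()
    std_value = ser.std()  # pandas uses the sample standard deviation (ddof=1)
    rule = (ser < mean_value - 3 * std_value) | (ser > mean_value + 3 * std_value)
    return ser[rule]


# Made-up data: 30 ordinary values plus one obviously extreme value.
values = pd.Series([10.0] * 15 + [11.0] * 15 + [100.0])
outliers = three_sigma_outliers(values)
print(outliers)
```

The rule needs a reasonable number of observations to work: with only a handful of points, a single extreme value inflates the standard deviation so much that it may not fall outside the 3σ band at all.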