2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to use Python to collect and clean e-commerce sales data for cherries. The material is short and easy to follow; the sections below walk through data acquisition and data processing step by step.
01 Data acquisition
This article uses Python to collect cherry sales data from 1585 merchants on Taobao, capturing the commodity name, commodity price, number of payers, shop name, shipping address, and other fields. To save space, only the crawler's main function is shown:
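from selenium import webdriver

# search_product() and get_data() are helper functions of the crawler
# that are not shown here.

def main():
    browser.get('https://www.taobao.com/')
    page = search_product(key_word)  # runs the search and returns the total page count
    print(page)
    get_data()  # scrape the first results page
    page_num = 70
    while int(page) != page_num:
        print("-" * 100)
        print("crawling page {} data".format(page_num + 1))
        # the s parameter is the item offset: 44 items per results page
        browser.get('https://s.taobao.com/search?q={}&s={}'.format(key_word, page_num * 44))
        browser.implicitly_wait(10)
        get_data()
        page_num += 1
    print("data crawling complete")

if __name__ == '__main__':
    key_word = "Cherry"
    # Selenium 3 style call; Selenium 4 passes the driver path through a Service object
    browser = webdriver.Chrome("./chromedriver")
    main()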
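The paging logic in the main function can be isolated into a small helper, which makes the offset arithmetic easier to check. This is only a sketch: build_search_url and ITEMS_PER_PAGE are hypothetical names introduced here, the step of 44 is taken from the code above, and whether Taobao's search page still accepts these parameters has not been verified.

```python
from urllib.parse import urlencode

# Offset step taken from the article's crawler (44 items per results page).
ITEMS_PER_PAGE = 44


def build_search_url(key_word: str, page_index: int) -> str:
    """Return the search URL for a zero-based results page index.

    Hypothetical helper: it only reproduces the URL pattern used in main().
    """
    query = urlencode({"q": key_word, "s": page_index * ITEMS_PER_PAGE})
    return "https://s.taobao.com/search?" + query


# Build the URLs for the first three results pages.
urls = [build_search_url("Cherry", i) for i in range(3)]
for url in urls:
    print(url)
```

Using urlencode also percent-encodes the keyword, which matters when the search term is not plain ASCII.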
02 Data processing
1. Read and preview the data
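import pandas as pd
import numpy as np

# The CSV file has no header row, so the field names are supplied through names.
# The path is the author's local path; adjust it to your own file.
df = pd.read_csv('/dish J learn Python/Taobao/cherry.csv',
                 header=None,
                 names=['commodity name', 'commodity price', 'number of payers',
                        'shop name', 'shipping address'])
df.sample(5)  # preview five random rows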
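To see what header=None together with names does, the same call can be tried on a small in-memory CSV. The two rows below are made up for illustration and are not taken from the real dataset; the text format of the payer field ("1.5万+人付款") is an assumption about how Taobao displays it.

```python
import io

import pandas as pd

# Made-up sample rows (NOT from the real dataset), in the same column order.
sample_csv = io.StringIO(
    "Cherry A,59.9,1.5万+人付款,Shop A,山东 烟台\n"
    "Cherry B,128.0,300人付款,Shop B,上海\n"
)

columns = ["commodity name", "commodity price", "number of payers",
           "shop name", "shipping address"]

# header=None: the first line is treated as data, not as column labels.
# names=...:   supplies the column labels explicitly.
sample_df = pd.read_csv(sample_csv, header=None, names=columns)

print(sample_df.dtypes)
print(sample_df)
```

Without header=None, pandas would consume the first data row as the header and the frame would be one record short.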
2. View data information
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1595 entries, 0 to 1674
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   commodity name    1595 non-null   object
 1   commodity price   1595 non-null   float64
 2   number of payers  1595 non-null   object
 3   shop name         1595 non-null   object
 4   shipping address  1585 non-null   object
dtypes: float64(1), object(4)
memory usage: 74.8+ KB
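The output reveals the following problems in the data:
(1) The shipping address has missing values (1585 non-null out of 1595 rows).
(2) The number of payers is stored as text, so the numeric value needs to be extracted.
(3) The shipping address needs to be split into province and city.
(4) The rows should be sorted by price in descending order and the index reset.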
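The first two problems can also be confirmed directly rather than read off the info output. The following is a minimal sketch on a made-up three-row frame (the values are illustrative, not from the dataset); toy and missing_per_column are names introduced here.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "number of payers": ["1.5万+人付款", "300人付款", "20人付款"],
    "shipping address": ["山东 烟台", None, "上海"],
})

# Count missing values per column; a non-zero count flags problem (1).
missing_per_column = toy.isna().sum()
print(missing_per_column)

# A text dtype on a column that should be numeric flags problem (2).
print(toy.dtypes)
```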
3. Data cleaning
import re

# remove records with missing values
df.dropna(axis=0, how='any', inplace=True)

# split province and city out of the shipping address field
df["province"] = df["shipping address"].str.split(' ', expand=True)[0]  # expand=True returns the split parts as separate columns
df["city"] = df["shipping address"].str.split(' ', expand=True)[1]  # extract city
df["city"] = df["city"].fillna(df["province"])  # where the city is missing, fall back to the province

# extract the number from the number of payers with a regular expression
df['number'] = [re.findall(r'(\d+\.{0,1}\d*)', i)[0] for i in df['number of payers']]  # extract the numeric part
df['number'] = df['number'].astype('float')  # convert to a numeric type
df['unit'] = [''.join(re.findall(r'(万)', i)) for i in df['number of payers']]  # extract the unit (万 = ten thousand)
df['unit'] = df['unit'].apply(lambda x: 10000 if x == '万' else 1)
df['number of payers'] = df['number'] * df['unit']  # calculate the number of payers

# remove redundant columns
df.drop(['shipping address', 'number', 'unit'], axis=1, inplace=True)

# sort by commodity price in descending order and reset the index
df = df.sort_values(by="commodity price", axis=0, ascending=False)
df = df.reset_index(drop=True)
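The two parsing rules in the cleaning step (number plus optional 万 unit, and province/city split with a province fallback) are easy to check in isolation with plain Python. This is a sketch, not the article's code: parse_payers, split_address, and PAYER_PATTERN are hypothetical names, and the input strings assume the payer field looks like "1.5万+人付款" or "300人付款".

```python
import re
from typing import Tuple

# A number with an optional decimal part, followed by an optional 万 (ten thousand).
PAYER_PATTERN = re.compile(r"(\d+\.?\d*)(万?)")


def parse_payers(text: str) -> float:
    """Convert a payer string such as '1.5万+人付款' into a number."""
    match = PAYER_PATTERN.search(text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    value, unit = match.groups()
    multiplier = 10000 if unit == "万" else 1
    return float(value) * multiplier


def split_address(address: str) -> Tuple[str, str]:
    """Split 'province city' on the first space; reuse the province if no city is given."""
    parts = address.split(" ", 1)
    province = parts[0]
    city = parts[1] if len(parts) > 1 else province
    return province, city


print(parse_payers("1.5万+人付款"))  # 15000.0
print(parse_payers("300人付款"))     # 300.0
print(split_address("山东 烟台"))     # ('山东', '烟台')
print(split_address("上海"))          # ('上海', '上海')
```

Keeping the rules in small functions makes them straightforward to apply to a column with Series.map and to test before running on the full dataset.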
After cleaning, the data preview is as follows:
Thank you for reading. This concludes "how to use Python with e-commerce cherry sales data". After working through the steps above, you should have a clearer picture of how to collect such data with a crawler and clean it with pandas.