Updated 2025-01-29. From: SLTechnology News&Howtos
Shulou (Shulou.com), 06/01 report
This article shows how to collect Taobao cherry data with Python. Many readers have questions about how to do this in practice, so the editor has consulted a range of material and put together a simple, easy-to-use approach. Hopefully it answers those questions. Let's work through it.
Data acquisition
This article uses Python to collect cherry sales data from 1585 merchants on Taobao, capturing the commodity name, commodity price, number of payers, shop name, shipping address, and other fields. To save space, only the crawler's main function is shown:
    from selenium import webdriver


    def main():
        # search_product() and get_data() are helpers that the article does not show
        browser.get('https://www.taobao.com/')
        page = search_product(key_word)
        print(page)
        get_data()
        page_num = 70  # page counter, value as given in the source
        while int(page) != page_num:
            print("-" * 100)
            print("crawling page {} data".format(page_num + 1))
            # s is the result offset: page_num * 44
            browser.get('https://s.taobao.com/search?q={}&s={}'.format(key_word, page_num * 44))
            browser.implicitly_wait(10)
            get_data()
            page_num += 1
        print("data crawling completed")


    if __name__ == '__main__':
        key_word = "Cherry"
        # Selenium 3 style call; Selenium 4 expects a Service object instead of a path
        browser = webdriver.Chrome("./chromedriver")
        main()
Data processing
Data read and preview
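The code that loads the crawled data is not shown in the article. A minimal sketch, assuming the crawler saved its results as CSV with the five fields listed earlier; the sample rows and the in-memory buffer standing in for the file are invented for illustration:

```python
import io

import pandas as pd

# Hypothetical sample standing in for the crawler's output file;
# the rows below are invented purely for illustration.
sample_csv = io.StringIO(
    "commodity name,commodity price,number of payers,shop name,shipping address\n"
    "Cherry box A,199.0,1.5万+人付款,Shop A,山东 烟台\n"
    "Cherry box B,88.5,320人付款,Shop B,上海\n"
)

# With a real file, pass its path instead of the buffer.
df = pd.read_csv(sample_csv)

print(df.head())  # preview the first rows
print(df.shape)   # (rows, columns)
```

With real crawler output, only the argument to `pd.read_csv` should need to change.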
View data information
    df.info()

    Int64Index: 1595 entries, 0 to 1674
    Data columns (total 5 columns):
     #   Column            Non-Null Count  Dtype
    ---  ------            --------------  -----
     0   commodity name    1595 non-null   object
     1   commodity price   1595 non-null   float64
     2   number of payers  1595 non-null   object
     3   shop name         1595 non-null   object
     4   shipping address  1585 non-null   object
    dtypes: float64(1), object(4)
    memory usage: 74.8+ KB
The data has the following problems:
(1) The shipping address has missing values.
(2) The number of payers is stored as text, so the numeric value needs to be extracted.
(3) The shipping address needs to be split into province and city.
(4) The rows need to be sorted in descending order and the index reset.
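Checks like these can be run directly on the DataFrame. A minimal sketch on a few invented rows (the column names follow the fields listed earlier; the values are made up and not taken from the crawled data):

```python
import pandas as pd

# Invented sample rows; None marks a missing shipping address.
df = pd.DataFrame({
    "number of payers": ["1.5万+人付款", "320人付款", "18人付款"],
    "shipping address": ["山东 烟台", "上海", None],
})

# (1) count missing values per column
missing = df.isna().sum()
print(missing)

# (2) the payer counts are strings, so they cannot be used as numbers yet
payers_are_text = df["number of payers"].map(lambda v: isinstance(v, str)).all()
print(payers_are_text)

# (3) addresses containing a space hold "province city"; the rest hold one name only
has_city = df["shipping address"].str.contains(" ", na=False)
print(has_city.tolist())
```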
Data cleaning
    import re

    # remove records with missing values
    df.dropna(axis=0, how='any', inplace=True)

    # split province and city out of the shipping address field
    # expand=True returns the split parts as separate columns
    df["province"] = df["shipping address"].str.split(' ', expand=True)[0]  # extract province
    df["city"] = df["shipping address"].str.split(' ', expand=True)[1]  # extract city
    # null values in the city field are filled with the province value
    df["city"] = df["city"].fillna(df["province"])

    # extract the number from the number of payers with regular expressions
    df['number'] = [re.findall(r'(\d+\.{0,1}\d*)', i)[0] for i in df['number of payers']]  # extract value
    df['number'] = df['number'].astype('float')  # convert to numeric
    # extract the unit; '万' means ten thousand
    df['unit'] = [''.join(re.findall(r'(万)', i)) for i in df['number of payers']]
    df['unit'] = df['unit'].apply(lambda x: 10000 if x == '万' else 1)
    df['number of payers'] = df['number'] * df['unit']  # calculate the number of payers
    df.drop(['shipping address', 'number', 'unit'], axis=1, inplace=True)  # remove redundant columns

    # sort by commodity price in descending order and reset the index
    df = df.sort_values(by="commodity price", axis=0, ascending=False)
    df = df.reset_index(drop=True)
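To see what the payer-count conversion does, the same two regular expressions can be wrapped in a small function and run on a few strings. This is only a sketch: the function name `parse_payers` and the sample strings are illustrative assumptions, and it assumes the unit marker in the raw data is the Chinese character 万 (ten thousand).

```python
import re


def parse_payers(text):
    """Convert a payer string such as '1.5万+人付款' to a number.

    Hypothetical helper, not part of the original code.
    """
    # first run of digits, with an optional decimal part
    number = float(re.findall(r'(\d+\.{0,1}\d*)', text)[0])
    # '万' means ten thousand; without it the number is used as-is
    unit = 10000 if re.findall(r'(万)', text) else 1
    return number * unit


for s in ["1.5万+人付款", "320人付款"]:
    print(s, parse_payers(s))
```

Working on plain strings first makes it easy to check edge cases before applying the logic to the whole column.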
After cleaning, the data can be previewed with df.head().
That concludes this look at how to collect Taobao cherry data with Python. Hopefully it has answered your questions. Theory is best learned alongside practice, so go and try it yourself. To learn more, keep following the site for more practical articles.