This article mainly introduces how to crawl the data of Meituan's 1024 barbecue restaurants with Python. It is quite detailed and has reference value; interested readers are encouraged to read through it.
Analyze the real URL
https://apimobile.meituan.com/group/v4/poi/pcsearch/30?uuid=your&userid=-1&limit=32&offset=32&cateId=-1&q=%E7%83%A4%E8%82%89
Main parameters:
30: city id (30 represents Shenzhen)
limit: number of stores returned per page
offset: paging parameter (increases by 32 for each additional page)
q: search keyword (URL-encoded "barbecue" in this case)
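As a quick illustration of how these parameters fit together, a single-page request could look like the sketch below; the uuid value is a placeholder for your own session id, and the minimal User-Agent header is only there to make the sketch self-contained.

import requests

# Minimal single-page request built from the parameters described above.
base_url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/30'  # 30 = Shenzhen
params = {
    'uuid': 'your',      # placeholder: supply your own uuid from the browser
    'userid': -1,
    'limit': 32,         # stores per page
    'offset': 0,         # 0, 32, 64, ... for page 1, 2, 3, ...
    'cateId': -1,
    'q': '烤肉',          # "barbecue"; requests URL-encodes it automatically
}
resp = requests.get(base_url, params=params, headers={'User-Agent': 'Mozilla/5.0'})
print(resp.status_code)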
Crawling with the API above yields at most 1024 store records (32 stores per page × 32 pages). To obtain more complete data, you also need to find the areaId parameter (sub-district id) and traverse the sub-districts to get the full data set. Limited by space, only the core code is given.
import time
import random
import requests

def get_meituan():
    # areaId_list: the list of sub-district ids collected beforehand, as described above
    try:
        for areaId in areaId_list:
            for x in range(0, 2000, 32):
                time.sleep(random.uniform(2, 5))  # random pause between requests (upper bound assumed; the original value was garbled)
                print('extracting areaId %d, page %d' % (areaId, int((x + 32) / 32)))  # print crawl progress
                url = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/30?uuid=your&userid=-1&limit=32&offset={0}&cateId=-1&q=%E7%83%A4%E8%82%89&areaId={1}'.format(x, areaId)
                print(url)
                headers = {
                    'Accept': '*/*',
                    'Accept-Encoding': 'gzip, deflate, br',
                    'Accept-Language': 'zh-CN,zh;q=0.9',
                    'Connection': 'keep-alive',
                    'Cookie': 'your',
                    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
                    'Host': 'apimobile.meituan.com',
                    'Origin': 'https://sr.meituan.com',
                    'Referer': 'https://sr.meituan.com/s/%E7%83%A4%E8%82%89/'
                }
                response = requests.get(url, headers=headers)
                print(response.status_code)
                # ... followed by processing of the response data
    except Exception as e:
        print(e)
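The article omits how each response is parsed and written out. A minimal sketch of that step is given below, assuming the JSON payload sits under data.searchResult and guessing field names from the columns imported later; title, address, avgprice, avgscore, comments, areaname, imageUrl, backCateName, and phone are assumptions, not confirmed by the source.

import csv

# Hypothetical parsing/saving step (omitted in the article); the JSON keys below
# are assumptions inferred from the columns imported later, not confirmed by the source.
def save_items(response, path='Shenzhen barbecue 1.csv'):
    items = response.json().get('data', {}).get('searchResult', [])
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        for item in items:
            writer.writerow([
                item.get('title'),         # shop name
                item.get('address'),       # shop address
                item.get('avgprice'),      # per capita consumption
                item.get('avgscore'),      # store rating
                item.get('comments'),      # number of comments
                item.get('areaname'),      # business district
                item.get('imageUrl'),      # picture link
                item.get('backCateName'),  # shop type
                item.get('phone'),         # contact information
            ])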
Data processing
More than 20,000 barbecue restaurant records were crawled in just a few minutes. To make visual analysis easier, the crawled data also needs a simple cleaning.
Import data
Import the data and add column names, then use the sample() method to randomly preview 5 rows of data.
import pandas as pd
import numpy as np

# read the crawled CSV and assign column names
df = pd.read_csv('/Users/wangjia/Documents/technology account/project/2.spider/Meituan/Shenzhen barbecue 1.csv',
                 names=['shop name', 'shop address', 'per capita consumption', 'store rating',
                        'number of comments', 'business district', 'picture link', 'shop type',
                        'contact information'])
df.sample(5)
Delete duplicate data
df = df.drop_duplicates()
Missing value processing
As you can see from the above, only the contact information field contains missing values, so it is filled with placeholder text.
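A quick way to check which columns actually contain missing values (not shown in the original article, just a common pandas idiom) is:

df.isnull().sum()  # number of missing values in each column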
df = df.fillna('no data yet')
Store address cleaning
Extract the district by taking the first three characters of the store address field. In addition, "South Australia University" belongs to Longgang District, so it is replaced directly with the replace() method.
df['district and county'] = df['store address'].str[:3].str.replace('South Australia University', 'Longgang District')
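A simple sanity check on the new column (an added step, not from the original article):

df['district and county'].value_counts()  # number of shops in each district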
Store rating cleaning
According to Meituan's scoring rules, bin the store rating field to obtain a scoring type column.
cut = lambda x: 'normal' if x
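The expression above is cut off in the source page. A plausible completion is sketched below; the 3.5 and 4.5 thresholds and the label names are assumptions for illustration, not the article's actual values. pd.cut() with explicit bins would be an equivalent, more idiomatic alternative.

# Hypothetical completion of the truncated lambda; thresholds and labels are assumed.
cut = lambda x: 'normal' if x < 3.5 else ('good' if x < 4.5 else 'very good')
df['scoring type'] = df['store rating'].map(cut)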