How to crawl the data of Meituan's 1024 barbecue restaurants with Python

Updated 2025-02-24. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report.

This article walks through how to crawl the data of Meituan's barbecue restaurants with Python. The steps are covered in detail and should be a useful reference.

Analyze the real URL

https://apimobile.meituan.com/group/v4/poi/pcsearch/30?uuid=<your uuid>&userid=-1&limit=32&offset=32&cateId=-1&q=%E7%83%A4%E8%82%89

Main parameters:

30: city id (30 represents Shenzhen)

limit: number of stores per page

offset: paging parameter (increases by 32 for each page)

q: search keyword (barbecue in this case, percent-encoded)
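
The parameters listed above can be assembled in code instead of being pasted into one long string. The following is a minimal sketch using only the standard library; the helper name build_search_url and the placeholder uuid value are assumptions for illustration and do not come from the original code.

```python
from urllib.parse import quote, urlencode

BASE = 'https://apimobile.meituan.com/group/v4/poi/pcsearch/{city_id}'


def build_search_url(city_id, keyword, offset, uuid='your-uuid', limit=32):
    """Assemble the search URL from the parameters described above."""
    params = {
        'uuid': uuid,      # placeholder: copy the value from your own browser session
        'userid': -1,
        'limit': limit,    # number of stores per page
        'offset': offset,  # grows by `limit` for each page
        'cateId': -1,
        'q': keyword,      # urlencode percent-encodes the keyword as UTF-8
    }
    return BASE.format(city_id=city_id) + '?' + urlencode(params)


# The keyword in the article's URL is the percent-encoded form of '烤肉' (barbecue).
print(quote('烤肉'))
print(build_search_url(30, '烤肉', offset=32))
```

Keeping the parameters in a dict makes it easy to change the city id or keyword without editing the URL by hand.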

Crawling with the API above returns data for at most 1024 stores. To get more complete data, you also need to find the areaId parameter (sub-region) and then loop over the sub-regions. For reasons of space, only the core code is given.
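
The 1024 cap corresponds to 32 pages of 32 stores each, so a single query stops being useful after offset 992. Splitting the search by sub-region gives each areaId its own set of pages. The sketch below only enumerates the (areaId, offset) pairs to request; the function name iter_requests and the example ids are hypothetical, and the upper offset of 2000 mirrors the loop bound used in the article's code.

```python
def iter_requests(area_ids, page_size=32, max_offset=2000):
    """Yield (area_id, offset) pairs: every page of every sub-region."""
    for area_id in area_ids:
        for offset in range(0, max_offset, page_size):
            yield area_id, offset


# A single query is capped at 1024 stores, i.e. 32 pages of 32 stores each.
pages_per_query = 1024 // 32
print(pages_per_query)

# Hypothetical sub-region ids; the real values must be looked up on the site.
example_area_ids = [1, 2, 3]
pairs = list(iter_requests(example_area_ids))
print(len(pairs), pairs[:2])
```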

    import random
    import time

    import requests

    # areaId_list holds the sub-region ids to traverse; it is built separately
    # and is not shown in the source.

    def get_meituan():
        try:
            for areaId in areaId_list:
                for x in range(0, 2000, 32):
                    # Sleep between requests. The arguments were garbled in the
                    # source; the 2 to 4 second range is an assumption.
                    time.sleep(random.uniform(2, 4))
                    # Print crawl progress.
                    print('extracting areaId %s, page %d' % (areaId, (x + 32) // 32))
                    url = ('https://apimobile.meituan.com/group/v4/poi/pcsearch/30'
                           '?uuid=<your uuid>&userid=-1&limit=32&offset={0}'
                           '&cateId=-1&q=%E7%83%A4%E8%82%89&areaId={1}').format(x, areaId)
                    print(url)
                    headers = {
                        'Accept': '*/*',
                        'Accept-Encoding': 'gzip, deflate, br',
                        'Accept-Language': 'zh-CN,zh;q=0.9',
                        'Connection': 'keep-alive',
                        'Cookie': '<your cookie>',
                        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) '
                                      'AppleWebKit/537.36 (KHTML, like Gecko) '
                                      'Chrome/83.0.4103.116 Safari/537.36',
                        'Host': 'apimobile.meituan.com',
                        'Origin': 'https://sr.meituan.com',
                        'Referer': 'https://sr.meituan.com/s/%E7%83%A4%E8%82%89/',
                    }
                    response = requests.get(url, headers=headers)
                    print(response.status_code)
        except Exception as e:
            # The source omits the handler; printing the error is an assumption.
            print(e)

Data processing

Information on more than 20,000 barbecue restaurants was crawled in just a few minutes. To make visual analysis easier, the crawled data still needs some simple cleaning.

Import data

Import the data, add column names, and use the sample() method to preview 5 randomly selected rows.

    import pandas as pd
    import numpy as np

    df = pd.read_csv(
        '/Users/wangjia/Documents/technology account/project/2.spider/Meituan/Shenzhen barbecue 1.csv',
        names=['store name', 'store address', 'per capita consumption',
               'store rating', 'number of comments', 'business district',
               'picture link', 'store type', 'contact information'])
    df.sample(5)
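
To see how names= behaves on a file without a header line, here is a small self-contained sketch. It reads two made-up rows from an in-memory buffer instead of the real CSV; the row values are invented for illustration only.

```python
import io

import pandas as pd

# Two made-up rows standing in for the crawled file, which has no header line.
raw = io.StringIO(
    'Shop A,Address A,85,4.6,1200,District X,http://example.com/a.jpg,barbecue,12345678\n'
    'Shop B,Address B,60,4.1,300,District Y,http://example.com/b.jpg,barbecue,\n'
)

columns = ['store name', 'store address', 'per capita consumption',
           'store rating', 'number of comments', 'business district',
           'picture link', 'store type', 'contact information']

# Because names is passed explicitly, pandas treats the file as header-less
# and reads the first line as data.
df = pd.read_csv(raw, names=columns)

print(df.shape)
print(df.sample(1))
```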

Delete duplicate data

    df = df.drop_duplicates()

Missing value processing

As the preview above shows, only the contact information field contains missing values; they are filled with a text placeholder.
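
One way to confirm which fields contain missing values is to count them per column before filling. The sketch below uses a made-up three-row frame; only the column names and the fill text follow the article.

```python
import pandas as pd

# Made-up sample; None marks a missing phone number.
df = pd.DataFrame({
    'store name': ['Shop A', 'Shop B', 'Shop C'],
    'contact information': ['12345678', None, None],
})

# Count missing values per column to see which fields need filling.
missing = df.isna().sum()
print(missing)

# Fill the gaps with a text marker.
df = df.fillna('no data yet')
print(df)
```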

    df = df.fillna('no data yet')

Store address cleaning

Extract the district and county by taking the first three characters of the store address field. In addition, the value "South Australia University" belongs to Longgang District, so it is replaced directly with the replace() method.
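
Slicing the first three characters only yields a district name when the addresses are in Chinese, where a name such as 福田区 is exactly three characters long. The sketch below illustrates this on made-up addresses. The prefix '南澳大' is an assumption: it is a guess at the untranslated form of "South Australia University" and is not confirmed by the source.

```python
import pandas as pd

# Made-up Chinese-language addresses; the last one starts with the assumed
# problem prefix '南澳大', which is not a district name.
df = pd.DataFrame({'store address': [
    '福田区示例路1号',
    '龙岗区示例路2号',
    '南澳大示例路3号',
]})

# Take the first three characters as the district, then patch the one
# prefix that is not a district name.
df['district and county'] = (
    df['store address'].str[:3].str.replace('南澳大', '龙岗区', regex=False)
)
print(df['district and county'].tolist())
```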

    df['district and county'] = df['store address'].str[:3].str.replace('South Australia University', 'Longgang District')

Store rating cleaning

Following Meituan's rating scheme, the store rating field is binned to produce a rating type column.

    cut = lambda x: 'normal' if x
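
The lambda is cut off in the source, so the actual thresholds are unknown. As a rough sketch of what such a binning step might look like, the example below maps ratings to labels with a plain function. The cut-off values 4.0 and 4.5, the labels 'good' and 'excellent', and the column name 'rating type' are all hypothetical placeholders, not Meituan's actual scheme.

```python
import pandas as pd


def rating_type(score):
    """Map a numeric rating to a label (hypothetical thresholds)."""
    if score < 4.0:
        return 'normal'
    if score < 4.5:
        return 'good'
    return 'excellent'


# Made-up ratings for illustration.
df = pd.DataFrame({'store rating': [3.2, 4.1, 4.8]})
df['rating type'] = df['store rating'].map(rating_type)
print(df)
```

A named function is used here instead of a chained conditional lambda because it is easier to read and to adjust once the real thresholds are known.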
