
Example Analysis of Crawling Meituan Website Information with Python

2025-02-24 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 06/01 Report

This article walks through an example of crawling information from the Meituan website with Python. It should serve as a useful reference for interested readers.

# Scrapy settings (settings.py): headers sent with every request
DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Cache-Control': 'max-age=0',
    'Proxy-Connection': 'keep-alive',
    # chs = the Changsha city site (the cookie's cityname decodes to 长沙)
    'Host': 'chs.meituan.com',
    'Referer': 'http://chs.meituan.com/',
    'Upgrade-Insecure-Requests': '1',
    # Desktop Chrome 67 user agent, so the request looks like a normal browser
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36',
    'Content-Type': 'text/html;charset=utf-8',
    # Session cookies copied from the author's browser; replace them with your own
    'Cookie': '_lxsdk_cuid=164c9bed44ac8-0bf488e0cbc5d9-5b193413-1fa400-164c9bed44bc8; __mta=248363576.1532393090021.1532393090021.1532393090021.1; rvct=70%2C1; ci=70; iuuid=30CB504DBAC7CCDD72645E3809496C48229D8143D427C01A5532A4DDB0D42388; cityname=%E9%95%BF%E6%B2%99; _lxsdk=30CB504DBAC7CCDD72645E3809496C48229D8143D427C01A5532A4DDB0D42388; _ga=GA1.2.1889738019.1532505689; uuid=2b2adb1787947dbe0888.1534733150.0.0.0; oc=d4TCN9aIiRPd6Py96Y94AGxfsjATZHPGsCDua9-Z_NQHsXDcp6WlG2x7iJpYzpSLttNvEucwm_D_SuJ7VRJkLcjqV6Nk8s_q3VyOJw5IsVJ6RJPL3qCgybGW3vxTkMHr9A4yChReTafbZ7f93F1PkCyUeFBQV4D-YXoVoFV5h4o; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; client-id=97664882-24cdmuri b21cMust25de878708e; lat=28.189822; lng=112.97422; _lxsdk_s=165553df04a-bc8-311Merba 7%7C%7C6',
}
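Scrapy also lets a request carry cookies as a dict through the `cookies` argument of `scrapy.Request`, instead of a raw `Cookie` header. A minimal sketch of turning the header string into such a dict, assuming the usual `name=value; name=value` layout; the helper name `cookie_header_to_dict` is hypothetical and only the standard library is used:

```python
from typing import Dict


def cookie_header_to_dict(header: str) -> Dict[str, str]:
    """Split a raw Cookie header such as "a=1; b=2" into {"a": "1", "b": "2"}."""
    cookies: Dict[str, str] = {}
    for part in header.split(";"):
        # partition() splits on the first "=" only, so values may contain "="
        name, sep, value = part.strip().partition("=")
        # Skip empty fragments and fragments that are not name=value pairs
        if sep and name.strip():
            cookies[name.strip()] = value.strip()
    return cookies


# A shortened header built from four of the cookies above
raw = "ci=70; cityname=%E9%95%BF%E6%B2%99; lat=28.189822; lng=112.97422"
print(cookie_header_to_dict(raw))
# {'ci': '70', 'cityname': '%E9%95%BF%E6%B2%99', 'lat': '28.189822', 'lng': '112.97422'}
```

The result could then be passed as `scrapy.Request(url, cookies=cookie_header_to_dict(raw))`. The values themselves are tied to one browser session and have most likely expired, so fresh ones are needed either way.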

Pasting the code inline is a little ugly, but it will do. These are the headers that get you access to the site; put them directly into the framework's settings. Even so, requests will still sometimes be redirected to a 403 page or to a CAPTCHA page, so that case has to be handled. One option is to check whether the URL of the response matches the URL that was requested. What to do when they differ is still open: I am only offering the idea here, and I am still improving the code.
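The URL comparison described above can be sketched as a small pure function. This is a minimal sketch, assuming that a redirect to the CAPTCHA page changes the host, path or query of the URL; the function names and the `verify.example.com` address are hypothetical placeholders, not Meituan's real CAPTCHA URL:

```python
from urllib.parse import urlsplit


def _url_key(url: str) -> tuple:
    """Reduce a URL to (host, path, query), ignoring the scheme and a trailing slash."""
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return parts.netloc.lower(), path, parts.query


def classify_response(requested_url: str, final_url: str, status: int) -> str:
    """Label a fetched page as 'blocked', 'redirected' or 'ok'.

    'blocked'    -> the server answered 403
    'redirected' -> the final URL differs from the requested one
                    (e.g. a bounce to a CAPTCHA page)
    'ok'         -> the page that was asked for
    """
    if status == 403:
        return "blocked"
    if _url_key(requested_url) != _url_key(final_url):
        return "redirected"
    return "ok"


print(classify_response("http://chs.meituan.com/", "http://chs.meituan.com", 200))
# ok
print(classify_response("http://chs.meituan.com/", "https://verify.example.com/captcha", 200))
# redirected
```

Inside a Scrapy callback the final URL is `response.url` and the status is `response.status`. Scrapy's RedirectMiddleware records the URLs a request passed through in the `redirect_urls` meta key, so the originally requested URL can be taken from there. By default Scrapy's HttpErrorMiddleware drops non-2xx responses before they reach the callback, so a 403 only shows up there if it is allowed, for example through the spider's `handle_httpstatus_list` attribute.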

This is the log printed after getting the data.

The detailed comment (review) data will be fetched later; for now, all the data sits somewhat messily in a single collection.

Detailed code:

The code is not perfect. Requests still get redirected to the CAPTCHA page, so the CAPTCHA has to be handled, and when too many requests are sent, proxy IPs are needed. These problems remain to be solved, and the code as posted has many issues, so help from anyone more experienced is welcome!
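For the "too many requests" problem, one common approach is to spread requests over several proxy IPs. A minimal sketch of a Scrapy downloader middleware that hands out proxies round-robin, assuming a hypothetical custom setting `PROXY_LIST` that holds proxy URLs; it relies only on the fact that Scrapy's built-in `HttpProxyMiddleware` reads the `proxy` key of `request.meta`. It does not solve CAPTCHAs or drop dead proxies:

```python
from itertools import cycle
from typing import Iterable


class RotatingProxyMiddleware:
    """Hypothetical downloader middleware that hands out proxies round-robin.

    Scrapy's HttpProxyMiddleware reads request.meta["proxy"], so this
    class only has to fill in that key before the request is downloaded.
    """

    def __init__(self, proxies: Iterable[str]):
        proxy_list = list(proxies)
        if not proxy_list:
            raise ValueError("at least one proxy URL is required")
        self._proxies = cycle(proxy_list)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, e.g.
        # PROXY_LIST = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Respect a proxy that was already chosen for this request
        if "proxy" not in request.meta:
            request.meta["proxy"] = next(self._proxies)
        return None  # continue with the normal download
```

It would be enabled in settings.py with something like `DOWNLOADER_MIDDLEWARES = {"myproject.middlewares.RotatingProxyMiddleware": 740}`, where `myproject` is a placeholder; any number below 750 (the default position of `HttpProxyMiddleware`) makes it run first. Scrapy's own `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` settings slow the crawl down and may also reduce how often the CAPTCHA page appears.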

