What to watch out for when crawling Meituan food data with Python

Updated 2025-01-14. From: SLTechnology News&Howtos (Shulou.com), Internet Technology

Shulou (Shulou.com), 06/01

This article shares what to watch out for when crawling Meituan food data with Python. Most people are probably not familiar with the details, so it is offered here for reference. Let's get into it.

Data crawling in three steps: pitfall ahead

For work, I needed to collect food data from an OTA platform, such as the types of restaurants in a given city. For a foodie, that sounded like no big deal. The end result, however, was that I had no time for lunch or dinner. Here is what happened.

Pressing F12 in Chrome locates the GET request right away, and the response is JSON. A closer look at the GET parameters reveals one strange parameter: a token!

Never mind that for now. Just modify the parameters directly and page through the results to request the data!
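The paging idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the endpoint `BASE_URL` and the parameter names `cityName`, `page`, and `token` are hypothetical placeholders, not Meituan's actual API. The sketch (Python 3) only builds the URLs; sending them is left to whichever HTTP client is in use.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not Meituan's real API).
BASE_URL = "https://example.com/api/poi/list"


def build_page_url(city, page, token):
    """Return the GET URL for one page of results.

    The parameter names are assumptions made for this sketch.
    """
    params = [("cityName", city), ("page", page), ("token", token)]
    return BASE_URL + "?" + urlencode(params)


def build_page_urls(city, token, pages):
    """Return the URLs for pages 1..pages, reusing the same token."""
    return [build_page_url(city, p, token) for p in range(1, pages + 1)]
```

Because the token turns out to be time-limited, a real crawler would likely need to regenerate it rather than reuse one value across all pages.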

Data crawling in three steps: getting started

Here comes the problem. After struggling for a long time, I found that this token is time-limited and generated by JavaScript. No matter: if plain GET requests don't work, we still have Selenium. The sad part is that Meituan, true to its big-company form, blocks Selenium outright.

Data crawling in three steps: filling in the pit

Back to square one. There was no choice but to tackle the token itself. After some searching, I found the relevant js file.

I had never called JavaScript directly from Python before. A Baidu search showed that PyExecJS, PyV8, and similar libraries can do it. Sadly, PyExecJS never worked properly on my Python 2.7 installation, while PyV8 ran without problems. It is just that installing PyV8 is a painful process.

Without further ado, here is the approach: I store the js file locally and use PyV8 from Python to load it and execute the js function that generates the token.
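A minimal sketch of that approach, not the author's original code. It assumes a working PyV8 installation and uses PyV8's `JSContext` with `enter()`, `eval()`, and `leave()`. The function name `getToken` and the file name `token.js` used in the usage note are hypothetical; the real names depend on the js file that was found.

```python
import json


def build_js_call(func_name, *args):
    """Build a JS call expression such as getToken("a", 1).

    Arguments are encoded with json.dumps so that strings are quoted
    and escaped correctly for JavaScript.
    """
    return "%s(%s)" % (func_name, ", ".join(json.dumps(a) for a in args))


def run_js_with_pyv8(js_source, expression):
    """Load js_source into a fresh V8 context and evaluate expression in it."""
    import PyV8  # imported lazily; requires a working PyV8 installation

    ctxt = PyV8.JSContext()
    ctxt.enter()
    try:
        ctxt.eval(js_source)          # define the functions from the js file
        return ctxt.eval(expression)  # call the token-generating function
    finally:
        ctxt.leave()
```

Usage would look something like `run_js_with_pyv8(open("token.js").read(), build_js_call("getToken", url))`, with both names replaced by whatever the actual js file defines.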

With the program generating the token automatically, I could not wait to parse the JSON data and store it in the database.

Once testing was complete, I crawled the data for Beijing and Shanghai to use for data visualization.
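The parse-and-store step might look like the sketch below. The article does not say which database or which JSON fields were used, so everything here is an assumption: SQLite stands in for the database, and the keys `data`, `name`, `category`, `avg_price`, and `review_count` are hypothetical field names.

```python
import json
import sqlite3


def save_restaurants(conn, response_text, city):
    """Parse one JSON response and insert its restaurants.

    Returns the number of rows inserted. The JSON layout is assumed
    for illustration, not taken from Meituan's real response.
    """
    payload = json.loads(response_text)
    rows = [
        (city, item["name"], item["category"],
         item["avg_price"], item["review_count"])
        for item in payload["data"]
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS restaurant ("
        "city TEXT, name TEXT, category TEXT, "
        "avg_price REAL, review_count INTEGER)"
    )
    conn.executemany("INSERT INTO restaurant VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Passing the connection in keeps the function usable with an in-memory database for testing and a file-backed one for the real crawl.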

After tallying the results, I found that Meituan still limits the data: each type of restaurant shows at most 32 results per page, and at most 32 pages in total. That is 32 × 32 = 1024 records per type.
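The cap matters when reading the statistics later: a restaurant type whose crawled count reaches 1024 may have been cut off. A small sketch of that check, where the function names and the input format (a dict of counts per type) are assumptions for illustration:

```python
PER_PAGE = 32   # maximum results per page, as observed above
MAX_PAGES = 32  # maximum pages shown per restaurant type


def max_records_per_type(per_page=PER_PAGE, max_pages=MAX_PAGES):
    """Upper bound on records obtainable for one restaurant type."""
    return per_page * max_pages


def possibly_truncated(counts_by_type):
    """Return the restaurant types whose crawled count reached the cap."""
    cap = max_records_per_type()
    return sorted(t for t, n in counts_by_type.items() if n >= cap)
```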

Data visualization

The proportion of various types of cuisine in Beijing and Shanghai

It can be seen that Sichuan and Hunan cuisine (Chuanxiang), barbecue, and western food account for the largest proportions in both cities. When it comes to barbecue and spicy crayfish, north and south really do seem to be alike.
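Proportions like these could be computed with a simple count. A minimal sketch in Python 3, assuming the input is just a list of cuisine-type labels, one per crawled restaurant:

```python
from collections import Counter


def cuisine_shares(categories):
    """Map each cuisine type to its share of all restaurants (0 to 1)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}
```

Running it once per city gives the two sets of shares to plot side by side.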

From the data, we can use the total number of reviews for each type of food as a measure of popularity. Only the top 10 are shown here.
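One way such a ranking might be produced, assuming each record is a hypothetical `(category, review_count)` pair taken from the crawled data:

```python
from collections import defaultdict


def top_cuisines_by_reviews(records, n=10):
    """Sum review counts per cuisine type and return the n largest.

    records: iterable of (category, review_count) pairs.
    Returns a list of (category, total_reviews), highest first.
    """
    totals = defaultdict(int)
    for category, review_count in records:
        totals[category] += review_count
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```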

Top 10 cuisines in Beijing and Shanghai

Hot pot tops the list in both Beijing and Shanghai.

Finally, let's compare the average prices of the same cuisines in the two cities.
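A comparison of this kind could be computed as sketched below. The record layout `(city, category, avg_price)` is an assumption for illustration; the article does not show the actual fields.

```python
from collections import defaultdict


def average_price_by_city(records):
    """Mean price per cuisine type and city.

    records: iterable of (city, category, avg_price) tuples.
    Returns {category: {city: mean_price}}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (category, city) -> [total, count]
    for city, category, price in records:
        acc = sums[(category, city)]
        acc[0] += price
        acc[1] += 1

    result = defaultdict(dict)
    for (category, city), (total, count) in sums.items():
        result[category][city] = total / count
    return dict(result)
```

Grouping by cuisine first makes it easy to put the Beijing and Shanghai figures for the same cuisine next to each other.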

