2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains in detail how to crawl Dianping data for Chengdu as a big data development exercise. I found it quite practical, so I am sharing it for reference. I hope you get something out of it.
1. Crawlers
First, I set the city to Chengdu and the food category to "hot pot", with no restriction on the type of hot pot or on the district, and left the sort order on "smart" sorting, as shown in the figure:
You can choose other options too; just watch how the URL changes. This article crawls data with the options above. Next, turn the page and observe how the URL changes:
Page 2:
Page 3:
It is easy to see that turning the page only changes the number after the p in the URL. Working backwards to the first page shows the same content as before, so a simple loop can crawl every page. Dianping only exposes the first 50 pages of results, however, so we can only crawl those 50 pages.
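The paging loop can be sketched roughly as follows. The actual URLs are not reproduced in this article, so `BASE_URL` is a hypothetical placeholder and `page_urls` is a name invented for illustration. The only things taken from the text are that the page number follows `p` and that only pages 1 to 50 are available.

```python
# Hypothetical placeholder: substitute the real search URL for your chosen filters.
BASE_URL = "https://example.com/search/hotpot/"
MAX_PAGES = 50  # Dianping only exposes the first 50 pages


def page_urls(base_url=BASE_URL, max_pages=MAX_PAGES):
    """Return one URL per result page, with the page number appended after 'p'."""
    return [f"{base_url}p{page}" for page in range(1, max_pages + 1)]


urls = page_urls()
print(len(urls), urls[0], urls[-1])
```

Each URL in the list would then be fetched and parsed in turn.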
This time I use pyquery to parse the pages, so we first need to locate the data we want to crawl within the page, as shown in the figure:
When I analyzed the page in detail, I was shocked: Dianping's anti-crawling measures go quite far. Its numbers and some of its text are not displayed as plain text but as codes, and at first it is not obvious how to parse them. As shown in the figure:
Annoyingly, some of the text is displayed normally and some is encoded, and the same goes for the digits. The good news is that only nine digits are encoded, and with a little observation you can work out which code stands for which digit. Here is the list: {'hs-OEEp': 0, 'hs-4Enz': 2, 'hs-GOYR': 3, 'hs-61V1': 4, 'hs-SzzZ': 5, 'hs-VYVW': 6, 'hs-tQlR': 7, 'hs-LNui': 8, 'hs-42CK': 9}. Note that the digit 1 is displayed as plain text.
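Decoding a number with this mapping might look like the sketch below. The markup is an assumption: the article does not show the page source, so the sketch supposes each encoded digit is an empty tag whose `class` attribute is one of the codes, while the digit 1 appears as ordinary text. It uses only the standard library `re` module rather than pyquery, and `decode_number` is a name invented for illustration.

```python
import re

# Mapping taken from the article; the digit 1 is shown as plain text on the page.
DIGIT_CLASSES = {
    'hs-OEEp': 0, 'hs-4Enz': 2, 'hs-GOYR': 3, 'hs-61V1': 4, 'hs-SzzZ': 5,
    'hs-VYVW': 6, 'hs-tQlR': 7, 'hs-LNui': 8, 'hs-42CK': 9,
}

# Assumed markup: an empty element such as <span class="hs-4Enz"></span>.
EMPTY_TAG = re.compile(r'<(\w+)[^>]*\bclass="([^"]+)"[^>]*>\s*</\1>')


def decode_number(fragment):
    """Replace encoded digit tags with digits and return the number as an int."""
    def swap(match):
        for cls in match.group(2).split():
            if cls in DIGIT_CLASSES:
                return str(DIGIT_CLASSES[cls])
        return ''  # an empty tag that is not a known digit code

    text = EMPTY_TAG.sub(swap, fragment)
    text = re.sub(r'<[^>]+>', '', text)   # strip any remaining tags
    digits = re.sub(r'\D', '', text)      # keep digits only
    return int(digits) if digits else None


sample = '<b>1<span class="hs-4Enz"></span><span class="hs-OEEp"></span></b>'
print(decode_number(sample))
```

If the real page uses a different tag or attribute, only `EMPTY_TAG` needs to change; the mapping itself stays the same.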
So how do you locate an element with pyquery? It is simple: find the data you want in the browser's developer tools, right-click → Copy → Copy selector, and paste the selector into your code. The details of using pyquery are already well covered elsewhere.
Finally, we obtained data from 50 pages of hot pot listings at 15 entries per page, 750 restaurants in total.
2. Analysis
Dianping already gives each restaurant a star rating, so we can start with the overall trend.
Quasi-five-star restaurants are the most numerous, probably because most customers habitually leave good reviews and only give low ones when they are truly dissatisfied. As a result, ratings are generally not low, but restaurants close to full marks are still fairly rare.
In this article, we treat the number of reviews as a proxy for a restaurant's popularity: the more popular it is, the more reviews it has.
Most restaurants have fewer than 1000 reviews, but some have more than 2000 or even more than 4000; these are probably internet-famous restaurants. Filtering at 5000 reviews leaves only well-known hot pot restaurants such as Xiaolongkan and Shu Daxia. Is the number of reviews related to the star rating? See below:
Here we take the average number of reviews per star rating and find that, for restaurants at four stars and above, the number of reviews is not related to the star rating, although they all draw more reviews than restaurants below four stars. This suggests that once a restaurant reaches four stars people do not distinguish much between ratings, but they are generally unwilling to accept a poorly rated restaurant.
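Averaging review counts per star rating can be sketched as below. The rows are made-up illustration values, not the crawled data, and the English star labels and the function name `mean_reviews_by_star` are assumptions for the example.

```python
from collections import defaultdict


def mean_reviews_by_star(rows):
    """rows: iterable of (star_label, review_count). Returns {star_label: mean}."""
    totals = defaultdict(lambda: [0, 0])  # star -> [sum of reviews, restaurant count]
    for star, reviews in rows:
        totals[star][0] += reviews
        totals[star][1] += 1
    return {star: total / count for star, (total, count) in totals.items()}


# Made-up sample rows for illustration only.
sample = [("five-star", 1200), ("five-star", 800), ("quasi-four-star", 300)]
print(mean_reviews_by_star(sample))
```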
For a student like me, per capita spend also matters a great deal.
Most hot pot restaurants in Chengdu have a per capita spend in the 50-100 range, and some are above 150. For me, 50-100 per person for a hot pot meal is acceptable; above 100, I have to check my wallet first. Taking this further, is per capita spend related to the star rating or to the number of reviews?
The figure above shows the relationship between per capita spend and star rating. There appears to be none, which means that some well-regarded hot pot restaurants are actually not expensive. Next, let's look at the relationship between per capita spend and number of reviews.
The comparison shows that restaurants with fewer than 500 reviews and a per capita spend of 50-100 are the most numerous. Of course, this follows from the fact that both the review counts and the per capita spend are themselves concentrated in these ranges.
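One way to reproduce this kind of comparison is to bin both variables and count restaurants per cell. The sketch below uses made-up rows and assumed bin edges; the article does not state the exact bins it used, and `bucket` and `cross_counts` are hypothetical helper names.

```python
from collections import Counter


def bucket(value, edges):
    """Label the half-open bin [lo, hi) containing value; assumes value >= edges[0]."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{lo}-{hi}"
    return f"{edges[-1]}+"


def cross_counts(rows, review_edges, price_edges):
    """rows: iterable of (review_count, per_capita_spend)."""
    return Counter(
        (bucket(reviews, review_edges), bucket(price, price_edges))
        for reviews, price in rows
    )


# Made-up rows and assumed bin edges, for illustration only.
rows = [(320, 80), (450, 95), (1200, 120), (5200, 110), (90, 45)]
counts = cross_counts(rows, review_edges=[0, 500, 1000, 2000], price_edges=[0, 50, 100, 150])
print(counts.most_common(1))
```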
For a hot pot restaurant, how good business is must also depend on its signature dishes. I segmented the crawled recommended dishes with jieba and drew a word cloud from them, as follows.
Beef, my favorite, is the most common signature dish, especially spicy beef; as long as you are eating hot pot, you have to order a plate. It is followed by tripe, shrimp paste, goose intestine and so on.
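The counting behind such a word cloud can be sketched with the standard library alone. This sketch skips the jieba segmentation step and assumes the recommended dishes are already available as a list of dish names per restaurant; the sample data is made up. In a fuller pipeline, the text would first be split with something like `jieba.lcut`, and the resulting frequencies passed to a word cloud library.

```python
from collections import Counter


def dish_frequencies(recommended):
    """recommended: list of dish-name lists, one list per restaurant."""
    counts = Counter()
    for dishes in recommended:
        # Ignore empty strings and surrounding whitespace.
        counts.update(d.strip() for d in dishes if d.strip())
    return counts


# Made-up sample for illustration only.
sample = [
    ["spicy beef", "tripe", "shrimp paste"],
    ["spicy beef", "goose intestine"],
    ["tripe", "spicy beef", " "],
]
freq = dish_frequencies(sample)
print(freq.most_common(2))
```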
Then there are the things everyone cares about: taste, environment and service.
Most of the three scores fall in the 8.0-9.2 range. In my view, restaurants scoring below 7.5 are not worth trying.
As expected, the better the star rating, the higher the scores for taste, environment and service. So do the taste, environment and service scores correlate with the number of reviews or the average price?
As you can see, there is no direct relationship. We did find, however, a very good linear relationship among taste, environment and service, so we pulled those out and drew a larger graph.
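A linear relationship like this is commonly quantified with the Pearson correlation coefficient and a least-squares line. The sketch below implements both with the standard library; the scores are made-up illustration values, and the article does not say which tool it actually used for its plots.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


# Made-up taste and service scores for illustration only.
taste = [8.0, 8.5, 9.0, 9.2]
service = [7.9, 8.4, 8.9, 9.1]
print(pearson(taste, service), fit_line(taste, service))
```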
We also fitted a linear relationship. There is only one three-star restaurant, so its case is rather special. The other star ratings show fairly consistent fits across taste, environment and service, which supports our conjecture that these variables are linearly related.
Since the ultimate purpose of this article is to make recommendations, we ran K-means clustering with K = 3. The star rating is converted into a number: five stars correspond to 5 points, quasi-five stars to 4.5 points, and so on. This yields three classes; let's plot them to see how the clustering turned out.
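The clustering step can be sketched as follows. This is a minimal standard-library K-means, not the author's code: the rows are made up, the English star labels are translations, and the deterministic farthest-point seeding in `init_centers` is a simplification chosen so the example is reproducible (library implementations typically use random or k-means++ seeding).

```python
# Star labels translated from the article; values follow its "5, 4.5, and so on" rule.
STAR_SCORE = {"five-star": 5.0, "quasi-five-star": 4.5, "four-star": 4.0,
              "quasi-four-star": 3.5, "three-star": 3.0}


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point seeding (a simplification, not k-means++)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k=3, max_iter=100):
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else centers[c])  # keep old center if empty
        if updated == centers:
            break
        centers = updated
    return labels, centers


# Made-up rows for illustration: (taste, environment, service, star label).
rows = [(9.1, 9.0, 9.1, "five-star"), (9.0, 8.9, 9.0, "quasi-five-star"),
        (8.3, 8.2, 8.3, "four-star"), (8.2, 8.3, 8.2, "four-star"),
        (7.2, 7.1, 7.0, "quasi-four-star"), (7.0, 7.2, 7.1, "three-star")]
points = [(t, e, s, STAR_SCORE[star]) for t, e, s, star in rows]
labels, centers = kmeans(points, k=3)
print(labels)
```

The cluster whose center has the highest scores would be the "most recommended" class.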
As we hoped, the higher a restaurant's scores for taste, environment, service and star rating, the more strongly it is recommended. That still leaves a lot of recommended restaurants, though; can we narrow them down? To do so, I filter by number of reviews, per capita spend and signature dishes. Since I like restaurants that are uncrowded, cheap and serve beef, here are the results:
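A filter of this kind might look like the sketch below. The thresholds, field names and restaurant names are all hypothetical, since the article does not state the exact limits it used.

```python
def recommend(restaurants, max_reviews, max_price, dish_keyword):
    """Return names of restaurants that are uncrowded, cheap and serve the dish."""
    return [
        r["name"]
        for r in restaurants
        if r["reviews"] <= max_reviews
        and r["price"] <= max_price
        and any(dish_keyword in dish for dish in r["dishes"])
    ]


# Made-up candidates for illustration only.
candidates = [
    {"name": "Shop A", "reviews": 300, "price": 70, "dishes": ["spicy beef", "tripe"]},
    {"name": "Shop B", "reviews": 4200, "price": 90, "dishes": ["spicy beef"]},
    {"name": "Shop C", "reviews": 250, "price": 60, "dishes": ["shrimp paste"]},
]
print(recommend(candidates, max_reviews=500, max_price=80, dish_keyword="beef"))
```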