
How to use Python to get the information of Dianping's Changsha flavor shrimp store?


This article introduces how to use Python to get information on Dianping's Changsha flavor shrimp (crayfish) stores. People often run into difficulties in cases like this, so let the editor walk you through how to handle these situations. I hope you read it carefully and come away with something useful!

Data reading

First, import the required packages and read in the dataset we obtained.

# Import packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import jieba

from pyecharts.charts import Bar, Pie, Page
from pyecharts import options as opts
from pyecharts.globals import SymbolType, WarningType
WarningType.ShowWarning = False

import plotly.express as px
import plotly.graph_objects as go

This dataset contains 745 records from 50 search result pages. The fields include: restaurant name, star rating, score, number of comments, per capita consumption, recommended dishes, and the taste, environment and service scores.

The data preview is as follows:

# Read data
df = pd.read_excel('../data/Changsha crayfish data.xlsx')
df.drop('detail_url', axis=1, inplace=True)
df.head()

Data preprocessing

Here we process the data as follows for later analysis:

title: strip the bracket and quote symbols around the text

star: extract the numeric star rating and map it to a label

score: extract the numeric value and convert it to float

comment_list: extract the taste, environment and service scores

Delete redundant rows and columns

# Star conversion
transform_star = {20: 'two stars', 30: 'three stars', 35: 'quasi four stars',
                  40: 'four stars', 45: 'quasi five stars', 50: 'five stars'}

# Process title
df['title'] = df['title'].str.replace(r"\['|'\]", "", regex=True)

# Process star
df['star'] = df.star.str.extract(r'(\d+)')[0].astype('int')
df['star_label'] = df.star.map(transform_star)

# Process score
df['score'] = df['score'].str.replace(r"\['|'\]", "", regex=True).replace("[]", np.nan)
df['score'] = df['score'].astype('float')

# Taste, environment and service scores
df['taste'] = df.comment_list.str.split(',').str[0].str.extract(r'(\d+\.*\d+)')[0].astype('float')
df['environment'] = df.comment_list.str.split(',').str[1].str.extract(r'(\d+\.*\d+)')[0].astype('float')
df['service'] = df.comment_list.str.split(',').str[2].str.extract(r'(\d+\.*\d+)')[0].astype('float')

# Drop column
df.drop('comment_list', axis=1, inplace=True)
# Drop rows with missing scores
df.dropna(subset=['taste'], axis=0, inplace=True)
# Drop the star level with too few records
df = df[df.star > 20]

The processed data are shown below; 560 samples remain for analysis.

df.head()

Data visualization

The following shows some of the visualization code:

1. Distribution of the number of stores by star rating

Quasi-four-star merchants are the most numerous, accounting for 65%. Merchants with four stars or above account for 18%, and five-star merchants are the fewest, with only 10.

# Generate data
star_num = df.star.value_counts().sort_index(ascending=True)
x_data = star_num.index.map(transform_star).tolist()
y_data = star_num.values.tolist()

# Bar chart
bar1 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar1.add_xaxis(x_data)
bar1.add_yaxis('', y_data)
bar1.set_global_opts(title_opts=opts.TitleOpts(title='Distribution of merchants by star rating'),
                     visualmap_opts=opts.VisualMapOpts(max_=365))
bar1.render()

2. The distribution of store comments

We take the number of comments as a proxy for a store's popularity: the hotter the store, the more consumers it has and the more comments it receives.

As the histogram shows, the data are heavily right-skewed, and only two stores have more than 10,000 comments. Both of them turn out to be Super Wenheyou locations, and Super Wenheyou is where Changsha's internet celebrities go to check in. During the National Day holiday this super-popular crayfish restaurant reportedly handed out more than 16,000 queue numbers in a single day, so no wonder it is so hot.

# Histogram
px.histogram(data_frame=df, x='review_num', color='star_label', histfunc='sum',
             title='Number of comments by star rating', nbins=20, width=1150, height=750)

3. Per capita price range distribution

We draw a histogram of per capita consumption across all flavor shrimp stores and find that prices range from 20 to 180 yuan, with most per capita consumption falling between 67 and 111 yuan. Going a step further, is per capita consumption related to a merchant's star rating?

# Histogram
px.histogram(data_frame=df, x='mean_price', color='star_label', histfunc='sum',
             title='Crayfish per capita consumption price distribution', nbins=20, width=1150, height=750)

4. The relationship between star rating, price and other factors

The relationship between different stars and price:

Here we draw violin plots of price by star rating, which show the distribution and probability density of several groups of data at once. As the figure shows, the price distributions differ noticeably across star ratings: the higher the star rating, the higher the average consumption price.

# Violin plot
plt.figure(figsize=(15, 8))
sns.violinplot(x='star_label', y='mean_price', data=df,
               order=['five stars', 'quasi five stars', 'four stars', 'quasi four stars', 'three stars'])
plt.title('Relationship between star rating and price', fontsize=20)
plt.show()

The relationship between different stars and other scoring items:

We expect that the higher a store's star rating, the higher its taste, environment and service scores, and the hotter it will be. The box plots bear this hypothesis out (a sketch of the plotting code is given below).
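The box-plot code is not shown in the article. Below is a minimal sketch using seaborn, assuming the star_label, taste, environment and service columns created in the preprocessing step:

# A minimal sketch (not from the original article): box plots of taste,
# environment and service scores grouped by star rating.
order = ['five stars', 'quasi five stars', 'four stars', 'quasi four stars', 'three stars']
fig, axes = plt.subplots(1, 3, figsize=(18, 6), sharey=True)
for ax, col in zip(axes, ['taste', 'environment', 'service']):
    sns.boxplot(x='star_label', y=col, data=df, order=order, ax=ax)
    ax.set_title(f'{col} score by star rating')
plt.show()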

So does the store score have anything to do with taste, environment, service, number of reviews, and average price? Next, let's draw a multivariate diagram and take a look.

5. Relationships between numerical variables

Pair plot of the numerical variables

There is a significant linear correlation between store score and taste, environment and service score, which is consistent with the previous verification.

There is no significant relationship between the store score and either the per capita consumption price or the number of comments.

There is a significant positive correlation among the taste, environment and service scores; the three tend to be high (or low) together.

# Pair plot
sns.pairplot(data=df[['score', 'review_num', 'mean_price', 'taste', 'environment', 'service', 'star_label']],
             hue='star_label')
plt.show()

Correlation coefficient between numerical variables

To verify the visual results above, we use Python to calculate the Pearson correlation coefficients between the numerical variables. As a rule of thumb, |r| >= 0.8 can be regarded as a high correlation. The same conclusions can be read from the heat map.

# Correlation coefficients
data_corr = df[['score', 'review_num', 'mean_price', 'taste', 'environment', 'service']].corr()

# Heat map
plt.figure(figsize=(15, 10))
sns.heatmap(data_corr, linewidths=0.1, cmap='tab20c_r', annot=True)
plt.title('Correlation coefficients between numerical variables',
          fontdict={'fontsize': 'xx-large', 'fontweight': 'heavy'})
plt.xticks(fontsize=12)
plt.yticks(fontsize=12)
plt.show()

6. Word cloud of recommended dishes

Assuming that the recommended dishes listed for each store are its popular dishes, we use jieba to segment the recommended-dish text and draw a word cloud (a sketch is shown below):
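The word-cloud code is not included in the article. Below is a minimal sketch, assuming the recommended dishes live in a column named recommend (a hypothetical name, not confirmed by the article) and using pyecharts' WordCloud chart together with jieba:

from collections import Counter
from pyecharts.charts import WordCloud

# A minimal sketch (not from the original article). 'recommend' is a
# hypothetical column holding the recommended-dish text for each store.
text = ' '.join(df['recommend'].dropna().astype(str))
words = [w for w in jieba.lcut(text) if len(w) > 1]   # segment and drop single characters
word_freq = Counter(words).most_common(100)

wc = WordCloud(init_opts=opts.InitOpts(width='1350px', height='750px'))
wc.add('', word_freq, word_size_range=[20, 100], shape=SymbolType.DIAMOND)
wc.set_global_opts(title_opts=opts.TitleOpts(title='Word cloud of recommended dishes'))
wc.render()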

The word cloud shows that "stewed shrimp", "flavor shrimp" and "stir-fried shrimp" are the best-loved dishes. In addition, when ordering crayfish, people also like to order dishes such as flavor clams, chicken feet, "butter" and so on.

7. K-means clustering analysis: cluster proportions

Cluster analysis divides the samples into clusters, where members of the same cluster should be as similar as possible and different clusters as dissimilar as possible. We use Python to run K-means clustering on the numerical variables: score, number of comments, average price, taste, environment and service, with K set to 3. This yields the three groups above: 3 stores are highly recommended, 459 are generally recommended, and 97 are not recommended. Let's look at the descriptive statistics of these three groups (a sketch of the clustering code is given below):
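The clustering code itself is not shown in the article. Below is a minimal sketch assuming scikit-learn is available; standardising the features before clustering and the random_state value are assumptions on my part:

from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# A minimal sketch (not from the original article). The article removes one
# outlier store (30509 comments) before clustering, so we set it aside first.
outlier = df[df['review_num'] == 30509]
df_cluster = df[df['review_num'] != 30509].copy()

features = ['score', 'review_num', 'mean_price', 'taste', 'environment', 'service']
X = StandardScaler().fit_transform(df_cluster[features])   # standardising is an assumption

kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
df_cluster['cluster'] = kmeans.fit_predict(X)

# Size and descriptive statistics of each cluster
print(df_cluster['cluster'].value_counts())
print(df_cluster.groupby('cluster')[features].describe())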

Looking at the histogram distributions of the different clusters, they can be summarized as follows:

Highly recommended: the highest score, the most comments, the highest price

Generally recommended: score, number of comments and price are all in the middle.

Not recommended: lowest score, fewest comments, lowest price

Note that an outlier sample with 30,509 comments was removed before the cluster analysis. Adding this sample back gives the final recommendation of four stores (a sketch follows below):
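This step is not shown in the article either. A minimal sketch, continuing the clustering sketch above and treating the cluster with the highest mean score as "highly recommended" (an assumption):

# A minimal sketch (not from the original article): take the cluster with the
# highest mean score and add back the outlier store set aside before clustering.
top_cluster = df_cluster.groupby('cluster')['score'].mean().idxmax()
recommended = pd.concat([df_cluster[df_cluster['cluster'] == top_cluster], outlier])
print(recommended[['title', 'score', 'review_num', 'mean_price']])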

This is the end of "how to use Python to get the information of Dianping Changsha flavor shrimp store". Thank you for reading. If you want to learn more about the industry, you can follow this site; the editor will keep putting out more high-quality practical articles for you!
