Shulou
2025-03-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how to use Pyecharts to generate word clouds. Many people run into this kind of problem in real projects, so let the editor walk you through how to handle it. I hope you read it carefully and get something out of it!
Preface
First, we need to understand two concepts: the upper chest circumference (bust) and the lower chest circumference (underbust). See the schematic diagram.
The difference between the upper and lower chest circumference determines the cup size. The exact correspondence is shown in the following figure:
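Because the figure is not reproduced here, the sketch below encodes a commonly used convention as an assumption: each cup step corresponds to roughly 2.5 cm of bust-minus-underbust difference, with an A cup at about 10 cm. The table `CUP_STEPS` and the function `cup_from_difference` are hypothetical illustrations, not the author's code; check them against the figure before relying on them.

```python
# Assumed nominal bust-minus-underbust differences (cm) per cup.
# These values are NOT taken from the article's figure; they follow a common 2.5 cm-step convention.
CUP_STEPS = [("AA", 7.5), ("A", 10.0), ("B", 12.5), ("C", 15.0), ("D", 17.5), ("E", 20.0)]


def cup_from_difference(bust_cm: float, underbust_cm: float) -> str:
    """Return the cup whose nominal difference is closest to bust - underbust."""
    diff = bust_cm - underbust_cm
    # min() keeps the first of two equally close steps, so ties resolve to the smaller cup.
    label, _ = min(CUP_STEPS, key=lambda step: abs(step[1] - diff))
    return label


print(cup_from_difference(87.5, 75))  # difference 12.5 cm -> B
print(cup_from_difference(95, 80))    # difference 15 cm -> C
```

Picking the nearest step rather than using hard thresholds keeps the lookup tolerant of measurements that fall between two nominal values.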
Together, the lower chest circumference and the cup size determine the bra size.
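A size label such as 75B simply concatenates the underbust band with the cup letter. The sketch below assumes bands in 5 cm steps (as in the 70/75/80 values in this data) and rounds a measured underbust to the nearest band, with halves rounded up. `band_from_underbust` and `bra_size` are hypothetical helpers, and real size charts often use measurement ranges instead, so treat the rounding rule as an assumption.

```python
def band_from_underbust(underbust_cm: float) -> int:
    """Round an underbust measurement (cm) to the nearest 5 cm band; halves round up (assumed rule)."""
    return int(underbust_cm / 5 + 0.5) * 5


def bra_size(underbust_cm: float, cup: str) -> str:
    """Combine the band and the cup letter into a label such as '75B'."""
    return f"{band_from_underbust(underbust_cm)}{cup}"


print(bra_size(74, "B"))    # 75B
print(bra_size(81.5, "C"))  # 80C
```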
Sizes are given in both a British system and an international system; refer to the following figure:
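The sample rows in Out [2] include sizes such as 38/85C and 34/75B, which pair a British band in inches with an international band in centimetres. The sketch below assumes the usual mapping of 32 in to 70 cm, with every 2 in step matching a 5 cm step, which agrees with those two samples. `uk_band` and `intl_band` are hypothetical helpers. Cup letters are deliberately not converted, because British lettering (DD, E, F, ...) departs from the international sequence above D.

```python
def uk_band(intl_band_cm: int) -> int:
    """International band (cm) -> British band (inches), assuming 70 cm <-> 32 in and 5 cm <-> 2 in."""
    if intl_band_cm % 5:
        raise ValueError("international bands come in 5 cm steps")
    return 32 + (intl_band_cm - 70) // 5 * 2


def intl_band(uk_band_in: int) -> int:
    """British band (inches) -> international band (cm); the inverse of uk_band."""
    if uk_band_in % 2:
        raise ValueError("British bands come in 2 inch steps")
    return 70 + (uk_band_in - 32) // 2 * 5


print(uk_band(85), intl_band(34))  # 38 75
```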
from pyecharts.charts import *
from pyecharts import options as opts
from pyecharts.commons.utils import JsCode
from collections import Counter
import re
import pandas as pd
import jieba
import jieba.posseg as psg
from stylecloud import gen_stylecloud
from IPython.display import Image

Data processing
The raw data is a txt file. To make it easier to process, we convert it to a DataFrame.
For the size field, a regular expression extracts the lower chest circumference (band) and the cup size. The code is as follows:
In [2]:
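# Each raw line holds a comment, a color category, a size and a timestamp.
# 'color category:' and 'Size:' are English renderings of the field labels in the original (Chinese) file.
patterns = re.compile(r'(?P<comment>.*), color category: (?P<color>.*?) Size: (?P<size>.*?), (?P<datetime>.*)')

with open('/home/kesci/input/cup6439/cup_all.txt', 'r') as f:
    data = f.readlines()

obj_list = []
for item in data:
    obj = patterns.search(item)
    if obj:  # skip lines that do not match instead of crashing on None
        obj_list.append(obj.groupdict())
data = pd.DataFrame(obj_list)

# Split the size into band (70-95) and cup letter; [05] replaces [0|5], which also matched "|".
data = pd.concat(
    [data, data['size'].str.extract(r'(?P<circumference>[7-9][05]).*(?P<cup>[a-zA-Z])', expand=True)],
    axis=1,
)
data.head()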
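The size pattern can be tried on its own, without pandas, against size strings like those in the Out [2] sample. This is a sketch of the same regular expression (with the `[05]` character class); `parse_size` is a hypothetical helper. Because `.*` is greedy, the cup is taken from the last letter in the string, and bands outside 70-95 (for example 65) do not match at all.

```python
import re

# Same idea as the str.extract() pattern in In [2]: a 70-95 band, then the last letter as the cup.
SIZE_RE = re.compile(r"(?P<circumference>[7-9][05]).*(?P<cup>[a-zA-Z])")


def parse_size(size: str):
    """Return (circumference, cup) from a raw size string, or None if it does not match."""
    m = SIZE_RE.search(size)
    return (m["circumference"], m["cup"]) if m else None


for raw in ["38/85C", "34/75B", "80C", "36B=80B", "75A", "65B"]:
    print(raw, "->", parse_size(raw))
```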
Out [2]:
   color                        comment                                                              datetime             size     circumference  cup
0  skin, thin style             Good. Bought it for my mother-in-law, planning to buy two more       2017-04-20 13:06:04  38/85C   85             C
1  H007 sapphire blue + pink    As good as imagined! The price is affordable! The push-up effect...  2017-04-23 21:44:20  34/75B   75             B
2  ultra-thin cup, pure white   Really good                                                          2017-05-18 10:36:31  80C      80             C
3  light purple                 Bought two at once. Good quality; the wire-free design is comfort... 2017-04-19 20:44:51  36B=80B  80             B
4  khaki                        Returned right away because I entered the wrong phone number, bu...  2017-05-07 09:16:47  75A      75             A
Let's find the most common keywords in the product category (color/style) field using jieba word segmentation and part-of-speech tagging:
Color: skin color > black > pink > white
Style: thin > thick
Underwire seems to be an important selling point.
In [3]:
w_all = []
for item in data.color:
    # POS-tag each color/style string; keep nouns ('n') and names ('nr') of 2+ characters
    words = [w for w, f in psg.cut(item) if f in ('n', 'nr') and len(w) > 1]
    w_all.extend(words)
c = Counter(w_all)

Building prefix dict from the default dictionary ...
Dumping model to file cache /tmp/jieba.cache
Loading model cost 0.769 seconds.
Prefix dict has been built succesfully.
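The filtering in In [3] relies on jieba's part-of-speech flags. The sketch below reproduces only the filter-and-count step on hypothetical, already-tagged (word, flag) pairs, the shape `psg.cut()` yields, so it runs without jieba. The words and their flags are invented placeholders, not real jieba output, and `keyword_counts` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical pre-tagged documents: (word, part-of-speech flag) pairs. Not real jieba output.
tagged_docs = [
    [("skin", "n"), ("thin", "a"), ("model", "n")],
    [("black", "n"), ("of", "uj")],
    [("skin", "n"), ("wire", "n"), ("x", "n")],
]


def keyword_counts(docs, keep_flags=("n", "nr"), min_len=2):
    """Count words whose flag is in keep_flags and whose length is at least min_len."""
    counts = Counter()
    for doc in docs:
        counts.update(w for w, f in doc if f in keep_flags and len(w) >= min_len)
    return counts


print(keyword_counts(tagged_docs).most_common(3))
```

Here "thin" (adjective) and "of" (particle) are dropped by the flag filter, and "x" by the length filter, mirroring the `len(w) > 1` condition in the notebook.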
In [4]:
counter = c.most_common(50)
bar = (
    Bar(init_opts=opts.InitOpts(theme='purple-passion', width='1000px', height='800px'))
    # reverse so that, after reversal_axis(), the most frequent keyword sits at the top
    .add_xaxis([x for x, y in counter[::-1]])
    .add_yaxis('number of occurrences', [y for x, y in counter[::-1]], category_gap='30%')
    .set_global_opts(
        title_opts=opts.TitleOpts(title="most frequently used keywords", pos_left="center",
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=20)),
        datazoom_opts=opts.DataZoomOpts(range_start=70, range_end=100, orient='vertical'),
        visualmap_opts=opts.VisualMapOpts(is_show=False, max_=6e4, min_=3000, dimension=0,
                                          range_color=['#f5d69f', '#f5898b', '#ef5055']),
        legend_opts=opts.LegendOpts(is_show=False),
        xaxis_opts=opts.AxisOpts(is_show=False),
        yaxis_opts=opts.AxisOpts(axistick_opts=opts.AxisTickOpts(is_show=False),
                                 axisline_opts=opts.AxisLineOpts(is_show=False)),
    )
    .set_series_opts(
        label_opts=opts.LabelOpts(is_show=True, position='right', font_style='italic'),
        itemstyle_opts={"normal": {"barBorderRadius": [30, 30, 30, 30],
                                   'shadowBlur': 10,
                                   'shadowColor': 'rgba(120,36,50,0.5)',
                                   'shadowOffsetY': 5}},
    )
    .reversal_axis()
)
bar.render_notebook()
Out [4]:
In [5]:
t_data = data.groupby(['circumference', 'cup'])['datetime'].count().reset_index()
t_data.columns = ['circumference', 'cup', 'num']
# t_data.num = round(t_data.num.div(t_data.num.sum(axis=0), axis=0) * 100, 1)

# One parent node per cup; only A, B and C show their labels.
data_pair = [
    {"name": 'A', "label": {"show": True}, "children": []},
    {"name": 'B', "label": {"show": True}, "children": []},
    {"name": 'C', "label": {"show": True},
     "itemStyle": {'shadowBlur': 10, 'shadowColor': 'rgba(120,36,50,0.5)', 'shadowOffsetY': 5},
     "children": []},
    {"name": 'D', "label": {"show": False}, "children": []},
    {"name": 'E', "label": {"show": False}, "children": []},
]
cup_index = {'A': 0, 'B': 1, 'C': 2, 'D': 3, 'E': 4}

for idx, row in t_data.iterrows():
    # One child per size, e.g. "75-B"; only sizes with more than 3000 reviews are labeled.
    child_data = {
        "name": '{}-{}'.format(row.circumference, row.cup),
        "value": int(row.num),
        "label": {"show": bool(row.num > 3000)},
    }
    if row.cup in cup_index:
        data_pair[cup_index[row.cup]]['children'].append(child_data)

Size distribution
Looking at cup size alone: B > A > C
Broken down by specific size: 75B > 80B > 75A > 70A
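In the notebook these rankings are read off `t_data` (review counts per circumference and cup). The stdlib sketch below shows the same counting on made-up (circumference, cup) records, one per review. `rank_sizes` and the records are hypothetical and are not the article's data.

```python
from collections import Counter


def rank_sizes(records):
    """records: list of (circumference, cup) pairs, one per review.
    Returns (cups by frequency, sizes by frequency)."""
    by_cup = Counter(cup for _, cup in records)
    by_size = Counter(f"{circ}{cup}" for circ, cup in records)
    return [cup for cup, _ in by_cup.most_common()], [s for s, _ in by_size.most_common()]


# Made-up records for illustration only.
records = ([("75", "B")] * 5 + [("80", "B")] * 4 + [("75", "A")] * 3
           + [("70", "A")] * 2 + [("85", "C")] * 1)
cups, sizes = rank_sizes(records)
print(cups)   # ['B', 'A', 'C']
print(sizes)  # ['75B', '80B', '75A', '70A', '85C']
```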
In [6]:
c = (
    Sunburst(init_opts=opts.InitOpts(theme='purple-passion', width="1000px", height="1000px"))
    .add(
        "",
        data_pair=data_pair,
        highlight_policy="ancestor",
        radius=[0, "100%"],
        sort_='null',
        levels=[
            {},
            # inner ring: cups
            {"r0": "20%", "r": "48%",
             "itemStyle": {"borderColor": 'rgb(220,220,220)', "borderWidth": 2}},
            # outer ring: specific sizes
            {"r0": "50%", "r": "80%", "label": {"align": "right"},
             "itemStyle": {"borderColor": 'rgb(220,220,220)', "borderWidth": 1}},
        ],
    )
    .set_global_opts(
        visualmap_opts=opts.VisualMapOpts(is_show=False, max_=90000, min_=3000,
                                          range_color=['#f5d69f', '#f5898b', '#ef5055']),
        title_opts=opts.TitleOpts(title="bra\n\nsize distribution", pos_left="center", pos_top="center",
                                  title_textstyle_opts=opts.TextStyleOpts(font_style='oblique', font_size=30)),
    )
    .set_series_opts(label_opts=opts.LabelOpts(font_size=18, formatter="{b}: {c}"))
)
c.render_notebook()
Out [6]:
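The Sunburst chart in In [6] consumes a nested list of nodes with `name`, `value` and `children` keys. The sketch below builds that structure from plain (circumference, cup, num) tuples instead of a DataFrame. `build_sunburst` is a hypothetical helper; unlike In [5], it creates cup nodes on demand (sorted by letter) and shows every parent label rather than hiding D and E.

```python
def build_sunburst(rows, label_threshold=3000):
    """rows: iterable of (circumference, cup, num).
    Returns a list of cup nodes, each holding one child per specific size."""
    tree = {}
    for circumference, cup, num in rows:
        node = tree.setdefault(cup, {"name": cup, "label": {"show": True}, "children": []})
        node["children"].append({
            "name": f"{circumference}-{cup}",
            "value": int(num),
            # only label large segments so the outer ring stays readable
            "label": {"show": num > label_threshold},
        })
    return [tree[cup] for cup in sorted(tree)]


example = build_sunburst([("75", "B", 5000), ("80", "B", 2000), ("75", "A", 4000)])
for node in example:
    print(node["name"], [child["name"] for child in node["children"]])
```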
Cup size distribution
Next, let's compare the cup proportions for each lower chest circumference:
Lower chest circumference = 70: A > B > C
Lower chest circumference = 75: B > A > C
Lower chest circumference = 80: B > A > C
Lower chest circumference = 85: B > C > A
Lower chest circumference = 90: C > B > A
Lower chest circumference = 95: C > B > D
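The percentages in the pie labels (`{d}%`) are each cup's share within one lower chest circumference. In the notebook ECharts computes them from the raw counts; the stdlib sketch below does the same calculation on made-up records. `cup_shares` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def cup_shares(records):
    """records: iterable of (circumference, cup) pairs, one per review.
    Returns {circumference: {cup: percent}}, cups ordered from most to least common."""
    counts = defaultdict(Counter)
    for circumference, cup in records:
        counts[circumference][cup] += 1
    shares = {}
    for circumference, cups in counts.items():
        total = sum(cups.values())
        shares[circumference] = {cup: round(100 * n / total, 1) for cup, n in cups.most_common()}
    return shares


# Made-up records for illustration only.
print(cup_shares([("70", "A")] * 3 + [("70", "B")]))  # {'70': {'A': 75.0, 'B': 25.0}}
```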
In [7]:
grid = Grid(init_opts=opts.InitOpts(theme='purple-passion', width='1000px', height='1000px'))
for idx, c in enumerate(['70', '75', '80', '85', '90', '95']):
    # two pies per row: left column at x = 30%, right column at x = 70%
    x = 30 if idx % 2 == 0 else 70
    y = idx // 2 * 30 + 20
    pos_x = str(x) + '%'
    pos_y = str(y) + '%'
    pie = Pie(init_opts=opts.InitOpts())
    pie.add(
        c,
        [[row.cup, int(row.num)] for i, row in t_data[t_data.circumference == c].iterrows()],
        center=[pos_x, pos_y],
        radius=[70, 100],
        label_opts=opts.LabelOpts(formatter='{b}: {d}%'),
    )
    pie.set_global_opts(
        title_opts=opts.TitleOpts(title="lower chest circumference = {}".format(c),
                                  pos_top=str(y - 1) + '%', pos_left=str(x - 4) + '%',
                                  title_textstyle_opts=opts.TextStyleOpts(font_size=15)),
        legend_opts=opts.LegendOpts(is_show=True),
    )
    grid.add(pie, grid_opts=opts.GridOpts(pos_left='20%'))
grid.render_notebook()
Out [7]:
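The pie positions in In [7] follow simple arithmetic: even indexes go to the left column (30%), odd ones to the right (70%), and each row moves down 30% starting at 20%; titles are nudged 4% left and 1% up from the pie centre. The sketch below isolates that layout calculation; `pie_layout` is a hypothetical helper and its defaults mirror the numbers in the notebook.

```python
def pie_layout(n, columns=(30, 70), top=20, row_step=30):
    """Return (center_x, center_y, title_left, title_top) percentage strings for n pies."""
    layout = []
    for idx in range(n):
        x = columns[idx % len(columns)]
        y = idx // len(columns) * row_step + top
        layout.append((f"{x}%", f"{y}%", f"{x - 4}%", f"{y - 1}%"))
    return layout


for position in pie_layout(6):
    print(position)
```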
Comment word cloud
Finally, let's see which words come up most often in the comments.
In [8]:
w_all = []
for item in data.comment:
    # segment each comment into a list of words
    words = jieba.lcut(item)
    w_all.extend(words)
c = Counter(w_all)
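`jieba.lcut` returns every token, including punctuation and one-character particles, so a raw Counter is easily dominated by them. The sketch below shows one possible filtering step before counting. `STOPWORDS` (here mirroring the translated `custom_stopwords` used in In [10]) and `meaningful_counts` are hypothetical, and the token list is made up.

```python
import re
from collections import Counter

STOPWORDS = {"none", "user", "fill in", "comment"}  # hypothetical stop-word set
HAS_WORD_CHAR = re.compile(r"\w")  # letters, digits and CJK characters all count as \w


def meaningful_counts(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens that are long enough, not stop words, and not pure punctuation."""
    return Counter(
        t for t in tokens
        if len(t) >= min_len and t not in stopwords and HAS_WORD_CHAR.search(t)
    )


tokens = ["quality", "good", "!", "good", "，", "user", "a", "comfortable", "!!"]
print(meaningful_counts(tokens))
```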
In [10]:
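# join with spaces: Chinese text has no spaces, and the word cloud splits its input into words
gen_stylecloud(' '.join(w_all),
               size=1000,
               # max_words=1000,
               font_path='/home/kesci/work/font/simhei.ttf',  # a font with Chinese glyphs
               # palette='palettable.tableau.TableauMedium_10',
               icon_name='fas fa-heartbeat',
               output_name='comment.png',
               # stop words (translated from the original Chinese) for boilerplate review text
               custom_stopwords=['none', 'user', 'fill in', 'comment'])
Image(filename='comment.png')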
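Why join the jieba tokens with spaces? stylecloud builds on the wordcloud package, which splits text into words with a regular expression. The sketch below assumes a pattern of the form `\w[\w']+`, which is an approximation of wordcloud's default tokenizer and should be checked against your installed version. Since Chinese characters match `\w`, an unsegmented sentence comes out as one giant "word", while space-joined tokens come out individually. The sample sentence and its segmentation are made up for illustration.

```python
import re

# Assumption: approximates the default tokenizer regex of the wordcloud package.
TOKEN_RE = re.compile(r"\w[\w']+")

sentence = "质量很好穿着舒服"                  # "good quality, comfortable to wear", unsegmented
segmented = ["质量", "很好", "穿着", "舒服"]  # a plausible jieba.lcut result (made up)

print(TOKEN_RE.findall(sentence))             # ['质量很好穿着舒服']
print(TOKEN_RE.findall(" ".join(segmented)))  # ['质量', '很好', '穿着', '舒服']
```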
Out [10]:
That concludes "how to generate word clouds with Pyecharts". Thank you for reading!
© 2024 shulou.com SLNews company. All rights reserved.