Getting started with Python crawlers: a case study in scraping second-hand housing data


Updated 2025-04-19. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03 report

This article walks through an introductory Python crawler case: scraping second-hand housing listings and saving them for analysis. Beginners often get stuck when they move from theory to a real site, so the steps below show how to handle each stage in turn.

The focus of this article

Systematically analyzing the nature of the web page

Parsing structured data

Saving data as CSV

Environment introduction

Python 3.8

PyCharm Professional Edition

Modules used

requests (third party): pip install requests

parsel (third party): pip install parsel

re and csv (standard library, no installation needed)
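Because the crawler opens its CSV file in append mode, running it twice would write the header row twice. A minimal sketch of one way to avoid that with csv.DictWriter follows; the helper name append_rows, the shortened field list, and the sample rows are assumptions for illustration, not part of the original code.

```python
import csv
import os
import tempfile

FIELDNAMES = ['title', 'price (ten thousand yuan)']  # illustrative subset


def append_rows(path, rows):
    """Append rows to a CSV file, writing the header only for a new or empty file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)


# Two separate "runs" against the same temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
append_rows(demo_path, [{'title': 'Sample A', 'price (ten thousand yuan)': '500'}])
append_rows(demo_path, [{'title': 'Sample B', 'price (ten thousand yuan)': '620'}])

with open(demo_path, encoding='utf-8', newline='') as f:
    saved_rows = list(csv.reader(f))
print(saved_rows)
```

The with block also closes the file after each run, so rows are flushed to disk even if a later request fails.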


Crawler implementation steps: send request > get data > parse data > save data
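The four steps can be sketched as three small functions before looking at the real site. This is a hypothetical skeleton: the function names, the canned HTML returned by fetch, and the example.com URL are assumptions, and a simple regular expression stands in for parsel so the sketch runs offline with only the standard library.

```python
import csv
import io
import re


def fetch(url):
    """Steps 1 and 2: send the request and return the response text.

    Stubbed with canned HTML so the sketch runs offline; a real crawler
    would return requests.get(url, headers=headers).text instead.
    """
    return ('<ul><li><a href="/house/1">Flat A</a></li>'
            '<li><a href="/house/2">Flat B</a></li></ul>')


def parse(html):
    """Step 3: pull (link, title) pairs out of the page text."""
    return re.findall(r'<a href="([^"]+)">([^<]+)</a>', html)


def save(records, stream):
    """Step 4: write the parsed records as CSV rows."""
    writer = csv.writer(stream)
    writer.writerow(['link', 'title'])
    writer.writerows(records)


out = io.StringIO()
records = parse(fetch('https://example.com/list'))
save(records, out)
print(out.getvalue())
```

Keeping each step in its own function makes it easy to swap the stubbed fetch for a real request, or the regex for CSS selectors, without touching the other steps.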

Full crawler code

    # Import modules
    import requests  # data request module (third party): pip install requests
    import parsel    # data parsing module (third party): pip install parsel
    import re
    import csv

    # Save data: create the CSV writer first so rows can be written inside the loop
    f = open('second-hand housing data.csv', mode='a', encoding='utf-8', newline='')
    csv_writer = csv.DictWriter(f, fieldnames=[
        'title', 'district', 'community', 'layout', 'orientation', 'floor',
        'decoration', 'elevator', 'area (㎡)', 'price (ten thousand yuan)', 'year',
    ])
    csv_writer.writeheader()

    # Send request: request the listing page
    url = 'https://bj.lianjia.com/ershoufang/pg1/'
    # A request header is needed to disguise the Python code as a browser
    # User-Agent: basic browser information
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.61 Safari/537.36'
    }
    response = requests.get(url=url, headers=headers)

    # Get data
    print(response.text)

    # Parse data
    # Literals matched against the page are Chinese because the site is. The ones
    # lost in translation here ('h1', '㎡', '年建') are reconstructed assumptions;
    # check them against the live page.
    selector_1 = parsel.Selector(response.text)  # convert response.text into a Selector object
    href = selector_1.css('div.leftContent li div.title a::attr(href)').getall()
    for link in href:
        html_data = requests.get(url=link, headers=headers).text
        selector = parsel.Selector(html_data)
        # CSS selector syntax
        title = selector.css('.title h1::text').get()  # title
        area = selector.css('.areaName .info a:nth-child(1)::text').get()  # district
        community_name = selector.css('.communityName .info::text').get()  # community
        room = selector.css('.room .mainInfo::text').get()  # layout
        room_type = selector.css('.type .mainInfo::text').get()  # orientation
        # '中楼层/共5层' (middle floor / 5 floors in total) split on '/' gives
        # ['中楼层', '共5层']; index [-1] takes the last element, '共5层'
        height = selector.css('.room .subInfo::text').get().split('/')[-1]  # floor
        # re.findall(r'共(\d+)层', '共5层') gives ['5']; [0] takes '5'
        height = re.findall(r'共(\d+)层', height)[0]
        sub_info = selector.css('.type .subInfo::text').get().split('/')[-1]  # decoration
        Elevator = selector.css('.content li:nth-child(12)::text').get()  # elevator
        # if Elevator == 'no data' or Elevator is None:
        #     Elevator = 'no elevator'
        house_area = selector.css('.content li:nth-child(3)::text').get().replace('㎡', '')  # area
        price = selector.css('.price .total::text').get()  # price (ten thousand yuan)
        date = selector.css('.area .subInfo::text').get().replace('年建', '')  # year built
        dit = {
            'title': title,
            'district': area,
            'community': community_name,
            'layout': room,
            'orientation': room_type,
            'floor': height,
            'decoration': sub_info,
            'elevator': Elevator,
            'area (㎡)': house_area,
            'price (ten thousand yuan)': price,
            'year': date,
        }
        csv_writer.writerow(dit)
        print(title, area, community_name, room, room_type, height, sub_info,
              Elevator, house_area, price, date, sep='|')

    f.close()
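The floor field is the most fragile part of the parsing step: parsel's .get() returns None when a selector matches nothing, and calling .split() on None raises an error. A minimal sketch of a more defensive version follows, assuming the page text has the form 中楼层/共5层; the helper names last_segment and total_floors are hypothetical.

```python
import re


def last_segment(text, sep='/'):
    """Return the part after the last separator, or None if text is None."""
    if text is None:
        return None
    return text.split(sep)[-1].strip()


def total_floors(text):
    """Extract the number from a string such as '共5层'; None if absent."""
    if text is None:
        return None
    match = re.search(r'共(\d+)层', text)
    return int(match.group(1)) if match else None


sample = '中楼层/共5层'  # "middle floor / 5 floors in total"
print(last_segment(sample))                # 共5层
print(total_floors(last_segment(sample)))  # 5
print(total_floors(None))                  # None
```

Returning None instead of raising lets the crawler keep going when one listing is missing a field, at the cost of having to handle empty cells later in the analysis.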

Data visualization

Import modules

    import pandas as pd
    from pyecharts.charts import Map
    from pyecharts.charts import Bar
    from pyecharts.charts import Line
    from pyecharts.charts import Grid
    from pyecharts.charts import Pie
    from pyecharts.charts import Scatter
    from pyecharts import options as opts

Read data

    df = pd.read_csv('Lianjia.csv', encoding='utf-8')
    df.head()
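The charts in this section use names such as region, count and df_price that the article never defines. A minimal sketch of how such aggregates could be derived with pandas follows; the column names 'district' and 'price (ten thousand yuan)' and the sample rows are assumptions standing in for the real CSV.

```python
import pandas as pd

# Hypothetical sample rows standing in for pd.read_csv(...).
df = pd.DataFrame({
    'district': ['Chaoyang', 'Haidian', 'Chaoyang', 'Haidian', 'Chaoyang'],
    'price (ten thousand yuan)': [500.0, 700.0, 600.0, 800.0, 550.0],
})

# Number of listings per district (groupby sorts the district names).
counts = df.groupby('district').size()
region = counts.index.tolist()
count = counts.values.tolist()

# Average price per district, rounded to two decimals for chart labels.
df_price = df.groupby('district')['price (ten thousand yuan)'].mean()
price = [round(x, 2) for x in df_price.values.tolist()]

print(region, count, price)
```

Because both aggregates come from the same groupby key, region, count and price stay aligned index by index, which the combined bar and line chart relies on.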

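Number of second-hand houses in each district

    # region and count are assumed to hold the district names and listing counts computed from df.
    # The map matches on Chinese names, so the suffix '区' (district) and the map type '北京' (Beijing) stay in Chinese.
    new = [x + '区' for x in region]
    m = (
        Map()
        .add('', [list(z) for z in zip(new, count)], '北京')
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Distribution of second-hand houses in Beijing'),
            visualmap_opts=opts.VisualMapOpts(max_=3000),
        )
    )
    m.render_notebook()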
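The data argument passed to the map is a list of [name, value] pairs built with zip. A small sketch of what that expression produces follows, using hypothetical district names and counts and assuming the names need the suffix 区 to match the names the Beijing map uses.

```python
# Hypothetical district names and listing counts.
region = ['朝阳', '海淀']
count = [3, 2]

# Append the district suffix, then pair each name with its count.
new = [x + '区' for x in region]
pairs = [list(z) for z in zip(new, count)]
print(pairs)  # [['朝阳区', 3], ['海淀区', 2]]
```

A district whose name does not match a region on the map is simply left unshaded, so it is worth printing the pairs once to check the names.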

Number of second-hand houses and average price in each district (bar chart with line)

    df_price.values.tolist()
    price = [round(x, 2) for x in df_price.values.tolist()]
    bar = (
        Bar()
        .add_xaxis(region)
        .add_yaxis('quantity', count, label_opts=opts.LabelOpts(is_show=True))
        # Second y-axis for the average price
        .extend_axis(
            yaxis=opts.AxisOpts(
                name="price (ten thousand yuan)",
                type_="value",
                min_=200,
                max_=900,
                interval=100,
                axislabel_opts=opts.LabelOpts(formatter="{value}"),
            )
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Number of second-hand houses and average price by district'),
            tooltip_opts=opts.TooltipOpts(is_show=True, trigger="axis", axis_pointer_type="cross"),
            xaxis_opts=opts.AxisOpts(
                type_="category",
                axispointer_opts=opts.AxisPointerOpts(is_show=True, type_="shadow"),
            ),
            yaxis_opts=opts.AxisOpts(
                name='quantity',
                axistick_opts=opts.AxisTickOpts(is_show=True),
                splitline_opts=opts.SplitLineOpts(is_show=False),
            ),
        )
    )
    line2 = (
        Line()
        .add_xaxis(xaxis_data=region)
        .add_yaxis(
            series_name="price",
            yaxis_index=1,  # plot against the second y-axis
            y_axis=price,
            label_opts=opts.LabelOpts(is_show=True),
            z=10,
        )
    )
    bar.overlap(line2)
    grid = Grid()
    grid.add(bar, opts.GridOpts(pos_left="5%", pos_right="20%"), is_control_axis_index=True)
    grid.render_notebook()

    area0 = top_price['community'].values.tolist()
    count = top_price['price (ten thousand yuan)'].values.tolist()
    bar = (
        Bar()
        .add_xaxis(area0)
        .add_yaxis('quantity', count, category_gap='50%')
        .set_global_opts(
            yaxis_opts=opts.AxisOpts(name='price (ten thousand yuan)'),
            xaxis_opts=opts.AxisOpts(name='quantity'),
        )
    )
    bar.render_notebook()

Scatter plot

    s = (
        Scatter()
        .add_xaxis(df['area (㎡)'].values.tolist())
        .add_yaxis('', df['price (ten thousand yuan)'].values.tolist())
        .set_global_opts(xaxis_opts=opts.AxisOpts(type_='value'))
    )
    s.render_notebook()

Housing orientation ratio

    directions = df_direction.index.tolist()
    count = df_direction.values.tolist()
    c1 = (
        Pie(init_opts=opts.InitOpts(width='800px', height='600px'))
        .add(
            '',
            [list(z) for z in zip(directions, count)],
            radius=['20%', '60%'],
            center=['40%', '50%'],
            # rosetype="radius",
            label_opts=opts.LabelOpts(is_show=True),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Housing orientation ratio', pos_left='33%', pos_top="5%"),
            legend_opts=opts.LegendOpts(type_="scroll", pos_left="80%", pos_top="25%", orient="vertical"),
        )
        .set_series_opts(
            label_opts=opts.LabelOpts(formatter='{b}: {c} ({d}%)', position="outside")
        )
    )
    c1.render_notebook()

Decoration bar chart combined with elevator rose chart

    fitment = df_fitment.index.tolist()
    count1 = df_fitment.values.tolist()
    directions = df_direction.index.tolist()
    count2 = df_direction.values.tolist()
    bar = (
        Bar()
        .add_xaxis(fitment)
        .add_yaxis('', count1, category_gap='50%')
        .reversal_axis()
        .set_series_opts(label_opts=opts.LabelOpts(position='right'))
        .set_global_opts(
            xaxis_opts=opts.AxisOpts(name='quantity'),
            title_opts=opts.TitleOpts(title='Decoration / elevator rose chart (combined)', pos_left='33%', pos_top="5%"),
            legend_opts=opts.LegendOpts(type_="scroll", pos_left="90%", pos_top="58%", orient="vertical"),
        )
    )
    c2 = (
        Pie(init_opts=opts.InitOpts(width='800px', height='600px'))
        .add(
            '',
            [list(z) for z in zip(directions, count2)],
            radius=['10%', '30%'],
            center=['75%', '65%'],
            rosetype="radius",
            label_opts=opts.LabelOpts(is_show=True),
        )
        .set_global_opts(
            title_opts=opts.TitleOpts(title='With or without elevator', pos_left='33%', pos_top="5%"),
            legend_opts=opts.LegendOpts(type_="scroll", pos_left="90%", pos_top="15%", orient="vertical"),
        )
        .set_series_opts(
            label_opts=opts.LabelOpts(formatter='{b}: {c}\n({d}%)', position="outside")
        )
    )
    bar.overlap(c2)
    bar.render_notebook()

    floor = df_floor.index.tolist()
    count = df_floor.values.tolist()
    bar = (
        Bar()
        .add_xaxis(floor)
        .add_yaxis('quantity', count)
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Floor distribution (bar chart with zoom slider)'),
            yaxis_opts=opts.AxisOpts(name='quantity'),
            xaxis_opts=opts.AxisOpts(name='floor'),
            datazoom_opts=opts.DataZoomOpts(type_='slider'),
        )
    )
    bar.render_notebook()

Bar chart of housing area distribution

    area = df_area.index.tolist()
    count = df_area.values.tolist()
    bar = (
        Bar()
        .add_xaxis(area)
        .add_yaxis('quantity', count)
        .reversal_axis()  # swap the axes so the bars run horizontally
        .set_series_opts(label_opts=opts.LabelOpts(position="right"))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Housing area distribution'),
            yaxis_opts=opts.AxisOpts(name='area (㎡)'),
            xaxis_opts=opts.AxisOpts(name='quantity'),
        )
    )
    bar.render_notebook()

That concludes this introductory Python crawler case on scraping second-hand housing data. Thank you for reading.

